L1.09: Section 8
Section 8: Using alternative goodness-of-fit indicators
The standard deviation (or the sum of squared deviations on which it is based) is an example of what is called a “cost” function, a computed value that an adjustment or decision process tries to make as small as possible. While standard deviation is by far the most commonly used cost function in modeling processes, there are other possible ways to measure a model’s quality that can be more appropriate in some circumstances. Usually these different cost functions are designed to reflect differences in economic costs of different errors (such as when an overestimate has more expensive consequences than an underestimate), to limit the effect of big deviations on the fitting process (so that a single bad value will not change the model vary much), or to allow for cases where the importance of a deviation depends on the size of the deviation relative to the data value (which is often true for exponential models). If you use an alternative goodness-of-fit indicator to find a “best-fit” model, make an explicit statement about what indicator was minimized. This is needed both so that people can decide how to use the model, and so that they would be able to reproduce your results from the same data.Using the maximum deviation as an alternative goodness-of-fit indicator
Standard deviation is a good description of the size of the typical deviation between a model and its data. But in some situations you may be more concerned about the maximum difference, rather than about the size of the typical difference. This can happen when you want to be able to identify a deviation amount that people can confidently expect will not be exceeded (although in that case you will usually need to allow some extra margin based on the typical random noise). If you want to minimize the greatest difference between the population model and any individual population value, you would use the indicator “=MAX(E3:E12)” (placed in a spare cell such as H9) rather than “=SUM(E3:E12)”. If Solver is asked to minimize a cell with this formula (e.g., H9), it will adjust G3 and G4 (and any other parameters) in ways that may make the standard deviation higher, but will make the maximum distance above or below the data as small as is possible for the chosen kind of model.Example 11: Find the exponential model that minimizes the maximum deviation from the census data.
A | B | C | D | E | F | G | H | I | |
1 | x | y data | y model | Residual | Squared | Exponential model: y = a * (1+r)^x | |||
2 | Year-1780 | Population | Prediction | deviation | deviations | MODEL PARAMETERS | |||
3 | 0 | 2.8 | 3.420962 | -0.620962 | 0.385594 | 3.420962 | a: Initial value at x=0 | ||
4 | 10 | 3.9 | 4.501465 | -0.601465 | 0.361761 | 0.0278937 | r: Growth rate | ||
5 | 20 | 5.3 | 5.923243 | -0.623243 | 0.388432 | ||||
6 | 30 | 7.2 | 7.794087 | -0.594087 | 0.352939 | ||||
7 | 40 | 9.6 | 10.25583 | -0.655832 | 0.430116 | Goodness of fit for these settings | |||
8 | 50 | 12.9 | 13.49511 | -0.595114 | 0.354161 | Sum of sq dev | 3.592537 | ||
9 | 60 | 17.1 | 17.75752 | -0.657517 | 0.432328 | Max sq dev | 0.432328 | ||
10 | 70 | 23.2 | 23.36619 | -0.166189 | 0.027619 | ||||
11 | 80 | 31.4 | 30.74635 | 0.653651 | 0.427259 | ||||
12 | 90 | 39.8 | 40.45752 | -0.657517 | 0.432328 |
Relative standard deviation
A relative deviation is the data-minus-model difference divided by the value of the model. This ratio can be expressed as a decimal fraction or a percentage – a few percent would be typical of most situations. Relative deviations can be useful when the model will not have zero values (in which case the division would not work) and there is a natural zero to the data scale (so that the ratio has corresponds to some real relationship). If we wish to use relative deviations in a worksheet like Models.xls, we replace the formula “=B3-C3” in cell D3 (and similarly other column-D cells) by the formula “=(B3-C3)/C3”. Nothing else in the worksheet need be changed. Now Solver will minimize the sum of squared relative deviations, since that is now what is being computed in H8. The relative standard deviation can be computed from that sum in same way as before, with the formula “=SQRT(H8/(10-2))” in this case, which has 10 data points and 2 parameters.Example 12: Fit an exponential model to the population data, minimizing relative standard deviation; state the best-fit formula and the relative standard deviation.
Solution approach:- Make a copy of the Exponential Model worksheet in Models.xls.
- Set cell D3 to the formula “=(B3-C3)/C3” to make column D relative deviations
- Copy the data to the worksheet and find the best-fit exponential model using Solver.
- Set H12 (or some other cell) to “=SQRT(H8/(10-2))”, to compute standard deviation
- Put appropriate labels in cells G8 and G10 to explain the contents of H8 and H10.
- Report the resulting formula and relative standard deviation.
Answer: The best-fit exponential model for this data when minimizing relative standard deviation is [latex]y=2.91\cdot{{(1.0300)}^{x}}[/latex], whose relative standard deviation from the data is 0.0231, or 2.31%.
A | B | C | D | E | F | G | H | I | |
1 | x | y data | y model | Residual | Squared | Exponential model: y = a * (1+r)^x | |||
2 | Year-1780 | Population | Prediction | deviation | deviations | MODEL PARAMETERS | |||
3 | 0 | 2.8 | 2.907038 | -0.03682 | 0.001356 | 2.907038 | a: Initial value at x=0 | ||
4 | 10 | 3.9 | 3.906720 | -0.00172 | 2.96E-06 | 0.029997 | r: Growth rate | ||
5 | 20 | 5.3 | 5.250177 | 0.00949 | 9.01E-05 | ||||
6 | 30 | 7.2 | 7.055627 | 0.02046 | 0.000419 | ||||
7 | 40 | 9.6 | 9.481941 | 0.01245 | 0.000155 | Goodness of fit for these settings | |||
8 | 50 | 12.9 | 12.74263 | 0.01235 | 0.000153 | Sum of sq rel dev | 0.004275 | ||
9 | 60 | 17.1 | 17.12461 | -0.00144 | 2.06E-06 | ||||
10 | 70 | 23.2 | 23.01348 | 0.00811 | 6.57E-05 | Relative std dev | 0.023117 | ||
11 | 80 | 31.4 | 30.92743 | 0.01528 | 0.000233 | ||||
12 | 90 | 39.8 | 41.56287 | -0.04242 | 0.001799 |
Procedure for using Solver with a template from the Models.xls spreadsheet:
- Set up an appropriate worksheet:
- Make a copy of the template for the desired model, and put the data into columns A and B.
- Spread the formulas in C3, D3, and E3 down to the same number of rows as the data.
- Make a graph with a scatter plot that shows both the data and the model.
- Set the model parameters to a rough estimate, so that the two graphs are nearby.
- Select Solver from the Tools menu and fill out the dialog box it displays:
- Enter the name of the cell containing the standard deviation in the “Set Target Cell” section at the top of the Solver dialog box (this the bottom computed cell in column G).
- Choose the “Min” (for Minimum) option in the second line, since we want Solver to find the smallest possible value for the standard deviation.
- Designate the parameter cells (“G3,G4” for models with two parameters, “G3:G5” for models with three parameters, etc.) in the “By Changing Cells” section on the fourth line.
- Press the “Solve” button at top right.
- Check the graph to see if the model now is close to the data, with some on each side. If it is not, you have made a mistake (probably in step [ii]) and must repeat the process correctly. If the model goes through the data but is the wrong shape, you are probably using an inappropriate model.
- If the graph is acceptable, press “OK” to accept the solution that is found. The parameters (and thus the “Model” values in the column C) are now set to the best-fit values to a very high accuracy.
Licenses & Attributions
CC licensed content, Shared previously
- Mathematics for Modeling. Authored by: Mary Parker and Hunter Ellinger. License: CC BY: Attribution.