I1.09: Section 6
Section 6: Using models fitted with Models.xls to predict values
The main purpose of using data to make a model formula is that the formula can then be used to compute predictions of what the output y would be for any input x. This can be used to predict the future, to make inferences about the past (prior to the first data point), to find intermediate values between data points, and even to make a better estimate of what value you would get for the same measurement if you repeated it at one of the input values you already used.
Extrapolation
The y = 1.8 x + 11 equation that fits the sediment-depth data well, for example, can be used to predict what sediment depth can be expected after day 80, the last day for which actual data was given. Using a model to predict what a process will yield for inputs outside the range of the data's input values is called extrapolation (“extra” comes from a Latin word meaning “outside of”).
Example 7: What sediment depth does the y = 1.8 x + 11 model predict at 90 days after cleaning?
Answer: Evaluate the model formula at: [latex]x=90[/latex]
[latex-display]\begin{align}&y=1.8x+11\\&y=1.8\cdot(90)+11\\&y=162+11\\&y=173\text{ millimeters}\\\end{align}[/latex-display]
Example 8a: What sediment depth does the y = 1.8 x + 11 model predict at 37 days after cleaning?
Answer: Evaluate the model formula at: [latex]x=37[/latex]
[latex-display]\begin{align}&y=1.8x+11\\&y=1.8\cdot(37)+11\\&y=66.6+11\\&y=77.6\text{ millimeters}\\\end{align}[/latex-display]
Example 8b: What sediment depth does the y = 1.8 x + 11 model predict at 56.73 days after cleaning?
Answer: Evaluate the model formula at: [latex]x=56.73[/latex]
[latex-display]\begin{align}&y=1.8x+11\\&y=1.8\cdot(56.73)+11\\&y=102.114+11\\&y=113.114\\&y\approx113.1\text{ millimeters}\\\end{align}[/latex-display]
Note that the final value is rounded to a precision consistent with the precision of the data.
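The evaluations in Examples 7, 8a, and 8b can also be checked outside the spreadsheet. The following is a minimal Python sketch, not part of Models.xls; the function name `predict_sediment_depth` is an illustrative assumption, and only the slope 1.8 and intercept 11 come from the model above.

```python
def predict_sediment_depth(days_after_cleaning: float) -> float:
    """Sediment depth in millimeters predicted by the model y = 1.8x + 11."""
    slope = 1.8       # coefficient of x in the fitted model
    intercept = 11.0  # constant term in the fitted model
    return slope * days_after_cleaning + intercept


# Evaluate the model at the inputs used in Examples 7, 8a, and 8b.
for x in (90, 37, 56.73):
    y = predict_sediment_depth(x)
    # Round to one decimal place, matching the rounding used in Example 8b.
    print(f"x = {x} days -> y = {y:.1f} mm")
```

The same function works for extrapolation (x = 90, beyond the last data day) and for intermediate inputs (x = 37 or 56.73), because the model formula does not care whether the input lies inside the range of the data.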
Backwards extrapolation
You could even compute the model’s answer for an input value that comes before any of your data. That does not make sense for the sediment data (since we are told that the tank was changed abruptly by cleaning just before this data was taken), but in other situations it is often possible to make good estimates of what conditions were before data was taken.
Example 9: An accumulated coating of rust on the siding of a building is measured on June 1 for 15 successive years, and these thickness measurements are found to fit a linear model, where y is the thickness in millimeters and x is the number of years since the first of these measurements in 1987. Estimate what the thickness of the coating was on June 1, 1980.
Answer: Evaluate the model formula at x = –7, since that corresponds to the year 1980 in the formula.
[latex-display]\begin{align}&y=0.08x+1.52\\&y=0.08\cdot(-7)+1.52\\&y=-0.56+1.52\\&y=0.96\text{ millimeters}\\\end{align}[/latex-display]
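The step that is easiest to get wrong in Example 9 is converting a calendar year into the model input x. The sketch below shows that conversion in Python. The function names are hypothetical, and the slope 0.08 and intercept 1.52 are the values used in the substitution step of the arithmetic above.

```python
FIRST_MEASUREMENT_YEAR = 1987  # x = 0 corresponds to June 1, 1987


def year_to_x(year: int) -> int:
    """Convert a calendar year to the model input: years since 1987."""
    return year - FIRST_MEASUREMENT_YEAR


def predict_rust_thickness(year: int) -> float:
    """Rust thickness in millimeters on June 1 of the given year."""
    slope = 0.08      # value used in the substitution step above
    intercept = 1.52  # constant term of the linear model
    return slope * year_to_x(year) + intercept


print(year_to_x(1980))                         # -7
print(round(predict_rust_thickness(1980), 2))  # 0.96
```

A negative x simply means a date before the first measurement, which is what makes this a backwards extrapolation.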
What if the same data points are measured again?
For the sediment-depth data, the model's prediction for sediment depth at 40 days after cleaning is 83 mm, which is 5.6 mm less than the actual data value of 88.6 mm. Which of these values would be best to use if we wanted to predict the depth at 40 days after some subsequent cleaning? Deviations between data values and model predictions can come from two different kinds of sources:
- Noise: The deviations may just be random variations in the process, in which case we will do better to use the model, since next time the deviation is just as likely to be in the other direction. In a sense, the model is more accurate than the data in this case. The noise-suppressing smoothing effect that a model provides is an important benefit of the modeling approach.
- Oversimplified models: The relationship between the input and output variables for the process may not quite be a straight line, in which case even the best linear model will have errors that overestimate the data in some input ranges and underestimate it in others. If the linear model is oversimplified in this way, we will do better to use previous measurements. Better yet, we should use a non-linear model that has sufficient flexibility to follow the data more closely.
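The deviation discussed in this section is the data value minus the model prediction. As a rough illustration, the Python sketch below computes it for the 40-day measurement, using only the model y = 1.8x + 11 and the data value 88.6 mm given in the text. The function names are assumptions made for this example.

```python
def predict_sediment_depth(days_after_cleaning: float) -> float:
    """Sediment depth in millimeters predicted by the model y = 1.8x + 11."""
    return 1.8 * days_after_cleaning + 11.0


def deviation(observed_mm: float, days_after_cleaning: float) -> float:
    """Data value minus model prediction; positive means data above the model."""
    return observed_mm - predict_sediment_depth(days_after_cleaning)


prediction = predict_sediment_depth(40)
d = deviation(88.6, 40)
print(f"prediction = {prediction:.1f} mm, deviation = {d:.1f} mm")
# prediction = 83.0 mm, deviation = 5.6 mm
```

A single deviation cannot tell the two sources apart. Deviations that change sign irregularly from one input to the next suggest noise, while long runs of same-sign deviations over a range of inputs suggest an oversimplified model.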
Licenses & Attributions
CC licensed content, Shared previously
- Mathematics for Modeling. Authored by: Mary Parker and Hunter Ellinger. License: CC BY: Attribution.