User-Defined Data Transformations
Step function changes in the workload or other known changes can often introduce problems into linear regression forecasts. The following table illustrates this type of problem by presenting monthly observations of the average number of jobs per hour submitted by an engineering group. In April, three more engineers were added to the group, resulting in a step function increase of approximately 10 jobs per hour.
micsrm140cd
Step function changes in the workload or other known changes can often introduce problems into linear regression forecasts. The following table illustrates this type of problem by presenting monthly observations of the average number of jobs per hour submitted by an engineering group. In April, three more engineers were added to the group, resulting in a step function increase of approximately 10 jobs per hour.
Observation Jobs Month Number Hour ======= =========== ====== JAN98 1 19 FEB98 2 25 MAR98 3 23 APR98 4 34 MAY98 5 37 JUN98 6 40
Figure 7-7 shows a scatter plot of the data. A linear regression model developed for this historical data series has the following parameters:
n = 6, the number of historical observations b = 14.5, the y intercept m = 4.3, the slope of the line 2 r = 0.92, the coefficient of determination F = 5.1, the F value p = .003, the probability that we should reject the hypothesis s = 2.7, the standard error e
The following table shows the predicted and residual values for the historical data series:
Observation Jobs Est. Residual Month Number Hour Jobs (Error) ======= =========== ====== ====== ======== JAN98 1 19 18.8 0.2 FEB98 2 25 23.2 1.8 MAR98 3 23 27.5 4.5 APR98 4 34 31.8 2.7 MAY98 5 37 36.2 -6.9 JUN98 6 40 40.5 -21.5
Although the r-squared of 0.92 for the model is very good, the results are suspicious due to the step function increase that occurred in the workload. You can modify the historical series to account for the step function increase of 10 jobs per hour in April. There are two ways to approach the problem:
- Subtracting 10 from the April through June observations.
- Adding 10 to the January through March observations.
You would probably want to modify the January through March observations as you would expect the future observations to be more like the April through June sales. Therefore, you would transform the historical observations using the following equation:
x(j) = x(j)+10, for j<=3 (Eqn 13) x(j) = x(j), for j>3
Univariate Model Forecasting allows you to specify SAS statements to accomplish these transformations. The following table shows the result of this transformation:
Observation Transformed Month Number Jobs ======= =========== ====== JAN98 1 29 FEB98 2 35 MAR98 3 33 APR98 4 34 MAY98 5 37 JUN98 6 40
Using the transformed observations, a second linear model was developed. The parameters of this model are shown below:
n = 6, the number of historical observations b = 28.5, the y intercept m = 1.77, the slope of the line 2 r = 0.79, the coefficient of determination F = 15.23, the F value p = 0.01, the probability that the hypothesis should be rejected s = 1.92, the standard error e
The following table shows the predicted and residual values developed using this model:
Est Obs Jobs Trans. Jobs Residual Month # Hour Jobs Hour (error) ======= === ====== ====== ===== ======== JAN98 1 19 29 30.2 -1.2 FEB98 2 25 35 32.0 3.0 MAR98 3 23 33 33.8 -0.8 APR98 4 34 34 35.6 -1.6 MAY98 5 37 37 37.3 -0.3 JUN98 6 40 40 39.1 0.9
Note that the r-squared and F values for the resulting model, while acceptable, are poorer than the values calculated for the original historical data series. However, the statistical model proposed is much more reasonable because it accounts for a significant step that occurred in the data. Always examine each model to determine whether the proposed model is reasonable for the data being modeled.
Figure 7-7. Engineering Jobs Per Hour
ENGINEERING JOBS PER HOUR | | 40.0 + * | | | 37.5 + | * | | J 35.0 + O | B | * S | 32.5 + / | | H | O 30.0 + U | R | | 27.5 + | | | 25.0 + * | | | * 22.5 + | | | 20.0 + | | * | 17.5 + | ---+------------------+------------------+------------------+------------------+------------------+-- 1 2 3 4 5 6 OBSERVATION NUMBER