Data Smoothing Using a Geometric Moving Average
One problem that analysts often encounter when attempting to forecast computer requirements is the high degree of variability shown by the observations in a historical data series. These variations are typically caused by random elements (for example, test jobs) that often exist in a system's workload. The following table, which shows the monthly observations of test jobs processed by a moderately sized MVS system, illustrates this problem.
            Observation   Test
Month       Number        Jobs
=======     ===========   ======
JAN98       1             2900
FEB98       2             3070
MAR98       3             2950
APR98       4             3080
MAY98       5             3200
JUN98       6             3150
Figure 7-5 shows a scatter plot of the data. A linear regression model developed for this historical data series has the following parameters:
n    = 6,     the number of historical observations
b    = 2881,  the y intercept
m    = 50.6,  the slope of the line
r^2  = 0.68,  the coefficient of determination
F    = 8.47,  the F value
p    = 0.04,  the significance probability (p value) of the fit
s(e) = 72.6,  the standard error
The predicted and residual values for the historical data series are shown in the following table:
            Observation   Test     Est.     Residual
Month       Number        Jobs     Jobs     (error)
=======     ===========   ======   ======   ========
JAN98       1             2900     2932       -32
FEB98       2             3070     2982        88
MAR98       3             2950     3033       -83
APR98       4             3080     3084        -4
MAY98       5             3200     3134        66
JUN98       6             3150     3185       -35
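The model parameters above can be reproduced with an ordinary least-squares fit. The following plain-Python sketch (variable names are mine, not part of any product) fits the line and computes the coefficient of determination:

```python
# Ordinary least-squares fit of the raw monthly job counts
# (data taken from the table above; plain Python, no libraries).
x = [1, 2, 3, 4, 5, 6]
y = [2900, 3070, 2950, 3080, 3200, 3150]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Slope m and intercept b from the normal equations.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
m = sxy / sxx
b = mean_y - m * mean_x

# Coefficient of determination r^2.
syy = sum((yi - mean_y) ** 2 for yi in y)
r2 = sxy ** 2 / (sxx * syy)

print(round(b), round(m, 1), round(r2, 2))   # 2881 50.6 0.68
```

The predicted values in the table are then b + m * x(j), and each residual is the observed value minus the predicted value.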
Although one could argue that the model produced is marginally acceptable, only 68% of the variability in the historical data is accounted for by the model. You can treat this type of apparent randomness in historical data by using data smoothing techniques. Although a wide variety of techniques are available, perhaps the simplest is the geometric moving average (BAR77). A geometric moving average (GMA) is attractive in that you only need to sacrifice one historical data observation to calculate the smoothed series. (Other techniques require you to sacrifice many more historical data observations. For example, a five-point moving average requires you to sacrifice the first five observations.) You can calculate a geometric moving average using the following equation.
x'(j) = alpha * x'(j-1) + beta * x(j), for all j >= 2          (Eqn 10)

where x'(j) is the smoothed value of observation x(j), alpha + beta = 1.0, and x'(1) = x(1)
Thus, you can use the first and second observations to calculate a smoothed value for the second observation, and so on. Another feature of the geometric moving average is that you can select the degree of smoothing. If you select a large value for alpha (that is, 0.5 <= alpha < 1.0), the smoothed series is less sensitive to variations between observations. Conversely, if you select a small value for alpha (that is, 0.0 < alpha < 0.5), the smoothed series is more responsive to variations in the historical data series.
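A quick way to see the effect of alpha is to smooth the same series with a heavy and a light weight. This is a hypothetical illustration of Eqn 10; the gma helper is mine, not part of any product:

```python
# Hypothetical illustration of the smoothing weight alpha:
# a larger alpha leans more on the prior smoothed value, damping
# month-to-month swings; a smaller alpha tracks the raw data closely.
raw = [2900, 3070, 2950, 3080, 3200, 3150]

def gma(series, alpha):
    """Geometric moving average per Eqn 10; beta = 1 - alpha."""
    beta = 1.0 - alpha
    out = [series[0]]                 # x'(1) = x(1)
    for x in series[1:]:
        out.append(alpha * out[-1] + beta * x)
    return out

heavy = gma(raw, 0.8)   # heavily smoothed, less sensitive
light = gma(raw, 0.2)   # lightly smoothed, more responsive

# Total month-to-month movement is smaller in the heavily smoothed series.
def wiggle(s):
    return sum(abs(b - a) for a, b in zip(s, s[1:]))

print(wiggle(heavy) < wiggle(light))   # True
```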
For example, you can apply a geometric moving average to the historical data observations shown in the previous table. For this example, alpha equals 0.5. Hence, beta is equal to 0.5. The observation for January would be lost and the observation for February would be computed as
FEB98 = 0.5 * 2900 + 0.5 * 3070 = 2985
The value for March would be computed based on the smoothed observation for February and the actual value for March. This procedure would be continued for the remainder of the observations, resulting in the following table:
            Observation   GMA Test
Month       Number        Jobs
=======     ===========   ========
JAN98       1                .
FEB98       2             2985
MAR98       3             2968
APR98       4             3024
MAY98       5             3112
JUN98       6             3131
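The smoothed values in this table can be reproduced directly from Eqn 10. A minimal sketch, assuming alpha = beta = 0.5:

```python
# Geometric moving average (Eqn 10) with alpha = beta = 0.5,
# applied to the raw monthly job counts from the first table.
alpha, beta = 0.5, 0.5
raw = [2900, 3070, 2950, 3080, 3200, 3150]

smoothed = [raw[0]]                       # JAN98 is sacrificed: x'(1) = x(1)
for j in range(1, len(raw)):
    smoothed.append(alpha * smoothed[-1] + beta * raw[j])

# Rounded, the FEB98..JUN98 values match the table above.
print([round(s) for s in smoothed[1:]])   # [2985, 2968, 3024, 3112, 3131]
```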
Using the smoothed observations, a second linear model was developed. The parameters for this model are shown below:
n    = 5,     the number of historical observations
b    = 2869,  the y intercept
m    = 43.6,  the slope of the line
r^2  = 0.88,  the coefficient of determination
F    = 23.36, the F value
p    = 0.02,  the significance probability (p value) of the fit
s(e) = 29.6,  the standard error
The following table shows the predicted and residual values developed using this model:
         Obs   GMA      Test     Est.     Residual
Month    #     Jobs     Jobs     Jobs     (error)
=======  ===   ======   ======   ======   ========
JAN98    1        .     2900     2914        .
FEB98    2     2985     3070     2957       28
MAR98    3     2968     2950     3000      -32
APR98    4     3024     3080     3043      -19
MAY98    5     3112     3200     3087       25
JUN98    6     3131     3150     3131        0
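Refitting the least-squares line to the smoothed series reproduces the slope and intercept of the second model. A self-contained sketch (recomputing the smoothed series from Eqn 10 before fitting; note that r^2 computed from unrounded smoothed values comes out near 0.87, so the 0.88 reported above likely reflects intermediate rounding):

```python
# Refit the least-squares line to the smoothed (GMA) series,
# dropping the sacrificed JAN98 observation.
alpha, beta = 0.5, 0.5
raw = [2900, 3070, 2950, 3080, 3200, 3150]
smoothed = [raw[0]]
for j in range(1, len(raw)):
    smoothed.append(alpha * smoothed[-1] + beta * raw[j])

x = [2, 3, 4, 5, 6]            # observation numbers FEB98..JUN98
y = smoothed[1:]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
m = sxy / sxx                  # slope: about 43.6
b = mean_y - m * mean_x        # intercept: about 2869
syy = sum((yi - mean_y) ** 2 for yi in y)
r2 = sxy ** 2 / (sxx * syy)    # clearly higher than the 0.68 of the raw fit

print(round(b), round(m, 1))   # 2869 43.6
```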
The forecast developed using the smoothed data series is much better behaved than the first forecast that was based on the untreated historical data series. Data smoothing is a powerful technique that you can use to minimize the effects of apparently random variations in historical data.
Figure 7-5. Monthly Job Counts
[Scatter plot: JOB COUNTS on the vertical axis (2900 to 3200) against OBSERVATION NUMBER on the horizontal axis (1 to 6); an asterisk marks each of the six monthly observations.]