Data Smoothing Using a Geometric Moving Average

Predicted and residual values developed using a Geometric Moving Average.
One problem that analysts often encounter when attempting to forecast computer requirements is the high degree of variability shown by the observations in a historical data series. These variations are typically caused by random elements (for example, test jobs) that often exist in a system's workload. The following table, which shows the monthly observations of test jobs processed by a moderately sized MVS system, illustrates this problem.
               Observation      Test
    Month        Number         Jobs
   =======     ===========     ======
    JAN98           1           2900
    FEB98           2           3070
    MAR98           3           2950
    APR98           4           3080
    MAY98           5           3200
    JUN98           6           3150
Figure 7-5 shows a scatter plot of the data. A linear regression model developed for this historical data series has the following parameters:
    n  =      6, the number of historical observations
    b  =   2881, the y intercept
    m  =   50.6, the slope of the line
     2
    r  =   0.68, the coefficient of determination
    F  =   8.47, the F value
    p  =   0.04, the probability that we should reject the
                 hypothesis
    s  =   72.6, the standard error
     e
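The parameters above can be checked with an ordinary least-squares calculation. The following Python sketch is an illustration only and is not part of the original procedure; the function name fit_line and the variable names are assumptions made for this example. It computes the intercept, slope, coefficient of determination, F value, and standard error for the six observations, and the results should come out close to the values listed. The p value is omitted because it requires the F distribution, which the Python standard library does not provide.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b + m*x with summary statistics."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sst = sum((y - y_mean) ** 2 for y in ys)   # total sum of squares
    m = sxy / sxx                               # slope of the line
    b = y_mean - m * x_mean                     # y intercept
    ssr = m * sxy                               # regression sum of squares
    sse = sst - ssr                             # residual sum of squares
    mse = sse / (n - 2)                         # residual mean square
    return {"n": n, "b": b, "m": m, "r2": ssr / sst,
            "F": ssr / mse, "se": math.sqrt(mse)}


observations = [1, 2, 3, 4, 5, 6]
test_jobs = [2900, 3070, 2950, 3080, 3200, 3150]

model = fit_line(observations, test_jobs)
for name, value in model.items():
    print(f"{name:>2} = {value:10.2f}")
```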
The predicted and residual values for the historical data series are shown in the following table:
               Observation      Test       Est.      Residual
    Month        Number         Jobs       Jobs      (error)
   =======     ===========     ======     ======     ========
    JAN98           1           2900       2932         -32
    FEB98           2           3070       2982          88
    MAR98           3           2950       3033         -83
    APR98           4           3080       3084          -4
    MAY98           5           3200       3134          66
    JUN98           6           3150       3185         -35
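A residual is the observed value minus the estimated value, and the residuals of a least-squares line sum to zero, which gives a quick check on a table of this kind. The following Python sketch is an illustration under that definition, not code from the original; all variable names are assumptions. It refits the line so that the example is self-contained, then lists the estimate and residual for each observation.

```python
observations = [1, 2, 3, 4, 5, 6]
test_jobs = [2900, 3070, 2950, 3080, 3200, 3150]

# Least-squares slope and intercept, recomputed so the example stands alone.
n = len(observations)
x_mean = sum(observations) / n
y_mean = sum(test_jobs) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(observations, test_jobs))
         / sum((x - x_mean) ** 2 for x in observations))
intercept = y_mean - slope * x_mean

estimates = [intercept + slope * x for x in observations]
residuals = [y - est for y, est in zip(test_jobs, estimates)]  # observed minus estimated

for x, y, est, res in zip(observations, test_jobs, estimates, residuals):
    print(f"{x:3d} {y:6d} {est:8.0f} {res:8.0f}")

# The residuals of a least-squares fit sum to zero, up to floating-point error.
print(f"sum of residuals = {sum(residuals):.6f}")
```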
Although one could argue that the model produced is marginally acceptable, only 68% of the variability in the historical data is accounted for by the model. You can treat this type of apparent randomness in historical data by using data smoothing techniques. Although a wide variety of techniques are available, perhaps the simplest is the geometric moving average (BAR77). A geometric moving average (GMA) is attractive in that you only need to sacrifice one historical data observation to calculate the smoothed series. (Other techniques require you to sacrifice many more historical data observations. For example, a five-point moving average requires you to sacrifice the first five observations.) You can calculate a geometric moving average using the following equation.
    x(j) = alpha * x(j) + beta * x(j-1),   for all j >= 2     (Eqn 10)

    where alpha + beta = 1.0
Thus, you can use the first and second observations to calculate a new value for the second observation, and so on. Another feature of the geometric moving average is that you can select the degree of smoothing. In Equation 10, alpha is the weight given to the current observation and beta is the weight given to the previous smoothed value. If you select a large value for alpha (that is, 0.5 <= alpha < 1.0), the smoothed series is more responsive to variations in the historical data series. Conversely, if you select a small value for alpha (that is, 0.0 < alpha < 0.5), the smoothed series is less sensitive to variations between observations.
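Equation 10 translates into a short loop. The sketch below is one possible Python rendering, not code from the original; the function name geometric_moving_average and the sample series are hypothetical. The first observation seeds the recurrence and is dropped from the output, and each later value combines the current observation (weight alpha) with the previous smoothed value (weight beta = 1 - alpha).

```python
def geometric_moving_average(series, alpha):
    """Smooth a series with Eqn 10: s(j) = alpha * x(j) + beta * s(j-1).

    The first observation only seeds the recurrence, so the result has
    one fewer element than the input.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if len(series) < 2:
        raise ValueError("need at least two observations")
    beta = 1.0 - alpha
    smoothed = []
    previous = series[0]              # seed: the unsmoothed first observation
    for current in series[1:]:
        previous = alpha * current + beta * previous
        smoothed.append(previous)
    return smoothed


sample = [10.0, 20.0, 10.0, 20.0]     # made-up values, for illustration only
smoothed_sample = geometric_moving_average(sample, alpha=0.5)
print(smoothed_sample)
```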
For example, you can apply a geometric moving average to the historical data observations shown in the previous table. For this example, alpha equals 0.5. Hence, beta is equal to 0.5. The observation for January would be lost and the observation for February would be computed as
    FEB98 = 0.5 * 2900 + 0.5 * 3070 = 2985
The value for March would be computed based on the smoothed observation for February and the actual value for March. This procedure would be continued for the remainder of the observations, resulting in the following table:
               Observation    GMA Test
    Month        Number         Jobs
   =======     ===========     ======
    JAN98           1             .
    FEB98           2           2985
    MAR98           3           2968
    APR98           4           3024
    MAY98           5           3112
    JUN98           6           3131
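The smoothed column can be recomputed step by step. The following Python sketch is an illustration, not the original calculation; the variable names are assumptions, and it assumes that unrounded values are carried forward and rounded only for display. January is shown as a period because it is consumed as the seed value.

```python
months = ["JAN98", "FEB98", "MAR98", "APR98", "MAY98", "JUN98"]
test_jobs = [2900, 3070, 2950, 3080, 3200, 3150]
alpha = 0.5
beta = 1.0 - alpha

gma_jobs = [None]                     # January is lost: it only seeds the recurrence
previous = test_jobs[0]
for current in test_jobs[1:]:
    previous = alpha * current + beta * previous   # Eqn 10, kept unrounded
    gma_jobs.append(previous)

for number, (month, value) in enumerate(zip(months, gma_jobs), start=1):
    shown = "." if value is None else f"{value:.0f}"
    print(f"{month}  {number:2d}  {shown:>6}")
```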
Using the smoothed observations, a second linear model was developed. The parameters for this model are shown below:
    n  =      5, the number of historical observations
    b  =   2869, the y intercept
    m  =   43.6, the slope of the line
     2
    r  =   0.88, the coefficient of determination
    F  =  23.36, the F value
    p  =   0.02, the probability that we should reject the
                 hypothesis
    s  =   29.6, the standard error
     e
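The slope and intercept of the second model can be checked in the same way. The following Python sketch is an illustration only; the variable names are assumptions, and it assumes that the smoothed values keep observation numbers 2 through 6 as their x values. It smooths the series with alpha equal to 0.5 and then fits a least-squares line to the five smoothed points.

```python
test_jobs = [2900, 3070, 2950, 3080, 3200, 3150]
alpha = 0.5

# Smooth with Eqn 10; the first observation is consumed as the seed.
smoothed = []
previous = test_jobs[0]
for current in test_jobs[1:]:
    previous = alpha * current + (1.0 - alpha) * previous
    smoothed.append(previous)

xs = list(range(2, len(test_jobs) + 1))   # observation numbers 2 through 6
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(smoothed) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, smoothed))

slope = sxy / sxx                         # m, the slope of the line
intercept = y_mean - slope * x_mean       # b, the y intercept

print(f"n = {n}, b = {intercept:.1f}, m = {slope:.1f}")
```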
The following table shows the predicted and residual values developed using this model:
               Obs    GMA Test     Test       Est.      Residual
    Month       #       Jobs       Jobs       Jobs      (error)
   =======     ===     ======     ======     ======     ========
    JAN98       1         .        2900       2914          .
    FEB98       2       2985       3070       2957         28
    MAR98       3       2968       2950       3000        -32
    APR98       4       3024       3080       3043        -19
    MAY98       5       3112       3200       3087         25
    JUN98       6       3131       3150       3131          0
The model developed using the smoothed data series fits much better than the first model, which was based on the untreated historical data series: the coefficient of determination rises from 0.68 to 0.88, and the standard error falls from 72.6 to 29.6. Data smoothing is a powerful technique that you can use to minimize the effects of apparently random variations in historical data.
Figure 7-5. Monthly Job Counts
(Scatter plot titled JOB COUNTS: JOBS, from 2900 to 3200, on the vertical axis against OBSERVATION NUMBER, from 1 to 6, on the horizontal axis, with one point for each monthly observation of test jobs.)