Monitor Application Stability Using Differential Analysis

Differential Analysis lets you monitor changes in the performance and the stability of your applications.
 
 
Differential Analysis automatically identifies important changes in the performance of your applications. Whereas legacy baselining predicts what is normal, Differential Analysis tracks the stability of your applications. It looks for uncontrolled variance in the Average Response Time of your frontend, backend, and business transaction metrics. The results resemble the way a seismometer displays earthquakes: uncontrolled variance appears as dark areas in an otherwise stable white strip, letting you pick out periods of instability across minutes, hours, or days -- even for hundreds of applications.
The core of Differential Analysis was invented and put into practical use by Walter Shewhart to address the problem of quality control on buried telephone lines. Shewhart invented statistical control charts, on which the Western Electric Rules are based. The Western Electric Rules are decision rules that detect out-of-control variance on control charts.
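To make the idea of decision rules on a control chart concrete, the following is a minimal sketch of the four textbook Western Electric Rules. It is an illustration only and is not the product's implementation; the function name `western_electric_violations` and the sample values are assumptions made for this example.

```python
def western_electric_violations(values, center, sigma):
    """Return {index: [rule numbers]} for the points that break a rule.

    Illustrative only: this follows the textbook rules, not the
    product's internal logic.
    """
    # Signed distance of each point from the center line, in sigma units.
    z = [(v - center) / sigma for v in values]
    hits = {}

    # (rule number, window size, points needed, distance limit in sigma)
    windowed_rules = ((2, 3, 2, 2), (3, 5, 4, 1), (4, 8, 8, 0))

    for i in range(len(z)):
        # Rule 1: a single point more than 3 sigma from the center line.
        if abs(z[i]) > 3:
            hits.setdefault(i, []).append(1)
        # Rules 2-4 examine a trailing window that ends at point i and
        # count the points beyond the limit on the same side.
        for rule, window, needed, limit in windowed_rules:
            if i + 1 < window:
                continue  # not enough history yet
            recent = z[i + 1 - window : i + 1]
            above = sum(1 for x in recent if x > limit)
            below = sum(1 for x in recent if x < -limit)
            if above >= needed or below >= needed:
                hits.setdefault(i, []).append(rule)
    return hits


# Hypothetical response times (ms) with an assumed center of 100 and sigma of 2.
print(western_electric_violations([101, 99, 100, 130, 100], 100.0, 2.0))
# prints {3: [1]}
```

In this sketch the spike at index 3 breaks only rule 1, because the surrounding points stay close to the center line.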
By default, Differential Analysis works on Average Response Time for application frontends, backends, and business transaction metrics. It also works for MOM-calculated metrics, for example, aggregated frontend and business service metrics. You can configure other metrics, but good results are not guaranteed; in particular, good results are unlikely for Responses Per Interval, GC Heap, or similar counter-type metrics. Feel free to experiment and get back to us with your results.
Differential Analysis Map
The Differential Analysis map is a visual exploration of the stability and responsiveness of many applications: dozens, hundreds, or even thousands of them. Each strip in the map corresponds to a single metric, for example, the Average Response Time for a specific frontend application. In a quiet, stable period the strip is a light shade. When instability occurs, the shade of the strip darkens progressively according to the severity of the instability. Thus, a single strip lets you see the stability of a single application or business transaction over time. The map sorts the most unstable strips to the top and makes it easy to scan through dozens of applications for unstable response times.
Variance intensity determines the color of the strip at a point in time. It is represented as follows:
  • Gray -- total inactivity; the application received no transactions over the interval measured.
    When data suddenly becomes unavailable, the shade of the last cell with data is retained until the number of empty cells is equal to the Window Length value. This value (20 by default) is configured in the differential control.
  • White -- good stability
  • Light -- moderate instability
  • Dark -- severe instability; a problem that requires immediate attention may exist.
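The shade retention described for gray cells can be sketched as follows. This is a hedged illustration, not the product's code: the function name `displayed_shades`, the shade strings, and the choice to switch to gray exactly when the run of empty cells reaches the Window Length are all assumptions.

```python
def displayed_shades(cells, window_length=20):
    """Return the shade shown for each cell.

    cells: list of shade names, with None where no data arrived.
    Assumption: the last known shade is kept while the run of empty
    cells is shorter than window_length, then the strip turns gray.
    """
    shown = []
    last_shade = None
    empty_run = 0
    for shade in cells:
        if shade is not None:
            last_shade = shade
            empty_run = 0
            shown.append(shade)
            continue
        empty_run += 1
        if last_shade is not None and empty_run < window_length:
            shown.append(last_shade)  # gap is still shorter than the window
        else:
            shown.append("gray")      # no earlier data, or gap reached the window
    return shown


# With a hypothetical window length of 3, the third empty cell turns gray.
print(displayed_shades(["dark", None, None, None], window_length=3))
# prints ['dark', 'dark', 'dark', 'gray']
```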
Differential Analysis also reports peak instability within an interval of lower overall instability (a lighter shade). Over a long time range, instability at shorter intervals can be averaged out. In this case, the strip shows the shade of the overall stability as usual, and when the maximum variance intensity is greater than the average variance intensity for the period, a vertical line appears in the cell. The line takes the shade of the maximum state, so that you do not miss an important period of instability while looking at a longer time range.
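As a rough sketch of the peak indicator, the following compares the maximum and average variance intensity of the intervals that fall into one map cell. The function name `summarize_cell` and the numeric intensity scale are assumptions for illustration; the product's actual scale is not described here.

```python
def summarize_cell(intensities):
    """Summarize the variance intensities aggregated into one map cell.

    Returns None for a cell with no data; otherwise the average, the
    peak, and whether a vertical peak line would be drawn.
    """
    if not intensities:
        return None
    average = sum(intensities) / len(intensities)
    peak = max(intensities)
    return {
        "average": average,               # drives the overall shade of the cell
        "peak": peak,                     # drives the shade of the vertical line
        "show_peak_line": peak > average  # line appears only when the peak stands out
    }


# Hypothetical cell: three stable intervals and one short burst.
print(summarize_cell([0, 0, 0, 8]))
# prints {'average': 2.0, 'peak': 8, 'show_peak_line': True}
```

A cell whose intervals all share the same intensity has a peak equal to its average, so no line is drawn.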
Map Special Cases
Be aware of the following special cases:
  • During the learning period of baseline calculation, the baseline engine does not report variance intensity. This behavior reduces false positives. The map may either not show a strip or show a yellow learning pattern to indicate that the differential control is not ready to report its results.
  • If a metric is idle for an interval (the number of monitored transactions is zero), the map displays a gray, hashed pattern. This pattern indicates the absence of activity during that interval.
  • Complex queries can return partial results. If a query is too complex, Differential Analysis applies an intelligent clamp. A truncated set of results appears with a red message at the top of the map. This message indicates that:
    – The map does not display the full result set.
    – Traversing further down the Investigator tree limits the number of matching metrics and increases the chances of retrieving a full result set.
    APM includes a data point limit clamp that limits all query results beyond a specified number:
    introscope.enterprisemanager.query.datapointlimit
      Clamps the maximum number of data points that the Enterprise Manager retrieves from the SmartStor disk during a batch of metric queries.
    introscope.enterprisemanager.query.returneddatapointlimit
      Clamps the number of data points that are returned from a batch of metric queries.
If you use the data point limit clamps, the Differential Analysis maps show incomplete results whenever those clamps are reached. Rely on the Differential Analysis intelligent clamping, and turn on data point limit clamps only when necessary.
Monitor Performance Using Differential Analysis
Real-time performance data appears in the Differential Analysis map. This map shows a baseline summary of the actual values, the prediction, and the standard deviation. You can also investigate performance for individual components.
The standard deviation measures how widely the data points typically spread around the predicted value.
 
Follow these steps:
 
  1. In APM Team Center, click WebView.
    APM WebView appears.
  2. Click INVESTIGATOR.
    The Metric Browser tree shows a hierarchical view of your system.
  3. In the tree, select the agent for which you want performance information, for example:
    SuperDomain | Host | Process | Agent | Frontends | Apps | Adaptor
    The Variance node and subnodes do not show Differential Analysis.
  4. Click the Differential Analysis tab.
    The map shows a graphical representation of performance data for the last 8 minutes. The top 100 problematic metrics appear in order of decreasing instability. The data refreshes when you query, change the time period, or select a different node.
  5. (Optional) Adjust the time range using the Change Start Time arrow controls.
    The latest data appears.
  6. Click a strip of interest in the map.
    The Differential Analysis chart appears. This chart helps you understand the metric stability over the timeline that the strip represents. The chart shows the status of the monitored component, so you can quickly detect normal and abnormal performance:
    • A line represents an actual metric value.
    • The shaded regions correspond to standard deviation bands 1, 2, and 3. The darker the band, the higher the deviation from the predicted value. A value in the bottom white area is better than predicted, and a value above that area has exceeded the predicted value. For example, a value above the top band exceeds the predicted value by more than 3 standard deviations.
  7. Mouse over the line.
    Tooltips display the metric values.
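The band reading in step 6 can be sketched in a few lines. This is an assumed interpretation rather than the product's code: the function name `deviation_band` is hypothetical, and the sketch assumes that band k spans from (k - 1) to k standard deviations above the predicted value.

```python
import math


def deviation_band(value, predicted, std_dev):
    """Locate a metric value relative to the standard deviation bands.

    Returns 0 when the value is at or below the prediction (white area),
    1 to 3 for the shaded bands, and 4 when the value is above the top band.
    """
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    if value <= predicted:
        return 0
    # Number of standard deviations above the prediction.
    sigmas = (value - predicted) / std_dev
    return min(math.ceil(sigmas), 4)


# Hypothetical Average Response Time: predicted 200 ms, standard deviation 20 ms.
print(deviation_band(190, 200, 20))  # prints 0
print(deviation_band(250, 200, 20))  # prints 3
print(deviation_band(270, 200, 20))  # prints 4
```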
Investigate Differential Analysis Transactions
Use the Transaction Trace Viewer to find transaction traces that Differential Analysis triggered. In the Business Transactions tab, you can view transaction trace details and then decide if you must adjust data point limit clamps to improve performance.
 
Follow these steps:
 
  1. In APM Team Center, click Experience View.
    The aggregated data for groups of related business transactions that are available in your universe appear as experience cards.
  2. Identify a card of interest and click the Notebook icon on the card.
    The Analysis Notebook shows details about the experience.
  3. Notice the AVERAGE RESPONSE TIME sparkline. The sparkline shows the general shape of the variation over time of the BlamePoint metrics for any component. The sparkline shows information for the active time period that is selected in the timeline. Mouse over any point on the sparkline to see a numeric value. This sparkline helps you understand the metric stability over the timeline and quickly detect normal and abnormal performance.
    The Relationship Flow shows the transaction paths of the selected experiences. This map gives context for the event that occurred.
  4. Analyze the sparkline and the map. Identify a component that indicates performance issues. This component might be the source of performance degradation in your application environment.
  5. Click the component of interest in the map. An orange outline indicates a trace with uncontrolled variance.
    The Business Transactions tab appears.
  6. Click the Business Transactions tab.
    A summary list shows traces that correspond to the component for the range that is selected in the timeline. Orange indicates that Differential Analysis triggered an alert -- a transaction has uncontrolled variance.
  7. Select Other from the Trace Type drop-down list to categorize traces by characteristics other than the traces with errors or stalls, such as traces with uncontrolled variance.
  8. Examine individual components and trace data. Look for a problematic trace and determine if you must adjust the data point limit clamps to improve stability.