Threshold Event Processing Self-Monitoring Metrics

To determine whether you are generating too many events, monitor the key performance indicators in Data Aggregator. Eventing in Data Aggregator is performed in batches: events are evaluated and generated simultaneously for large groups of items. Several self-monitoring metrics let you assess the health of the Data Aggregator system.
To view these metrics, add a custom IM Device MultiTrend view to a dashboard. Edit the dashboard, using the following metrics from the Data Aggregator Event Calculation Times metric family:
  • Event Process Queue Size
    shows the size of the event processing queue. An increase in queue size without a subsequent recovery (trending downward) indicates that eventing is backed up.
  • Count of Cleared Events
    indicates the number of cleared events that are in the reporting resolution window.
  • Count of Created Events
    indicates the number of raised events that are in the reporting resolution window.
    A continuously large number of raised or cleared events can affect the Event Manager database. These metrics can indicate when your system has exceeded the recommended event generation rate. Short bursts of event generation or clearing are acceptable.
  • Count of Processed Event Rule Evaluations
    indicates the sum of event rules multiplied by the number of items those rules are applied to. The higher the number of evaluations, the more work your system is doing. Some evaluations are more expensive than others. For example, evaluations with more conditions, more standard deviation conditions, or longer duration and window are more expensive. The total acceptable number of evaluations depends on your event rules.
  • Total Time to Calculate Events
    indicates the total amount of time that was spent processing events for this metric family. If the value of this metric exceeds the number of seconds in the reporting resolution window, the eventing was delayed or backlogged at that point in time.
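The two backlog signals described in this list can be sketched as simple checks. The following is a minimal illustration only, assuming that the metric values have already been exported from the dashboard as one sample per reporting resolution window. The names `Sample`, `is_backlogged`, and `queue_not_recovering` are hypothetical and are not part of any Data Aggregator API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One exported data point per reporting resolution window (hypothetical shape)."""
    queue_size: int            # Event Process Queue Size
    calc_time_seconds: float   # Total Time to Calculate Events


def is_backlogged(sample: Sample, window_seconds: int) -> bool:
    """True when calculation time exceeds the seconds in the resolution window."""
    return sample.calc_time_seconds > window_seconds


def queue_not_recovering(samples: List[Sample], min_run: int = 3) -> bool:
    """True when the queue grew in each of the last `min_run` consecutive intervals.

    `min_run` is an assumed tuning value, not a documented threshold.
    """
    if len(samples) < min_run + 1:
        return False  # not enough history to judge a trend
    tail = [s.queue_size for s in samples[-(min_run + 1):]]
    return all(later > earlier for earlier, later in zip(tail, tail[1:]))
```

For example, with a 300-second resolution window, a sample whose calculation time is 310 seconds would be flagged by `is_backlogged`, while a queue that grows for three intervals and then drops would not be flagged by `queue_not_recovering`.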
In general, steady values for these self-monitoring metrics indicate a healthy system. Some intensive database jobs cause these metrics to fluctuate. Typically, these jobs run between 2 AM and 4 AM UTC. Enable eventing gradually, and assess system health before adding further rules. After each subsequent change, monitor the health of the system for 24 hours.
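One way to judge "steady" over a 24-hour observation period is to measure how much a metric varies once the typical database job window is excluded. The sketch below is an illustration under stated assumptions: the function names, the `(timestamp, value)` input shape, and the 0.25 variation threshold are all hypothetical, and the 2 AM to 4 AM UTC window is taken from the text above as a default that may differ on your system.

```python
from datetime import datetime, timezone
from statistics import mean, pstdev
from typing import List, Tuple


def outside_job_window(ts: datetime, start_hour: int = 2, end_hour: int = 4) -> bool:
    """True when a timezone-aware timestamp falls outside the UTC job window."""
    hour = ts.astimezone(timezone.utc).hour
    return not (start_hour <= hour < end_hour)


def is_steady(points: List[Tuple[datetime, float]], max_cv: float = 0.25) -> bool:
    """True when the coefficient of variation outside the job window is small.

    `max_cv` is an assumed threshold; choose one that suits your baseline.
    """
    values = [value for ts, value in points if outside_job_window(ts)]
    if len(values) < 2:
        return False  # too little data to call the metric steady
    avg = mean(values)
    if avg == 0:
        return pstdev(values) == 0
    return pstdev(values) / avg <= max_cv
```

Passing 24 hours of exported samples for a single metric to `is_steady` after each rule change gives a rough, repeatable comparison against the previous day.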
Errors in the Karaf log on the Data Aggregator system may also indicate that your system is under stress.