Threshold Event Processing Self-Monitoring Metrics
To determine whether you are doing too much eventing, monitor the key performance indicators in Data Aggregator. Eventing in Data Aggregator is performed in batches; that is, events are evaluated and generated simultaneously for large groups of items. Several self-monitoring metrics help you assess the health of Data Aggregator.
To view these metrics, add a custom IM Device MultiTrend view to a dashboard. Edit the dashboard, using the following metrics from the Data Aggregator Event Calculation Times metric family:
- Event Process Queue Size: This metric shows the size of the event processing queue. An increase in queue size without a subsequent recovery (trending downward) indicates that eventing is backed up.
- Count of Cleared Events: This metric indicates the number of cleared events that are in the reporting resolution window.
- Count of Created Events: This metric indicates the number of raised events that are in the reporting resolution window. A continuously large number of events that are raised or cleared can affect the Event Manager database. These metrics can indicate when your system has exceeded the recommended event generation rate. Event generation/clear bursts are acceptable.
- Count of Processed Event Rule Evaluations: This metric indicates the sum of event rules multiplied by the number of items to which those rules are applied. The higher the number of evaluations, the more work your system is doing. Some evaluations are more expensive than others. For example, evaluations with more conditions, more standard deviation conditions, or longer duration and window are more expensive. The total acceptable number of evaluations depends on your event rules.
- Total Time to Calculate Events: This metric indicates the total amount of time that was spent processing events for this metric family. If the value of this metric exceeds the number of seconds in the reporting resolution window, the eventing was delayed or backlogged at that point in time.
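The interpretation rules in the list above can be sketched as a few small checks. The following Python example is illustrative only: it is not part of the product, the function names are hypothetical, and the input values are assumed to be read manually from the dashboard view. The "no recovery" test for the queue is a simple heuristic chosen for this sketch, and the evaluation count reflects one reading of the metric (each rule counted once per item to which it applies).

```python
from typing import Mapping, Sequence


def eventing_delayed(total_time_s: float, window_s: float) -> bool:
    """True when event calculation took longer than the reporting resolution window."""
    return total_time_s > window_s


def queue_backed_up(queue_sizes: Sequence[int], min_samples: int = 3) -> bool:
    """True when the queue grew and has not trended downward since.

    Heuristic (an assumption, not a product rule): the last `min_samples`
    samples never decrease and end higher than they started.
    """
    recent = list(queue_sizes)[-min_samples:]
    if len(recent) < min_samples:
        return False
    never_decreases = all(b >= a for a, b in zip(recent, recent[1:]))
    return never_decreases and recent[-1] > recent[0]


def rule_evaluations(items_per_rule: Mapping[str, int]) -> int:
    """Sum, over all event rules, of the number of items each rule applies to."""
    return sum(items_per_rule.values())


# Hypothetical sample values, for illustration only.
print(eventing_delayed(total_time_s=340, window_s=300))        # True
print(queue_backed_up([10, 12, 40, 85, 130]))                  # True
print(queue_backed_up([10, 90, 40, 12, 5]))                    # False
print(rule_evaluations({"cpu_high": 2000, "if_util": 15000}))  # 17000
```

In this sketch, a 300-second reporting window with 340 seconds of calculation time is flagged as delayed, and a queue that spikes but then drains is not flagged as backed up.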
In general, steady values for these self-monitoring metrics indicate a healthy system. Some intensive database jobs cause these metrics to fluctuate. Typically, these jobs run between 2 AM and 4 AM UTC. Enable eventing gradually, and assess the system health before you add further rules. Monitor the health of the system for 24 hours after each subsequent change.
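One way to judge steadiness over a 24-hour period is to ignore samples from the typical database job window and measure how much the remaining values vary. The following Python sketch assumes that you have exported timezone-aware (timestamp, value) samples for a metric. The function names and the 0.25 variation threshold are hypothetical and are not documented product limits.

```python
from datetime import datetime, timezone
from statistics import mean, pstdev
from typing import Iterable, Tuple


def outside_job_window(ts: datetime, start_hour: int = 2, end_hour: int = 4) -> bool:
    """True when a timezone-aware sample falls outside 2 AM to 4 AM UTC."""
    hour = ts.astimezone(timezone.utc).hour
    return not (start_hour <= hour < end_hour)


def is_steady(samples: Iterable[Tuple[datetime, float]], max_cv: float = 0.25) -> bool:
    """True when the kept samples vary by at most max_cv (coefficient of variation).

    max_cv is an illustrative threshold, not a documented limit.
    """
    values = [v for ts, v in samples if outside_job_window(ts)]
    if len(values) < 2:
        return True  # too little data to call the metric unsteady
    avg = mean(values)
    if avg == 0:
        return all(v == 0 for v in values)
    return pstdev(values) / avg <= max_cv


def at(hour: int) -> datetime:
    """Helper for the example: a UTC timestamp on an arbitrary day."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


# A spike at 3 AM UTC is ignored; the same spike at 6 AM UTC is not.
print(is_steady([(at(0), 100), (at(1), 102), (at(3), 900), (at(5), 98)]))  # True
print(is_steady([(at(0), 100), (at(1), 102), (at(6), 900), (at(5), 98)]))  # False
```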
Errors in the Karaf log on the data aggregator can also indicate that your system is under stress.
For more information about threshold best practices, see Configure Threshold Profiles.