Alarm Analytics

 
Digital Operational Intelligence is a machine learning–driven, advanced analytics solution designed to help IT operations teams deliver a phenomenal user experience, improve service quality, and drive operational efficiencies. This video shows how the alarm analytics feature of Digital Operational Intelligence reduces false and redundant alerts and, in turn, reduces the mean time to resolution. It also highlights the available filters and timeline, which help to isolate the root cause and analyze defect trends to manage anomalies before problems occur.
 

 
You can configure the following Normalized alarm or Anomaly alarm containers:
 
 
By using Alarm Analytics in CA Digital Operational Intelligence, you gain the following benefits:
  • Reduce alarm noise from multiple products
  • Correlate alarms across products to identify the root cause
  • View probability bands to determine buildup to an alarm
  • Fine-tune alarm thresholds by analyzing historical patterns
 
 
Access Alarm Analytics
Alarm Analytics is a capability that provides an overview of, and insights into, service and derived alarms. You can view the following information in the Alarm Analytics page:
  • Service, raw, and anomaly alarms
  • Alarm situations
  • Alarms by device type and severity
  • Variance in alarms for a period across devices, groups, and services
  • Top five devices, groups, and services generating the most alarms
 
Follow these steps:
 
  1. Log in to CA Digital Operational Intelligence. 
  2. After you log in, you can access Alarm Analytics in the following ways:
    • View Alarms in Context of a Service
      1. From the Service Analytics Overview, select a service.
        The Service Summary page appears.
      2. Navigate to the Alarms Overview section, where you can view the service, anomaly, and raw alarms for the selected service.
      3. Click an alarm type.
        You can now view the alarms of that type for the selected service.
    • View Service Alarms
      1. Click the Alarms icon in the Navigation Panel.
        The Alarms page appears. By default, the Alarms page lists service alarms that were generated in the last week.
Alarm Categories
Alarms can be classified into the following categories:
  • Anomaly Alarms: An anomaly alarm is generated when the Data Science engine detects a deviation in a metric value using machine learning algorithms.
    Anomaly alarms can be generated for the following products:
    • CA Unified Infrastructure Management (CA UIM)
    • CA Performance Management (CA PM)
    • CA Network Flow Analytics (CA NFA)
    • CA Application Delivery Analysis (CA ADA)
    • CA Application Performance Management (CA APM)
  • Service Alarms: A service alarm is a group of alarms that affect one or more business services and are related to an incident, which is identified by the time it occurred and its root cause. The root cause is the alarm on the topologically deepest device in the affected business service. All situations reported by alarms in the group are due to the identified root cause.
  • Raw Alarms: Alarms that are generated from source products such as CA Unified Infrastructure Management, CA Spectrum, CA ADA, and CA APM, or from any custom data source.
  • Situations Alarms: Alarms that are grouped based on context using machine learning algorithms. Clustering groups alarms together based on distinct dimensions for triage or further analysis. Clustering thus lets users filter through a large number of alarms and analyze those that are contextually relevant.
  • Prediction Alarms: Prediction alarms use machine learning to discover patterns and trends. Based on these trends, the application predicts events that are likely to happen in the future.
Overview of the Alarms Page
The Alarms page displays the service, anomaly, situation, and raw alarms that are generated in the defined period. By default, the alarms that were generated in the last week are displayed.
  Overview of Alarms Page.png  
Filter
You can filter alarms by using the following options, which are available on the Alarms page:
  • Global Search Filter
  • Time Filter
  • Alarm View Filter
  • Filter by Alarm Attribute
Global Search Filter
The Global Search Filter lets you search for alarms in the Alarms table. Enter your search text to view the alarms that match it.
  • For Service alarms, the search takes effect in these columns:
    • Alarm Type
    • Alarm Message
    • Service
  • For All alarms, the search takes effect in these columns:
    • Severity
    • Alarm Type
    • Alarm Message
    • Device
    • Service
    • Group
    • Ticket ID
    • Source Product
The search is case-sensitive for a few columns.
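As an illustration only, the column scoping described above can be sketched in Python. The column names come from the lists above; the function name, the dictionary-based alarm records, and the sample data are hypothetical. Matching is case-insensitive here for simplicity, whereas the product is case-sensitive for some columns that this page does not name.

```python
# Column sets taken from the lists above; everything else is hypothetical.
SEARCH_COLUMNS = {
    "service": ["Alarm Type", "Alarm Message", "Service"],
    "all": ["Severity", "Alarm Type", "Alarm Message", "Device",
            "Service", "Group", "Ticket ID", "Source Product"],
}


def search_alarms(alarms, text, view="all"):
    """Return the alarms whose searchable columns contain the search text.

    Assumption: case-insensitive substring matching on every column.
    """
    needle = text.lower()
    columns = SEARCH_COLUMNS[view]
    return [
        alarm for alarm in alarms
        if any(needle in str(alarm.get(col, "")).lower() for col in columns)
    ]


# Hypothetical sample alarms.
alarms = [
    {"Severity": "Critical", "Alarm Message": "CPU usage high", "Device": "web-01"},
    {"Severity": "Minor", "Alarm Message": "Disk latency", "Device": "db-02"},
]
matches = search_alarms(alarms, "cpu")
```

In this sketch, searching for a device name in the "service" view returns nothing, because Device is not among the columns searched for Service alarms.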
Time Filter
The Calendar icon lets you select the period for which you want to view the alarms that have been triggered. By default, service alarms that were generated in the last week are displayed.
Alarm View Filter 
The following video demonstrates how you can view insights and take necessary actions both manually and automatically:
 

 
Use the Alarms View Filter to filter the alarms that are generated in the selected period by one of the following categories:
  • Situations
  • Service Alarms
  • All Alarms
 
Refresh Alarm Interval
 
To refresh the Alarms table automatically, enable the Auto-update view switch. Use this switch to change the view between automatic and manual updates.
  • If you want to update the alarm table manually:
    1. Disable the Auto-update view switch.
    2. A pop-up (Alarm view is out-of-date) appears with the message "Updates are pending for the current alarm view. Refreshing will update the view, but focus may be lost. Applied filters and sorting will remain in effect after refresh."
      alarm view is out of date.png  
    3. Click the Refresh View button to refresh the alarm table, or select close (X) to leave the view as it is. On refresh, the alarm table refreshes the data for the time interval that is set in the ALARM_REFRESH_INTERVAL environment variable and displays the data for the last 7 days.
  • If you want to enable automatic updates, enable the Auto-update view switch and configure the refresh interval using the ALARM_REFRESH_INTERVAL environment variable in the adminui container. By default, the interval is set to 5 minutes. You can configure how often the table data is refreshed by entering the interval in minutes. If the application is idle for longer than the configured interval, alarm refresh is disabled to maintain the automatic session timeout functionality. In such situations, you must manually perform some action on the Alarm page to re-enable the alarm refresh.
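The interval handling described above can be sketched as follows. This is a minimal Python illustration, not the adminui container's actual code: only the variable name ALARM_REFRESH_INTERVAL, its unit (minutes), and the 5-minute default come from this page. The function names and the fallback to the default for missing or invalid values are assumptions.

```python
import os

DEFAULT_INTERVAL_MINUTES = 5  # default stated on this page


def refresh_interval_minutes(env=None):
    """Read ALARM_REFRESH_INTERVAL (in minutes).

    Assumption: missing, non-numeric, or non-positive values fall back
    to the default; the product's real behavior is not documented here.
    """
    env = os.environ if env is None else env
    raw = env.get("ALARM_REFRESH_INTERVAL", "")
    try:
        minutes = int(raw)
    except ValueError:
        return DEFAULT_INTERVAL_MINUTES
    return minutes if minutes > 0 else DEFAULT_INTERVAL_MINUTES


def auto_refresh_active(idle_minutes, interval_minutes):
    """Auto refresh stops once the application is idle longer than the interval."""
    return idle_minutes <= interval_minutes
```

For example, with the default interval, a session that has been idle for 6 minutes would no longer refresh automatically until the user acts on the page.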
Situations Alarms
These alarms are grouped based on context using machine learning algorithms. Clustering groups alarms together based on distinct dimensions for triage or further analysis. Clustering thus lets users filter through a large number of alarms and analyze those that are contextually relevant.
 
View Situations Alarms
 
From the Alarms View filter, click Situations to view the situations alarms that are created in the selected period.
Click the alarm group to view the individual alarms that are grouped.
  View Situation Alarms.png  
Click an alarm in the cluster and the row expands to display more information about the alarm.
When you drill down on a situation alarm, you can view the unique alerts that are clustered together. These clusters of unique alerts are known as nested clusters. The learning algorithms create sub-clusters that are more contextually relevant and correlated.
If a sub-cluster contains only one alarm, the situation alarm directly displays the raw alarm details.
For more information about the columns in the Alarms table for Situations alarms, see the Alarms Table section.
Service Alarms
A service alarm is a group of alarms that affect one or more business services and are related to an incident, which is identified by the time it occurred and its root cause. The root cause is the alarm on the topologically deepest device in the affected business service. All situations reported by alarms in the group are due to the identified root cause.
For more information about the columns in the Alarms table for Service alarms, see the Alarms Table section.
 
View Service Alarms
 
From the Alarms View filter, click Service Alarms to view the service alarms that are created in the selected period.
 
Click a service alarm to view a list of the corresponding suppressed alarms from devices or Configuration Items. The alarm on the deepest Configuration Item or device (as determined by Service Analytics) is the root cause alarm and is indicated by the root cause icon.
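The root cause rule above (the alarm on the deepest device is the root cause) can be illustrated with a short Python sketch. How Service Analytics actually determines depth is not described on this page, so the depth map, device names, and function name below are hypothetical.

```python
def root_cause_alarm(alarms, depth_by_device):
    """Pick the alarm whose device sits deepest in the service topology."""
    return max(alarms, key=lambda alarm: depth_by_device[alarm["device"]])


# Hypothetical topology depths: a larger number means deeper in the service.
depth_by_device = {"load-balancer": 1, "app-server": 2, "database": 3}

# Hypothetical alarms that belong to one service alarm.
alarms = [
    {"id": "A1", "device": "load-balancer"},
    {"id": "A2", "device": "database"},
    {"id": "A3", "device": "app-server"},
]

root = root_cause_alarm(alarms, depth_by_device)
# The remaining alarms are the ones listed under the service alarm.
suppressed = [alarm for alarm in alarms if alarm is not root]
```

Here the alarm on the database is selected as the root cause, and the other two alarms are treated as suppressed alarms.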
All Alarms
These alarms are generated from source products such as CA Unified Infrastructure Management, CA Spectrum, CA ADA, and CA APM or any custom data source.
 
View Alarm Insights
 
In Alarm Insights, you can view detailed analytics about all alarms for a given time frame.
For more information about the columns in the Alarms table for all alarms, see the Alarms Table section.
  View Alarm Insights.png  
You can view the following details:
  • Distribution: Displays a pie chart with distribution by device type and severity.
    distribution.PNG
  • Variance: Displays the variance (change in percentage over a period) in alarms. You can filter Variance by the following options:
    • By Devices
    • By Groups
    • By Services
    Variance.PNG  
  • Top Alarming: Displays the top five devices, groups, or services that generate the most alarms. The colored horizontal bar displays the count of alarms, while the underlying shaded horizontal bar shows the historical average for the device, group, or service. You can filter Top Alarming by the following options:
    • Devices
    • Groups
    • Services
    Top alarming.PNG  
Note: Click the bar graph or pie chart in the Insights section to filter alarms in the context of device types, severity, or device names. The selected filter is applied to the Alarms table. To remove a filter that is applied, click the clear button next to that filter. To remove all filters, click Clear All.
  graph filter.PNG  
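The Variance and Top Alarming insights described above can be approximated with the following Python sketch. It assumes that variance means the percentage change in alarm count between two periods and that alarms are plain dictionaries; the function names and field names are hypothetical, and the product's exact calculation is not documented on this page.

```python
from collections import Counter


def variance_percent(previous_count, current_count):
    """Percentage change in alarm count between two periods."""
    if previous_count == 0:
        return None  # undefined without a baseline (assumption)
    return (current_count - previous_count) / previous_count * 100


def top_alarming(alarms, key, n=5):
    """Return the n values of `key` that generated the most alarms."""
    counts = Counter(alarm[key] for alarm in alarms if alarm.get(key))
    return counts.most_common(n)


# Hypothetical sample alarms.
sample = [{"device": "web-01"}, {"device": "db-02"}, {"device": "web-01"}]
top_devices = top_alarming(sample, "device")
```

The same `top_alarming` call works for groups or services by passing a different key, which mirrors the Devices, Groups, and Services options.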
Filter by Alarm Attributes
You can also filter alarms by attributes using the Alarm Attributes Filter. This filter lets you view only those alarms whose attributes match your search criteria.
Create Alert Filters
 
Follow these steps:
 
  1. Click the plus icon in the Alarm Attributes Filter.
  2. Select a filter attribute with one or more operators.
  3. Click Add to add attributes to the alarm filter. For example, if you want to see all critical alarms, select Severity as the attribute and Critical as its value.
    The Alarms table and Insights show only the alarms that match your search criteria for the selected attributes.
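Conceptually, an attribute filter is a list of conditions that each alarm must satisfy. The Python sketch below illustrates the Severity example from the steps above. The operator names, the rule that all conditions must match, and the sample records are assumptions; this page does not list the product's operators.

```python
import operator

# Hypothetical operator names; the product's operator list is not given here.
OPERATORS = {
    "equals": operator.eq,
    "not equals": operator.ne,
    "contains": lambda value, expected: expected in value,
}


def matches(alarm, conditions):
    """True when the alarm satisfies every (attribute, operator, value) condition."""
    return all(
        attr in alarm and OPERATORS[op](alarm[attr], value)
        for attr, op, value in conditions
    )


def apply_filter(alarms, conditions):
    """Keep only the alarms that match all conditions (assumed AND semantics)."""
    return [alarm for alarm in alarms if matches(alarm, conditions)]


# Hypothetical sample alarms.
alarms = [
    {"Severity": "Critical", "Source Product": "CA UIM"},
    {"Severity": "Minor", "Source Product": "CA Spectrum"},
]
critical_only = apply_filter(alarms, [("Severity", "equals", "Critical")])
```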
Save Alert Filters
You can also perform the following actions:
  1. To save the current alarm filter, click the Save current filter button.
    The Save Filter window appears.
  2. Enter the name for the alarm filter in Alarm filter name.
  3. If you want to set the alarm filter as the default, select the Set as default option. The next time you access the page, the Set as default option is enabled, the default alarm filter is applied, and the Alarms table and Insights show only alarms with the saved attributes. By default, the Set as default option is disabled.
    Note: Default alarm filters that are created in the context of a service take precedence over the default alarm filters in the Alarm Analytics page.
Edit Alert Filters
  • To update an existing alarm filter, add extra attributes to it and click the Update button.
  • To save the alarm filter with a different name, specify a different name and click Save as.
View Alert Filters
To view all saved alarm filters, click the Saved filters button.
Alarms Table
The Alarms page displays a table of Service, Situation, and All alarms with other details according to your selection criteria.
The following table describes the various columns in the Alarms table:
 
Column Name | Description
Column / Row-level Alarm Action | Lets you perform an alarm action on a single row, or a bulk alarm action at the column level.
Severity | Indicates the severity of an alarm. Colors indicate the severity: Red (Critical), Orange (Major), Yellow (Minor), Light Blue (Informational), Teal (Warning), Green (any other alarm that does not fall in these categories).
Alarm Type | Displays the alarm type.
Alarm Message | Displays the description for an alarm.
Device | Displays the device or host name for which the alarm is generated. If the host name is not available, the IP address is shown.
Service | Displays the service that is impacted by an alarm.
Group | Displays information about the group or groups to which the device belongs.
Last Update | Displays the time and date when the alarm was last updated.
Ack'd/assigned | Displays the name of the person to whom the alarm is assigned and whether that person has acknowledged it. Acknowledged alarms are indicated with a tick mark in this column.
Ticket ID | Displays the ID generated by the ticketing system.
Source Product | Displays the product from which the alarm is generated.

For a service alarm, the Device and Group columns remain empty.
Click a column header to sort the Alarms table in ascending or descending order. However, you cannot sort the following columns:
  • Alarm Type
  • Service
  • Ticket ID
  • Group
Click any row in the Alarms table. For service alarms, click a service alarm to view a list of causing alarms. Click a causing alarm and the row expands to display these tabs as shown in the following image:
  • Overview
    The Overview tab provides additional information about the selected alarm. Properties are specific to the product or source from which the alarm originates.
    This tab also lets you view the Log Analytics dashboard of the logs that are associated with the alarm. Click the link to launch the Log Analytics (Kibana) dashboard.
    Alarm Table - Overview.PNG  
  • Affected Metric
    The Affected Metric tab shows the metric chart of the underlying metric. If the required fields are not available in the alarm, this tab is not shown.
    Probability bands are shown if the metric is configured with the proprietary Data Science Engine from CA. If the metric is not configured with the Data Science Engine, the actual metric chart with original metric values appears. If the metric chart is not available, verify whether the particular metric is ingested.
    The metric chart displays anomaly alarms when a threshold is crossed. The threshold is determined based on historical trends. The alarm severity is indicated as minor, major, or severe. The Affected Metric tab has a Correlated Metrics link that launches Performance Analytics from the context of an alarm and lets you compare a single metric from different devices, or multiple metrics from one or more devices. For more information about the metric charts, see Metric Charts Views.
    By default, the chart time range is from 8 hours before the last alarm update to one hour after the last alarm update.
      affected metrics.PNG  
  • Impacted Services
    This tab provides details of the services that are impacted by the selected alarm. Clicking a service redirects you to the Service Analytics details page of that service. The table displays the impacted service metrics (such as the users of the service, actual service availability, and risk).
      Impacted service.PNG  
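The default chart time range on the Affected Metric tab (8 hours before to one hour after the last alarm update) amounts to a simple date calculation. The following Python sketch shows it; the function name and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def default_chart_range(last_update):
    """Return (start, end): 8 hours before to 1 hour after the last alarm update."""
    return last_update - timedelta(hours=8), last_update + timedelta(hours=1)


# Hypothetical last-update timestamp for an alarm.
start, end = default_chart_range(datetime(2024, 1, 15, 12, 0))
```

For an alarm last updated at 12:00, the chart in this sketch would span 04:00 to 13:00 on the same day.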
Alarm Actions
Alarm actions let you perform a specific action on an alarm. On the Alarm Analytics page, these actions are categorized as follows:
  • Alarm Management
  • Ticket Management
  • Email Notification
 
Prerequisite:
 
The following table describes the supported alarm actions for different alarm types and source products:
Source Product | Alarm Actions
CA Spectrum | acknowledge, unacknowledge, ticket, assignment, unassignment, clear
CA UIM | ticket, assignment, unassignment, clear
CA APM | acknowledge, unacknowledge, ticket, assignment, unassignment
CA ADA | acknowledge, unacknowledge, ticket, assignment, unassignment

Alarm Type | Alarm Actions
anomaly alarm | acknowledge, unacknowledge, ticket, assignment, unassignment
prediction alarm | acknowledge, unacknowledge, ticket, assignment, unassignment
custom alarm | acknowledge, unacknowledge, ticket, assignment, unassignment
  • Action updates are sent to the source products only if the Southbound Gateway to Spectrum and UIM is configured. Otherwise, alarm actions remain within CA Digital Operational Intelligence.
  • For anomaly, prediction, and custom alarms, the alarm action updates remain within Digital Operational Intelligence.
  • For service alarms, the alarm actions are performed based on the root cause alarm source.
  • No alarm actions are supported for the Situation alarm.
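The supported-actions information above can be expressed as a lookup table, which is handy when scripting checks around alarm handling. The Python sketch below encodes the values from this page. The function name and the rule that the alarm type is checked before the source product are assumptions.

```python
COMMON_ACTIONS = {"acknowledge", "unacknowledge", "ticket", "assignment", "unassignment"}

# Values copied from the supported alarm actions listed above.
ACTIONS_BY_SOURCE = {
    "CA Spectrum": COMMON_ACTIONS | {"clear"},
    "CA UIM": {"ticket", "assignment", "unassignment", "clear"},
    "CA APM": COMMON_ACTIONS,
    "CA ADA": COMMON_ACTIONS,
}
ACTIONS_BY_TYPE = {
    "anomaly alarm": COMMON_ACTIONS,
    "prediction alarm": COMMON_ACTIONS,
    "custom alarm": COMMON_ACTIONS,
    "situation alarm": set(),  # no alarm actions are supported
}


def is_supported(action, source_product=None, alarm_type=None):
    """Check whether an action is supported.

    Assumption: a listed alarm type takes precedence over the source product.
    """
    if alarm_type in ACTIONS_BY_TYPE:
        return action in ACTIONS_BY_TYPE[alarm_type]
    return action in ACTIONS_BY_SOURCE.get(source_product, set())
```

For example, `is_supported("acknowledge", source_product="CA UIM")` returns False in this sketch, because acknowledge is not listed for CA UIM.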
Alarm Management
You can use the bell icon to manage alarms, acknowledge assigned alarms, and clear assigned alarms.
 
Follow these steps:
 
  1. In the Alarm Analytics page, select alarms from the Alarms table and click the bell icon. Alternatively, click the link in the Ack'd/assigned column for an alarm.
    The Alarm Management dialog appears. In this dialog, you can perform the following actions:
    • Click the Assign to option and select the user to whom the alarm is to be assigned.
    • Click the Acknowledge option to acknowledge the selected alarm.
    • Click the Clear option to clear alarms.
    • Click the Un-Acknowledge option to remove the acknowledgment for an alarm.
    • Click the Un-assign option to remove the assignment for an alarm.
  Alarm Management on prem.png  
Ticket Management
You can manage tickets in ServiceNow directly from the Alarm Analytics page. You must configure the ServiceNow notification channel to manage ticket updates.
  Ticket Management On prem.png  
 
Follow these steps:
  1. Select alarms from the table and click the ticket management icon.
  2. Select Open ticket to open a ServiceNow ticket corresponding to the alarm.
  3. Alternatively, click the Open Ticket link in the Ticket ID column for an alarm.
    Note: The Open Ticket link is visible only when ServiceNow notifications are configured (see Configure ServiceNow Notifications).
    You can return to the Digital Operational Intelligence user interface by using the link provided in the ServiceNow ticket.
  4. A ticket is created for the selected alarms in ServiceNow.
Email Notification
You can notify users about an alarm directly from the Alarm Analytics page. You must configure the SMTP server to send emails to the recipient.
Click the email icon. Select one or more distribution lists to notify them about the alarm through email.
 If you do not configure the SMTP server, a success message appears but the email is not sent to the recipient.
Email Notification on prem.png