Alarm Policy Troubleshooting

This article includes troubleshooting topics related to alarm policies.
Collected Metrics Not Showing During Alarm Policy Creation
This issue indicates that the metric type definitions are not present in the database.
To troubleshoot this issue:
  1. Query the
    s_qos_data
    table with the probe name to verify that the metrics are populated in the table:
    Select * from s_qos_data where probe='<probename>'
  2. Query the
    cm_configuration_item_metric_definition
    table to ensure that the given metric type exists in it:
    Select * from cm_configuration_item_metric_definition where met_type='<met_type>'
  3. If the query in the previous step returns no rows, obtain the required definition pack and update ci_definition_pack in the environment:
    Select * from cm_configuration_item_metric where ci_metric_id in (select ci_metric_id from s_qos_data where probe='<probename>')
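The three checks above can be run in sequence. The following is a minimal Python sketch, not a product utility. It assumes a DB-API connection whose driver accepts `?` placeholders (sqlite3 does; other drivers may expect `%s`), and it uses only the table and column names that appear in the queries above. The names `make_demo_db` and `diagnose_missing_metrics`, the demo rows, and the returned messages are hypothetical. Treating an empty result from the third query as a problem is also an assumption.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in with only the columns the queries use."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE s_qos_data (probe TEXT, ci_metric_id TEXT);
        CREATE TABLE cm_configuration_item_metric_definition (met_type TEXT);
        CREATE TABLE cm_configuration_item_metric (ci_metric_id TEXT);
        INSERT INTO s_qos_data VALUES ('demo_probe', 'M1');
        INSERT INTO cm_configuration_item_metric_definition VALUES ('demo_type');
        INSERT INTO cm_configuration_item_metric VALUES ('M1');
    """)
    return conn


def diagnose_missing_metrics(conn, probe, met_type):
    """Run the three checks in order and return a short diagnosis string."""
    cur = conn.cursor()

    # Step 1: are any QoS rows populated for this probe?
    cur.execute("SELECT COUNT(*) FROM s_qos_data WHERE probe = ?", (probe,))
    if cur.fetchone()[0] == 0:
        return "no rows in s_qos_data for this probe"

    # Step 2: does the metric type definition exist?
    cur.execute(
        "SELECT COUNT(*) FROM cm_configuration_item_metric_definition "
        "WHERE met_type = ?",
        (met_type,),
    )
    if cur.fetchone()[0] == 0:
        return "metric type missing: update ci_definition_pack"

    # Step 3 query: do the probe's QoS rows map to configuration item metrics?
    cur.execute(
        "SELECT COUNT(*) FROM cm_configuration_item_metric "
        "WHERE ci_metric_id IN "
        "(SELECT ci_metric_id FROM s_qos_data WHERE probe = ?)",
        (probe,),
    )
    if cur.fetchone()[0] == 0:
        return "no matching rows in cm_configuration_item_metric"
    return "metric definitions present"
```

Against a real database, pass the connection object from your own driver instead of `make_demo_db()` and adjust the placeholder style if needed.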
Creation/Update of Alarm Policies Failing
This issue indicates that the associated alarm policy management webapp has stopped responding or is encountering an error while communicating with the database.
To troubleshoot the issue:
  • Check the
    policy_management.log
    file in the
    <Drive>\Nimsoft\probes\service\wasp
    folder for errors. If the errors are caused by database connection issues, restart wasp.
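Scanning the log by hand can be slow on a busy system. The sketch below filters log lines for likely errors and flags the ones that look database-related. The marker strings (`ERROR`, `FATAL`, `Exception`, and the database keywords) are assumptions about the log format, not documented values, and `summarize_log` is a hypothetical helper name.

```python
import re

# Assumed markers; the actual format of policy_management.log may differ.
ERROR_PATTERN = re.compile(r"ERROR|FATAL|Exception")
DB_PATTERN = re.compile(r"SQL|JDBC|database|connection", re.IGNORECASE)


def summarize_log(lines):
    """Return (error_lines, db_related_lines) found in the given log lines."""
    errors = [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]
    db_related = [line for line in errors if DB_PATTERN.search(line)]
    return errors, db_related
```

To use it on the real file, open policy_management.log with `open(path, encoding="utf-8", errors="replace")` and pass the file object as `lines`. A non-empty second list suggests that restarting wasp is worth trying.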
Alarm Policy Not Deploying on Devices
This issue indicates that the robot version is not supported or the
policy_mode_enabled
parameter is set to
false
.
To troubleshoot this issue:
  • Verify that the robot on which you are creating the alarm policy is 7.96 or later.
  • Verify that the
    policy_mode_enabled
    parameter is set to true in the MCS configuration file (
    mon_config_service.cfg
    ) available in the
    <Drive>\Nimsoft\probes\service\mon_config_service
    folder.
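Both checks above are easy to script. The following sketch assumes that the robot version is a plain numeric `major.minor` string and that mon_config_service.cfg stores the setting as a `policy_mode_enabled = <value>` line. Both are assumptions about the formats, and the function names are hypothetical.

```python
import re


def robot_supported(version, minimum=(7, 96)):
    """Return True if a numeric 'major.minor' version is at least the minimum."""
    parts = tuple(int(p) for p in version.strip().split(".")[:2])
    return parts >= minimum


def policy_mode_enabled(cfg_text):
    """Return True if the text contains a 'policy_mode_enabled = true' line."""
    match = re.search(
        r"^\s*policy_mode_enabled\s*=\s*(\S+)\s*$",
        cfg_text,
        flags=re.MULTILINE | re.IGNORECASE,
    )
    # A missing key is treated the same as a value other than "true".
    return bool(match) and match.group(1).lower() == "true"
```

Versions with non-numeric suffixes would raise a `ValueError` here, so strip or handle such suffixes before calling `robot_supported`.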
Policy Details Present in plugin_metric.cfg But No Alarms Are Generating
This issue indicates that the collection interval has not yet elapsed or that dynamic alarms are configured with the baseline enabled.
To troubleshoot this issue:
  • View the dashboard to verify that a metric has been collected since the alarm policy was defined and that it has breached the threshold.
  • If dynamic alarms are configured, baseline calculations run every hour, so these alarms can take time to be generated, depending on when the policy was created. For more information, see the KB Article.
  • Review the spooler logs at log level 4 or 5 on the robot computer where you created the alarm policy:
    • Increase the log level of spooler to 4 or 5.
    • Restart spooler.
    • Review spooler.log for the threshold calculation and the reasons why alarms are not being generated.
Understanding Messages in the ssrv2audittrail Table
Review the
objectvalue
column in the table to understand and troubleshoot the messages:
  • Nametoip
    This indicates that the robot is not reachable. Check the controller.log file for issues with the robot.
  • Session error
    This indicates issues with a client session on the robot computer.
  • Error deploying profile
    This indicates that the alarm policy deployment has failed. Verify the robot availability for the alarm policy.
  • Entity not found
    This indicates that the required device or group is missing.
  • runtime_error or unknown_error
    This indicates that the alarm policy creation has failed because of an unknown exception. Review the MCS logs for more details.
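When many audit rows need review, the message list above can be turned into a lookup. This sketch matches each known message as a case-insensitive substring of the objectvalue text. The substring matching, the `AUDIT_HINTS` table, and the `explain_objectvalue` name are assumptions made for illustration, and the hint text paraphrases the list above.

```python
# Guidance paraphrased from the message list, keyed by lowercase fragment.
AUDIT_HINTS = {
    "nametoip": "Robot not reachable; check controller.log.",
    "session error": "Problem with a client session on the robot computer.",
    "error deploying profile": "Deployment failed; verify robot availability.",
    "entity not found": "Required device or group is missing.",
    "runtime_error": "Unknown exception; review the MCS logs.",
    "unknown_error": "Unknown exception; review the MCS logs.",
}


def explain_objectvalue(objectvalue):
    """Return the hint for the first known fragment found, or None."""
    text = objectvalue.lower()
    for fragment, hint in AUDIT_HINTS.items():
        if fragment in text:
            return hint
    return None
```

A `None` result means the message is not one of those described in this section.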
Device Added to Dynamic Group But Group Alarm Policy Not Getting Enforced
This issue can happen when the link between a device and a group is not set up correctly or when the MCS fails to process the device.
To troubleshoot this issue:
  1. Collect the group ID from the
    cm_group
    table and
    ssrv2devicegroup
    table.
  2. Ensure that the device is a member of the group (
    cm_group_member
    table).
  3. Query the
    ssrv2policytargetstatus
    table with the policy ID, the group ID, and the device ID from ssrv2devicegroup.
  4. If no entry exists, the MCS has failed to process this device as part of the group.
  5. Verify the status in the
    ssrv2devicegroup
    table for the device ID. If it is
    modified
    , you can wait for it to be picked up. If it is
    OK
    , something has gone wrong. Collect mon_config_service.log from the mcs probe folder and, as a workaround, update the device state to modified.
  6. If an entry exists and is in the new state, the device is yet to be processed.
  7. If an entry exists and is in the error state, review the
    ssrv2audittrail
    table with the policy ID for error details.
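Steps 4 through 7 form a small decision table, sketched below in Python. The state strings compared here (`modified`, `ok`, `new`, `error`) follow the wording of the steps, but their exact stored spelling and case are assumptions, which is why the comparison is case-insensitive. `next_action` is a hypothetical name.

```python
def next_action(entry_state, device_state):
    """Suggest the next step.

    entry_state: state of the ssrv2policytargetstatus row, or None if no row.
    device_state: status of the device in ssrv2devicegroup (may be None).
    """
    if entry_state is None:
        # No entry: the MCS has not processed this device as part of the group.
        status = (device_state or "").lower()
        if status == "modified":
            return "wait for MCS to pick up the device"
        if status == "ok":
            return "collect mon_config_service.log; set device state to modified"
        return "unrecognized device state"
    if entry_state.lower() == "new":
        return "device is yet to be processed"
    if entry_state.lower() == "error":
        return "review ssrv2audittrail with the policy ID"
    return "no action suggested by these steps"
```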
No Alarm Generation and No Alarm Policy Information in plugin_metric.cfg
This issue can have several causes.
Reason: The template is not an enhanced template, or the production flag of the template is not set.
To troubleshoot:
  1. Query the
    ssrv2template
    table:
    select * from ssrv2template where probe='<probename>' and type='policy'
  2. Ensure that templates of the type policy exist and that their production flag is set to 1.
Reason: The template is correct but is not set to work for the required OS and nimbus types.
To troubleshoot:
  1. Query the
    ssrv2packagetemplate
    table (columns template, nimbus_type, and os_type) by template ID:
    select * from ssrv2packagetemplate where template in (select templateId from ssrv2template where probe='<probename>')
  2. Verify that the nimbus type and OS type have the following values:
    • nimbus_type: 0 (This indicates that the alarm policy should work for both VM and robot.)
    • os_type: null (This indicates support for all the operating systems.)
Reason: If the template exists with all the correct settings, query
ssrv2policytargetstatus
with the policy ID and either the group ID or the device ID to check the retry count and status.
To troubleshoot:
  1. If no entry exists in the table, check policy_management.log at
    <Drive>\Nimsoft\probes\service\wasp
    . Search for the policy ID to find any related errors.
  2. If an entry exists with the state NEW and a retry count of 0, review the MCS logs at
    <Drive>\Nimsoft\probes\service\mon_config_service\mon_config_service.log
    .
  3. If the MCS logs show database connectivity or lock issues, restart the mcs probe.
  4. If an entry exists in the NEW state and the retry count is greater than 0, check the
    ssrv2audittrail
    table with (objecttype: POLICY, objectId: <policy_id>). The
    objectvalue
    column should contain the reason for the failure.
    The queries against ssrv2policytargetstatus, by group ID or by device ID, are:
    select * from ssrv2policytargetstatus where policy_id=<policyID> and group_id=<groupID>
    select * from ssrv2policytargetstatus where policy_id=<policyID> and cs_id=<csID>
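The two ssrv2policytargetstatus lookups differ only in the second filter column, so a small helper can build either one as a parameterized query. This is a sketch under assumptions: `target_status_query` is a hypothetical name, and the `?` placeholder style must be changed to whatever your database driver expects.

```python
def target_status_query(policy_id, group_id=None, cs_id=None):
    """Build the lookup by group ID or by device (cs) ID, never both."""
    if (group_id is None) == (cs_id is None):
        raise ValueError("pass exactly one of group_id or cs_id")
    if group_id is not None:
        column, value = "group_id", group_id
    else:
        column, value = "cs_id", cs_id
    # The column name comes from the fixed pair above, never from user input.
    sql = (
        "select * from ssrv2policytargetstatus "
        f"where policy_id=? and {column}=?"
    )
    return sql, (policy_id, value)
```

Passing the IDs as parameters instead of pasting them into the SQL text avoids quoting mistakes when the query is run from a script.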
Default Alarm Policy Conditions Not Coming Correctly
If the alarm policy conditions do not appear as expected after you migrate a legacy profile to an enhanced profile, verify the following points:
  • Review the threshold type value:
    select * from ssrv2configvalue where profile=<enhanced_profileId> and variable like '%thresholds_table-thresholdtype-%' and value!='None'
    The value must be "static", "dyn_scalar", "dyn_pct", or "dyn_stddev".
  • Review the operator value:
    select * from ssrv2configvalue where profile=<enhanced_profileId> and variable like '%thresholds_table-operator-%'
    The value must be "NE", "L", "LE", "EQ", "G", or "GE".
  • Review the severity value:
    select * from ssrv2configvalue where profile=<enhanced_profileId> and variable like '%thresholds_table-severity-%'
    The value must be 1, 2, 3, 4, or 5.
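Once the rows from the three queries are exported, the allowed values above can be checked automatically. The sketch below takes `(variable, value)` pairs and returns the ones that break the rules. It skips threshold type rows whose value is `None`, mirroring the `value!='None'` filter in the first query. The `invalid_values` name and the shape of the input rows are assumptions.

```python
ALLOWED = {
    "thresholdtype": {"static", "dyn_scalar", "dyn_pct", "dyn_stddev"},
    "operator": {"NE", "L", "LE", "EQ", "G", "GE"},
    "severity": {"1", "2", "3", "4", "5"},
}


def invalid_values(rows):
    """Return the (variable, value) pairs whose value is not allowed.

    rows: iterable of (variable, value) pairs read from ssrv2configvalue.
    """
    bad = []
    for variable, value in rows:
        text = str(value).strip()
        for kind, allowed in ALLOWED.items():
            if f"thresholds_table-{kind}-" not in variable:
                continue
            # The threshold type query excludes rows whose value is 'None'.
            if kind == "thresholdtype" and text == "None":
                continue
            if text not in allowed:
                bad.append((variable, value))
    return bad
```

An empty result means that every threshold type, operator, and severity row holds one of the values listed above.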