Device Correlation Troubleshooting

Warning!
Different versions of CA UIM use different database schemas for the UIM database. This document assumes that you are using CA UIM 8.5.1 or later. If you are using a different version of CA UIM, the following items may not be valid as described in your CA UIM environment:
  • UIM database table and column names.
  • Example SQL queries to the UIM database.
Note: CA Technologies reserves the right to make schema changes in subsequent versions of CA UIM that render the content of this document invalid.
Overview
When viewing devices in CA UIM, you might see one of two situations:
  • Duplicate devices: A device appears more than once in the inventory.
  • Merged devices: Different devices appear as a single device.
Both problems are caused by incorrect device correlation. Device correlation is the process of associating the device properties reported by a probe with the correct device in the UIM database. Duplicate devices result from under-correlation: an incoming device fails to match an existing device, so a duplicate instance of the device is created. Merged devices result from over-correlation: different devices are incorrectly matched as the same device, so their information is merged into a single device.
Review the device correlation configuration documentation to learn how correlation works and how to configure it. Starting with version 8.51, device correlation is highly configurable, and you should be able to solve most correlation cases by customizing the configuration of the discovery_server probe. The UIM documentation describes several correlation scenarios and how to solve them.
Why Correlation Problems Occur
The accuracy of device correlation depends on the device information that probes provide. Ideally, every probe would provide the name, IP address, and MAC address of every device, enabling strong correlation matching. In reality, the amount of information per device varies by probe. Most probes provide a name, an IP address, or both. Only a relatively small number of probes provide a MAC address.
A duplicate device can result when the device perspectives from different probes share no matching information. For example, if one probe reports only a name for a device and another probe reports only its IP address, the two perspectives won’t correlate unless a third probe reports both the name and the IP address for the device, linking all three perspectives.
Wherever possible, provide both a name and IP address for a device when configuring a probe to avoid such potential duplication.
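The linking effect of a third probe can be illustrated with a small union-find sketch. This is a simplified model under stated assumptions, not the discovery server's actual algorithm: each perspective is a dictionary with an optional "name" and "ip", and any shared value links two perspectives. The function name and probe identifiers are hypothetical.

```python
# Simplified model (an assumption, not discovery_server's actual algorithm):
# two probe perspectives link when they share any identifying value.

def group_perspectives(perspectives):
    """Group perspectives linked directly or transitively by a shared name or IP.

    Returns a list of sets of perspective ids; each set is one device.
    """
    parent = {pid: pid for pid in perspectives}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_owner = {}  # (kind, value) -> first perspective id seen with that value
    for pid, props in perspectives.items():
        for kind in ("name", "ip"):
            value = props.get(kind)
            if value is None:
                continue
            key = (kind, value)
            if key in first_owner:
                union(first_owner[key], pid)
            else:
                first_owner[key] = pid

    groups = {}
    for pid in perspectives:
        groups.setdefault(find(pid), set()).add(pid)
    return list(groups.values())


two_probes = {
    "probeA": {"name": "foobar"},
    "probeB": {"ip": "10.0.0.5"},
}
print(len(group_perspectives(two_probes)))  # 2: nothing shared, so a duplicate

with_bridge = dict(two_probes, probeC={"name": "foobar", "ip": "10.0.0.5"})
print(len(group_perspectives(with_bridge)))  # 1: probeC links probeA and probeB
```

A real correlation engine also applies rule-specific qualifiers such as origin, which this sketch deliberately leaves out.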
Another source of correlation problems is mismatched origin values. For example, if a device is monitored from a hub with origin HubA and the same device is also monitored from a different hub with origin HubB, the two perspectives won’t correlate if they only have an IP address or a simple hostname (not an FQDN). Because an IP address or simple hostname may not be unique in an environment, origin is used as a correlation qualifier. Simply accepting the hub name as the default origin can lead to this kind of mismatch.
Assign origins strategically to aid correlation.
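The effect of origin as a qualifier can be sketched with a hypothetical key builder. The rule that an FQDN is unique on its own while an IP address or simple hostname is qualified by origin is an assumption drawn from the paragraph above, and `correlation_key` is an illustrative name, not a discovery server function.

```python
def correlation_key(kind, value, origin):
    """Build a hypothetical correlation key for one identifying value.

    Assumption for illustration: an FQDN (a name containing a dot) is unique
    on its own, while an IP address or a simple hostname is only unique
    within an origin, so the origin becomes part of the key.
    """
    if kind == "name" and "." in value:
        return (kind, value)
    return (kind, value, origin)


# Same simple hostname monitored from two hubs that kept their default origins.
print(correlation_key("name", "foobar", "HubA") == correlation_key("name", "foobar", "HubB"))  # False

# An FQDN is not qualified by origin in this model, so the keys match.
print(correlation_key("name", "foobar.example.com", "HubA")
      == correlation_key("name", "foobar.example.com", "HubB"))  # True

# A shared, strategically assigned origin also makes the keys match.
print(correlation_key("ip", "10.0.0.5", "CustomerA") == correlation_key("ip", "10.0.0.5", "CustomerA"))  # True
```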
Origins and Enriched Origins
Probes publish an origin with each device. The published origin typically matches the origin of the robot where the probe is running. The robot origin defaults to its hub’s origin but can be overridden at the robot. The hub origin defaults to the hub name but can be overridden at the hub. The published origin is sometimes referred to as the “bus origin.”
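The defaulting chain for the published origin can be summarized in a few lines. This is a sketch of the rules described above; the function and parameter names are hypothetical, and an override of `None` or an empty string is treated as "not set".

```python
from typing import Optional


def published_origin(hub_name: str,
                     hub_origin_override: Optional[str] = None,
                     robot_origin_override: Optional[str] = None) -> str:
    """Origin a probe publishes with its devices (the "bus origin").

    The hub origin defaults to the hub name; the robot origin defaults to
    its hub's origin. None or "" counts as "no override" here.
    """
    hub_origin = hub_origin_override or hub_name
    return robot_origin_override or hub_origin


print(published_origin("HubA"))                                          # HubA
print(published_origin("HubA", hub_origin_override="CustomerA"))         # CustomerA
print(published_origin("HubA", "CustomerA", "CustomerA_IL_Chicago"))     # CustomerA_IL_Chicago
```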
Enriched origins are the overridden origins located in the S_QOS_DATA table. The qos_processor probe is used to override the origin in S_QOS_DATA using custom scripts. However, in some deployments, S_QOS_DATA can be modified directly without using the qos_processor probe. By default, the QoS origin is the same as the robot origin.
The discovery server detects enriched origin changes in S_QOS_DATA and applies them to the affected devices. An enriched origin is the origin value from an S_QOS_DATA row where modifier != 'nimsoft' and origin != nim_origin. The modifier, origin, and nim_origin columns must all be non-null for the discovery server to pick up the enriched origin.
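The enriched-origin rule can be expressed as a SQL filter. The sketch below runs it against a hypothetical in-memory SQLite stand-in for S_QOS_DATA: only the modifier, origin, and nim_origin columns come from this document, while the "source" column and the sample values (including the 'qos_processor' modifier) are assumptions for illustration. Check the column names against your actual schema before adapting the query.

```python
import sqlite3

# Hypothetical stand-in for S_QOS_DATA; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S_QOS_DATA (source TEXT, origin TEXT, nim_origin TEXT, modifier TEXT)")
conn.executemany(
    "INSERT INTO S_QOS_DATA VALUES (?, ?, ?, ?)",
    [
        ("foobar", "CustomerA", "HubA", "qos_processor"),  # qualifies as enriched
        ("db01",   "HubA",      "HubA", "nimsoft"),        # default origin, not enriched
        ("web01",  "CustomerA", "HubB", "nimsoft"),        # modifier is 'nimsoft'
        ("app01",  "CustomerA", None,   "qos_processor"),  # nim_origin is NULL
    ],
)

# NULL comparisons are never true in SQL, so the IS NOT NULL checks are
# redundant with the <> tests, but they make the non-null requirement explicit.
ENRICHED_ORIGIN_SQL = """
SELECT source, origin AS enriched_origin, nim_origin
FROM S_QOS_DATA
WHERE modifier IS NOT NULL
  AND origin IS NOT NULL
  AND nim_origin IS NOT NULL
  AND modifier <> 'nimsoft'
  AND origin <> nim_origin
"""
rows = conn.execute(ENRICHED_ORIGIN_SQL).fetchall()
print(rows)  # [('foobar', 'CustomerA', 'HubA')]
```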
For correlation rules that are qualified by origin, a matching origin (either from the Origin or EnrichedOrigin properties) is needed for two devices to correlate. 
Assign hub and/or robot origins strategically to facilitate correlation, or set up enriched origins to help bind the devices together.
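A minimal sketch of the origin check for origin-qualified rules follows. It assumes that either the Origin or the EnrichedOrigin property of one device may match either property of the other; the exact matching semantics are defined by the discovery server, and `origins_match` is a hypothetical helper.

```python
def origins_match(device_a: dict, device_b: dict) -> bool:
    """True if any origin of device_a equals any origin of device_b.

    Assumption: each device dict carries "Origin" and optionally
    "EnrichedOrigin", and either one can satisfy the origin qualifier.
    """
    def origins(dev):
        return {dev[k] for k in ("Origin", "EnrichedOrigin") if dev.get(k)}

    return bool(origins(device_a) & origins(device_b))


from_hub_a = {"Origin": "HubA", "EnrichedOrigin": "CustomerA"}
from_hub_b = {"Origin": "HubB", "EnrichedOrigin": "CustomerA"}
print(origins_match(from_hub_a, from_hub_b))                  # True, via the enriched origin
print(origins_match({"Origin": "HubA"}, {"Origin": "HubB"}))  # False
```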
Defaulting the origin to the hub name is often problematic. It would be less problematic if every hub and robot defaulted to the same origin (for example, the UIM domain name) and the origin were overridden only for a specific purpose. This approach handles a single-tenant environment with no overlapping names or IP addresses.
If there are overlapping IP addresses or names across locations, assign a different origin for each location.  Assign every hub and robot in that location the same origin.
Use the same strategy for multiple tenants. 
Assign a different origin to each tenant.
Every hub and robot for that tenant should be assigned the same origin unless the tenant needs to be further subdivided. For example, if the tenant has multiple locations with overlapping IP addresses or names, assign a different origin to each tenant location. A hierarchical naming pattern for origins can also help: for example, “CustomerA_IL_Chicago” identifies both the tenant and the location.
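The assignment strategy above can be checked with a short script. This sketch builds hierarchical origins and flags any location whose hubs and robots ended up with more than one origin. The inventory format (hub or robot name mapped to a location tuple and its assigned origin) is hypothetical; in practice you would collect it from your own hub and robot configuration.

```python
def location_origin(tenant: str, state: str, city: str) -> str:
    """Build a hierarchical origin such as "CustomerA_IL_Chicago"."""
    return f"{tenant}_{state}_{city}"


def inconsistent_locations(inventory):
    """Return {location: origins} for locations assigned more than one origin.

    `inventory` maps a hub or robot name to a (location, assigned_origin)
    pair; this format is a hypothetical example.
    """
    seen = {}
    for location, origin in inventory.values():
        seen.setdefault(location, set()).add(origin)
    return {loc: origins for loc, origins in seen.items() if len(origins) > 1}


chicago = ("CustomerA", "IL", "Chicago")
inventory = {
    "hub1":   (chicago, location_origin(*chicago)),
    "robot7": (chicago, "hub1"),  # left at the default: its hub's name
}
print(inconsistent_locations(inventory))
```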
See the existing documentation on configuring qos_processor for more information about creating enriched origins for monitored devices.
In addition to strategically assigning hub/robot origins and setting up enriched origins, the discovery server supports a “treat as equal values” feature that can be used to define sets of origins that should be treated as equal for correlation purposes.  See the CA UIM Device Correlation Configuration documentation for more details on this.
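Conceptually, "treat as equal values" behaves like an equivalence mapping applied before origins are compared. The sketch below is an in-memory model of that idea only; it does not reflect the discovery_server.cfg syntax, and the function names and the `EQUAL_SETS` structure are hypothetical.

```python
def canonical_origin(origin, equal_sets):
    """Return a representative origin for the first set containing `origin`.

    Origins that appear in no set map to themselves.
    """
    for group in equal_sets:
        if origin in group:
            return min(group)  # any deterministic choice of representative works
    return origin


def origins_equal(a, b, equal_sets):
    """True if two origins are identical or belong to the same equal-values set."""
    return canonical_origin(a, equal_sets) == canonical_origin(b, equal_sets)


EQUAL_SETS = [frozenset({"HubA", "HubB"})]
print(origins_equal("HubA", "HubB", EQUAL_SETS))  # True
print(origins_equal("HubA", "HubC", EQUAL_SETS))  # False
```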
Troubleshooting Tools
Discovery server 8.51 and later includes the following tools to help with troubleshooting and resolving correlation cases:
  • CM_DEVICE_CORRELATION_HISTORY database table to view a history of correlation results.
  • Discovery server commands/callbacks:
    • query_devices_by_cs_ids
    • dry_run_correlate_devices_by_cs_ids
    • reimport_devices_by_cs_ids
CM_DEVICE_CORRELATION_HISTORY
You can query the CM_DEVICE_CORRELATION_HISTORY table to get the correlation results for a device. See the device correlation configuration documentation for the details on this table.  Note the following:
  • Discovery server automatically deletes entries older than 14 days.
  • The retention period is configurable with the device_correlation_history_delete_older_than_time configuration parameter.
  • You can also delete old entries manually with the “delete_device_correlation_history_older_than” callback.
Because the discovery server automatically deletes old entries, devices that have not been processed recently may not have any entries in CM_DEVICE_CORRELATION_HISTORY.
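The sketch below shows a newest-first history lookup for one cs_id and simulates the 14-day retention window, using a hypothetical SQLite stand-in. The column names match the example output later in this document; storing time_processed as ISO-8601 text is an assumption of the mock. In a real deployment the discovery server performs the cleanup, so the DELETE here only demonstrates why old devices may have no history.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CM_DEVICE_CORRELATION_HISTORY (
    source_dev_id TEXT, target_dev_id TEXT, cs_id INTEGER,
    time_processed TEXT, reason INTEGER, match_type TEXT)""")

now = datetime(2017, 2, 20, 12, 0)
conn.executemany(
    "INSERT INTO CM_DEVICE_CORRELATION_HISTORY VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("DEV_A", None, 314851, (now - timedelta(days=18)).isoformat(), 0, "none"),
        ("DEV_A", None, 314851, (now - timedelta(days=2)).isoformat(), 0, "none"),
    ],
)

# Simulate the default retention: drop entries older than 14 days.
# ISO-8601 strings in one format sort chronologically, so < works here.
cutoff = (now - timedelta(days=14)).isoformat()
conn.execute("DELETE FROM CM_DEVICE_CORRELATION_HISTORY WHERE time_processed < ?", (cutoff,))

# Newest-first correlation results for one cs_id.
rows = conn.execute(
    """SELECT source_dev_id, target_dev_id, time_processed, match_type
       FROM CM_DEVICE_CORRELATION_HISTORY
       WHERE cs_id = ?
       ORDER BY time_processed DESC""",
    (314851,),
).fetchall()
print(len(rows))  # 1
```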
query_devices_by_cs_ids
This discovery server probe utility command provides details needed to troubleshoot device correlation cases. Note the following:
  • The command accepts a comma-separated list of one or more CM_COMPUTER_SYSTEM cs_id values.
  • Query results from the following tables for the specified cs_id values are saved to CSV files:
    • CM_COMPUTER_SYSTEM
    • CM_COMPUTER_SYSTEM_ATTR
    • CM_DEVICE
    • CM_DEVICE_ATTRIBUTE
    • CM_DEVICE_CORRELATION_HISTORY
  • The CSV files are saved to query_devices_by_cs_ids_<datecode>.zip in the discovery_server directory.
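After copying the output archive off the discovery server system, a short script can pick the newest archive and load one CSV from it. This sketch assumes the CSV files sit at the root of the archive, use UTF-8, and are named after their tables in lowercase (for example, cm_device_correlation_history.csv, as in the example later in this document). Because the <datecode> format isn't specified here, the newest file is chosen by modification time. The helper names are hypothetical.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path


def newest_zip(directory, prefix="query_devices_by_cs_ids_"):
    """Return the most recently modified <prefix>*.zip in directory, or None."""
    candidates = sorted(Path(directory).glob(prefix + "*.zip"),
                        key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None


def read_csv_member(zip_path, member_name):
    """Read one CSV file from the archive into a list of dicts."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Demo: build a small sample archive in a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "query_devices_by_cs_ids_sample.zip"
    with zipfile.ZipFile(sample, "w") as zf:
        zf.writestr("cm_device_correlation_history.csv",
                    "source_dev_id,target_dev_id,match_type\nDEV_A,,none\n")
    latest = newest_zip(tmp)
    demo_zip_name = latest.name
    demo_rows = read_csv_member(latest, "cm_device_correlation_history.csv")

print(demo_zip_name, demo_rows)
```

Pass `prefix="dry_run_correlate_devices_by_cs_ids_"` to locate dry-run archives the same way.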
dry_run_correlate_devices_by_cs_ids
You can use this discovery server probe utility command to try out changes to the device correlation configuration as well as troubleshoot device correlation cases. Note the following:
  • The command accepts a comma-separated list of one or more CM_COMPUTER_SYSTEM cs_id values.
  • The command also accepts a CFG file name with a customized correlation configuration.  If a CFG file isn’t specified, the command uses the current discovery server configuration.
  • The discovery server correlates the devices associated with the specified cs_id values but does not persist the results, so the correlation operation is non-destructive.
  • The equivalent of what is saved in CM_DEVICE_CORRELATION_HISTORY is saved to a correlation_results.csv file.
  • So that you can compare current to previous correlation results, the last correlation result from CM_DEVICE_CORRELATION_HISTORY for the same set of devices is saved to a previous_correlation_results.csv file, together with the time it was processed.
  • So that you can perform more detailed analysis, query results from CM_COMPUTER_SYSTEM, CM_COMPUTER_SYSTEM_ATTR, CM_DEVICE and CM_DEVICE_ATTRIBUTE for the specified cs_id’s are saved to CSV files.
  • All of the above files, plus discovery_server.cfg and the optional CFG file, are saved to dry_run_correlate_devices_by_cs_ids_<datecode>.zip in the discovery_server directory.
Note that cs_id is blank in the correlation_results.csv file. Because the “dry_run_correlate_devices_by_cs_ids” command does not modify the database to save the correlation results, the value of cs_id is indeterminate. The dry run can determine only the target device (if a match is found), the match_type (rule name), and the match keys/values that drove the correlation.
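Because cs_id is blank in dry-run output, a comparison of correlation_results.csv with previous_correlation_results.csv is best keyed on source_dev_id. The sketch below assumes both files use the column names shown in the example later in this document (source_dev_id, target_dev_id, match_type); the helper names are hypothetical.

```python
import csv


def load_results(path):
    """Map source_dev_id -> (target_dev_id, match_type) from a results CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["source_dev_id"]: (row["target_dev_id"], row["match_type"])
                for row in csv.DictReader(f)}


def changed_devices(previous, current):
    """Return {source_dev_id: (previous, current)} for devices whose outcome changed."""
    return {dev: (previous.get(dev), current.get(dev))
            for dev in previous.keys() | current.keys()
            if previous.get(dev) != current.get(dev)}


previous = {"D268FAC8073818534EF7B6463F201267A": ("", "none")}
current = {"D268FAC8073818534EF7B6463F201267A": ("DB26684F675CD005DEBC4E387593590D2", "name_origin")}
print(changed_devices(previous, current))
```

With real files, call `changed_devices(load_results("previous_correlation_results.csv"), load_results("correlation_results.csv"))` after extracting the dry-run archive.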
reimport_devices_by_cs_ids
This discovery server probe utility command complements the “dry_run_correlate_devices_by_cs_ids” command. After you verify that configuration changes produce the intended results, deploy them to discovery_server.cfg and then use the “reimport_devices_by_cs_ids” command to re-correlate the devices with the new configuration. Note the following:
  • The command accepts a comma-separated list of one or more CM_COMPUTER_SYSTEM cs_id values.
  • The command reimports devices with their current data in the database.  Reimporting devices goes through the full import process: correlation, reconciliation, and persistence.
  • The callback doesn't immediately reimport the devices.  It publishes the devices to the discovery server's internal queue for processing.
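All three commands take the same cs_ids argument. The helper below is a hypothetical convenience for building it from query results: it keeps the original order, drops duplicates, and joins with commas only. Avoiding spaces and a trailing comma is a cautious assumption, since this document doesn't say whether the commands tolerate them.

```python
def format_cs_ids(cs_ids):
    """Build the comma-separated cs_ids argument from cs_id values.

    Values are converted with int(), so non-numeric strings raise ValueError.
    Duplicates are dropped while keeping the first-seen order.
    """
    unique = []
    for value in cs_ids:
        cs_id = int(value)
        if cs_id not in unique:
            unique.append(cs_id)
    if not unique:
        raise ValueError("at least one cs_id is required")
    return ",".join(str(cs_id) for cs_id in unique)


print(format_cs_ids([314851, "314852", 314851]))  # 314851,314852
```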
Troubleshooting Example
Let’s say there are two devices in the UIM inventory with the name “foobar” that appear to be duplicates.
Query CM_COMPUTER_SYSTEM to determine the cs_id values for the “foobar” devices:
select * from CM_COMPUTER_SYSTEM where name='foobar'
In this example, the query returns cs_id values 314851 and 314852.
Now, open the probe utility for the discovery server, select the “query_devices_by_cs_ids” command, set cs_ids to 314851,314852, and run the command. To see the results, retrieve the newest query_devices_by_cs_ids_<datecode>.zip file from the discovery_server directory on the system where the discovery server is installed.
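If you script this lookup, a parameterized query avoids quoting problems with device names. The sketch runs against a hypothetical SQLite stand-in containing only the two columns the example query uses; against the real UIM database, the parameter placeholder syntax depends on your database driver.

```python
import sqlite3


def cs_ids_for_name(conn, name):
    """Return the cs_id values of CM_COMPUTER_SYSTEM rows with the given name."""
    rows = conn.execute(
        "SELECT cs_id FROM CM_COMPUTER_SYSTEM WHERE name = ? ORDER BY cs_id", (name,)
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in with only the columns used by the example query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CM_COMPUTER_SYSTEM (cs_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO CM_COMPUTER_SYSTEM VALUES (?, ?)",
                 [(314851, "foobar"), (314852, "foobar"), (314900, "otherhost")])
print(",".join(map(str, cs_ids_for_name(conn, "foobar"))))  # 314851,314852
```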
Looking in the cm_device_correlation_history.csv file, you see that the “foobar” devices are not finding a match (target_dev_id is empty and match_type is ‘none’):
| source_dev_id | target_dev_id | cs_id | time_processed | reason | match_type |
| D268FAC8073818534EF7B6463F201267A | | 314851 | 2/2/17 15:16 | 0 | none |
| DB26684F675CD005DEBC4E387593590D2 | | 314852 | 2/2/17 15:16 | 0 | none |
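With many devices, it helps to filter the history CSV for rows that found no match. This sketch assumes the CSV header uses the column names shown in the table above; the sample is trimmed to three columns, and its third row is a made-up matched device included for contrast.

```python
import csv
import io


def unmatched(rows):
    """Rows where correlation found no target (empty target_dev_id, match_type 'none')."""
    return [r for r in rows
            if not r["target_dev_id"].strip() and r["match_type"] == "none"]


SAMPLE = """source_dev_id,target_dev_id,match_type
D268FAC8073818534EF7B6463F201267A,,none
DB26684F675CD005DEBC4E387593590D2,,none
DEV_C,DEV_D,name_origin
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(len(unmatched(rows)))  # 2
```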
Looking in the cm_device_attribute.csv file, you see the following:
| dev_id | cs_id | dev_ip | dev_name | probe_name | dev_attr_key | dev_attr_value |
| D268FAC8073818534EF7B6463F201267A | 314851 | | foobar | probe1 | CorrelationNames | foobar |
| D268FAC8073818534EF7B6463F201267A | 314851 | | foobar | probe1 | Origin | HubA |
| DB26684F675CD005DEBC4E387593590D2 | 314852 | | foobar | probe2 | CorrelationNames | foobar |
| DB26684F675CD005DEBC4E387593590D2 | 314852 | | foobar | probe2 | Origin | HubB |
The devices don’t include an IP address.  The only common correlation property between the two devices is the name “foobar.”  The devices don’t correlate because their origins don’t match.
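For devices with many attributes, pivoting the key/value rows per device makes this comparison quicker. The sketch assumes rows with the dev_id, dev_name, dev_attr_key, and dev_attr_value columns shown above and records dev_name alongside the attributes. It reports which shared properties match and which differ; the helper names are hypothetical.

```python
from collections import defaultdict


def pivot_attributes(rows):
    """Turn attribute rows into {dev_id: {property: value}}, including dev_name."""
    devices = defaultdict(dict)
    for r in rows:
        props = devices[r["dev_id"]]
        props["dev_name"] = r["dev_name"]
        props[r["dev_attr_key"]] = r["dev_attr_value"]
    return dict(devices)


def compare(props_a, props_b):
    """Split properties present on both devices into matching and differing sets."""
    common = props_a.keys() & props_b.keys()
    same = {k for k in common if props_a[k] == props_b[k]}
    return same, common - same


A, B = "D268FAC8073818534EF7B6463F201267A", "DB26684F675CD005DEBC4E387593590D2"
rows = [
    {"dev_id": A, "dev_name": "foobar", "dev_attr_key": "Origin", "dev_attr_value": "HubA"},
    {"dev_id": B, "dev_name": "foobar", "dev_attr_key": "Origin", "dev_attr_value": "HubB"},
]
devices = pivot_attributes(rows)
same, differing = compare(devices[A], devices[B])
print(sorted(same), sorted(differing))  # ['dev_name'] ['Origin']
```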
Looking over possible solutions in the device correlation configuration documentation, you decide to use the <treat_as_equal_values> feature to specify that origins HubA and HubB should be treated as equal for correlation purposes.
You copy discovery_server.cfg to discovery_server_test.cfg and add a <treat_as_equal_values> section to <Origins> in the discovery_server_test.cfg.
Now, run the “dry_run_correlate_devices_by_cs_ids” discovery server command with cs_ids set to 314851,314852 and optional_cfg_file set to discovery_server_test.cfg. To check the results, get the newest dry_run_correlate_devices_by_cs_ids_<datecode>.zip file from the discovery_server directory.
Looking in the correlation_results.csv file, you see that the devices now correlate:
| source_dev_id | target_dev_id | cs_id | time_processed | reason | match_type |
| D268FAC8073818534EF7B6463F201267A | DB26684F675CD005DEBC4E387593590D2 | | 2/2/17 15:37 | 0 | name_origin |
| DB26684F675CD005DEBC4E387593590D2 | D268FAC8073818534EF7B6463F201267A | | 2/2/17 15:37 | 0 | name_origin |
Having verified the configuration change, copy discovery_server_test.cfg to discovery_server.cfg and restart the discovery server.
To re-correlate the “foobar” devices with the new configuration, you can use the “reimport_devices_by_cs_ids” discovery server command and set cs_ids to “314851,314852”.