data_engine Troubleshooting

This article provides troubleshooting information for issues that you might encounter while upgrading, configuring, or using different versions of the data_engine probe.
Correlation Issues with Metric Data
Symptom:
  • I am having correlation issues with my metric data.
  • I can see my metric data in PRD, but it does not appear in USM.
Solution:
As of CA UIM 8.4 SP2, you can fix this issue by setting the update_metric_id key to yes. The update_metric_id key instructs the data_engine probe to update the metric_id when it changes. By default, the key is set to no or is not present in the configuration.
Follow these steps:
  1. Locate the data_engine probe in either Admin Console or Infrastructure Manager.
  2. Open the Raw Configure menu.
  3. Enter a new key-value pair:
    • Key name - update_metric_id
    • Value - yes
      If the key already exists, simply change the value of the key from no to yes.
  4. Save your changes. The data_engine probe then restarts.
If your CA UIM environment is large, it might take up to an hour to see updates in USM.
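In the data_engine configuration file, the resulting entry looks roughly like the following. The enclosing section name is an assumption here; Raw Configure places the key for you, so you do not normally need to edit the file by hand.

```
<setup>
   update_metric_id = yes
</setup>
```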
Upgrading to a Newer Version of CA UIM with an Oracle Database
Symptom:
(Oracle databases only) My Oracle database is unpartitioned before I upgrade to a newer version of CA UIM. After the upgrade is complete, I do not partition my Oracle database before I start to use CA UIM. When I attempt to perform data maintenance, it fails and an error similar to the following appears in the data_engine log file:
[140713318672128] de: Data Maintenance - RC: -1000 [ORA-20008: spn_de_DataMaint_DeleteOldData: Error: -20005 ORA-20005: spn_de_CreateBNTableIndex: Error: -1408 ORA-01408: such column list already indexed], Purge: RN_QOS_DATA_0021, Error, Raw Rows:0, Time used: 204 ms ~= (0 seconds) (QOS_ORACLE_CHECK_DBALIVE)
Solution:
The index names for the BN and RN tables used by older versions of data_engine (versions 7.6 and earlier) do not match the index names used by data_engine v8.0 and later. If you partition the Oracle database after an upgrade, the index names for the BN and RN tables are updated. However, if you do not partition the Oracle database after upgrading to a newer version of CA UIM, the old index names remain for the BN and RN tables and data maintenance fails. To correct this issue, run the following PL/SQL scripts on your Oracle database.
Script to change IDX2 to IDX0 for all the RN tables:
set serveroutput on size 30000;
declare
  lOldIndex varchar2(80);
  lNewIndex varchar2(80);
  lCmd varchar2(255);
  cursor lCursorIndexes is
    select INDEX_NAME
      from user_indexes
     where index_name like '%_IDX2'
       and table_name like 'RN_QOS_DATA_%';
begin
  for lIdx in lCursorIndexes loop
    lOldIndex := lIdx.INDEX_NAME;
    lNewIndex := lOldIndex;
    dbms_output.put_line(lNewIndex);
    lNewIndex := replace(lNewIndex, '_IDX2', '_IDX0');
    dbms_output.put_line(lNewIndex);
    lCmd := 'ALTER INDEX ' || lOldIndex || ' RENAME TO ' || lNewIndex;
    dbms_output.put_line(lCmd);
    EXECUTE IMMEDIATE lCmd;
  end loop;
end;
Script to change the IDX2 to IDX1 for all the BN tables:
set serveroutput on size 30000;
declare
  lOldIndex varchar2(80);
  lNewIndex varchar2(80);
  lCmd varchar2(255);
  cursor lCursorIndexes is
    select INDEX_NAME
      from user_indexes
     where index_name like '%_IDX2'
       and table_name like 'BN_QOS_DATA_%';
begin
  for lIdx in lCursorIndexes loop
    lOldIndex := lIdx.INDEX_NAME;
    lNewIndex := lOldIndex;
    dbms_output.put_line(lNewIndex);
    lNewIndex := replace(lNewIndex, '_IDX2', '_IDX1');
    dbms_output.put_line(lNewIndex);
    lCmd := 'ALTER INDEX ' || lOldIndex || ' RENAME TO ' || lNewIndex;
    dbms_output.put_line(lCmd);
    EXECUTE IMMEDIATE lCmd;
  end loop;
end;
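After both scripts have run, a quick check can confirm that no RN or BN index still carries the old suffix. This is a sketch; run it as the CA UIM schema owner:

```sql
-- Should return no rows once both rename scripts have completed
select index_name, table_name
  from user_indexes
 where index_name like '%_IDX2'
   and (table_name like 'RN_QOS_DATA_%' or table_name like 'BN_QOS_DATA_%');
```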
Data Maintenance on an Oracle Database Doesn't Work After Upgrading From NMS 7.6
Symptom:
(Oracle databases only) I upgraded from NMS 7.6 to a newer version of CA UIM. When I attempt to partition my Oracle database, data maintenance is not performed and I receive the following error in the data_engine log file:
  • ORA-01031: insufficient privileges
Solution:
Execute the grant command as SYSDBA before or after upgrading from NMS 7.6 to CA UIM 8.0 or later.
Follow these steps:
  1. Log in to the database server as SYSDBA using a tool such as Oracle SQL Developer and execute:
    grant execute on dbms_redefinition to <db_owner>;
    grant alter any table to <db_owner>;
    grant select any table to <db_owner>;
    grant create any table to <db_owner>;
    grant drop any table to <db_owner>;
    grant lock any table to <db_owner>;
  2. Recompile the stored procedures in the CA UIM schema.
  3. Run data maintenance again.
    When you upgrade from a fresh installation of CA UIM 8.0 or later, the appropriate user permissions on the Oracle database are already granted and it is not necessary to run the grant commands.
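To verify that the grants took effect, you can query the Oracle data dictionary as the CA UIM schema owner. This is a sketch; the privilege names match the grant statements above:

```sql
-- Expect ALTER ANY TABLE, CREATE ANY TABLE, DROP ANY TABLE,
-- LOCK ANY TABLE, and SELECT ANY TABLE among the results
select privilege from user_sys_privs order by privilege;

-- Confirms the EXECUTE grant on the dbms_redefinition package
select table_name, privilege
  from user_tab_privs
 where table_name = 'DBMS_REDEFINITION';
```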
S_QOS_SNAPSHOT Database Table Missing Updates
Symptom:
It looks like my S_QOS_SNAPSHOT database table is not updating properly. For example, when I look at the table, I see no new values added to the table or there are null values for new QoS metrics.
Solution:
If you have a smaller system that is monitoring fewer than 20 QoS metrics, change the value of the thread_count_insert parameter to zero in Raw Configure.
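In raw-configuration terms, the change amounts to the following. The section name is an assumption; make the change through Raw Configure rather than by editing the file directly.

```
<setup>
   thread_count_insert = 0
</setup>
```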
Viewing the Log File
Advanced users may find it helpful to view the log file. To view the log file, click the data_engine probe and select View Log. You can also modify the log file settings so that the file retains more data for troubleshooting.
Corrupted QoS Definition Values
Symptom:
I see corrupted QoS Definition values in QoS data within reports or dashboards; for example, user-defined units such as degrees F or C, watts, and so on are labeled "variant". Customized QoS definition units have been incorrectly replaced (overridden) by data_engine version 7.85 (NM Server 5.60) or data_engine 7.86 (NM Server 5.61).
Solution:
Versions 7.87 through 7.90 of data_engine include a patch utility to recover and restore customized QoS definition units incorrectly replaced (overridden) by data_engine version 7.85 (NM Server 5.60) or data_engine 7.86 (NM Server 5.61). This patch utility uses a conditional override approach that corrects the issue.
Follow these steps:
  1. From a command prompt, navigate to the directory that contains the utility:
    (Unix) cd <UIM install location>/probes/slm/data_engine/tools
    (Windows) cd <UIM install location>\probes\slm\data_engine\tools
  2. Execute the tool in "report" mode (-r flag set) with the Java version installed with UIM Server (formerly named NM Server):
    (Unix) ../../../../jre/jre1.6.0_24/bin/java -jar qos_def_unit_repair_kit.jar -r
    (Windows) ..\..\..\..\jre\jre1.6.0_24\bin\java -jar qos_def_unit_repair_kit.jar -r
  3. The patch utility scans the S_QOS_DEF_SKIP_UNIT table in the database and finds QoS Definitions that are suspected to be corrupt.
    The S_QOS_DEF_SKIP_UNIT table holds QoS definition values that should not override what is sent by a QoS probe. This table is pre-populated with three values: variant, none, and user defined.
    If you have defined additional custom values that were incorrectly overridden by a previous data_engine version, add these values as new rows to the S_QOS_DEF_SKIP_UNIT table before running the repair in step 4 below, so that the patch utility finds, reports, and fixes them as well. (Use standard database management tools to connect to the NIS database and add the rows and new values to the S_QOS_DEF_SKIP_UNIT table.)
    The report generated by the utility shows corrupt QoS Definitions (if any) and the probe or probes associated with that data. Here is an excerpt from an example report:
    List current problems...
    S_QOS_DEFINITION { name=QOS_CPUSAMPLECOUNT, qosDefId=12, group=QOS_VMWARE, unit=none,unitShort=sc }
    S_QOS_DATA { source=esxiqa1.i9.x, target=CPU sample count, origin=w2k8-vm0hub, host=10.0.0.1, robot=w2k8-vm0, probe=vmware }
    S_QOS_DEFINITION { name=QOS_SNMP_VARIABLE, qosDefId=11, group=QOS_SNMP_VARIABLE, unit=variant, unitShort=value }
    S_QOS_DATA { source=w2k8-vm0, target=interfaces.ifTable.ifREntry.ifOutOctets.1, origin=w2k8-vm0hub, host=10.0.0.1, robot=w2k8-vm0, probe=snmpget }
    S_QOS_DATA { source=w2k8-vm0, target=system.sysServices.0, origin=w2k8-vm0hub,host=10.0.0.1, robot=w2k8-vm0, probe=snmpget }
    End of current problems list.
    Running in report-only mode.
  4. Run the utility without the -r flag set to make repairs:
    (Unix) ../../../../jre/jre1.6.0_24/bin/java -jar qos_def_unit_repair_kit.jar
    (Windows) ..\..\..\..\jre\jre1.6.0_24\bin\java -jar qos_def_unit_repair_kit.jar
    The utility subscribes to the UIM message bus to receive QoS Definitions and corrects any in the database that have been incorrectly replaced.
  5. Restart each of the probes listed in the report. In the example above the vmware and snmpget probes would be restarted so that the correct units are received. Probes are identified at the end of each S_QOS_DATA listing in the report. For example:
    S_QOS_DATA { source=esxiqa1.i9.x, target=CPU sample count, origin=w2k8-vm0hub, host=10.0.0.1, robot=w2k8-vm0, probe=vmware }
Once the patch utility has repaired all QoS Definitions it does not need to be run again and no other action is required.
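Adding a custom unit to the skip table, as described in step 3, might look like the following. This is a hedged sketch: the column name unit is an assumption, so check the actual table definition in your NIS database before inserting.

```sql
-- Inspect the pre-populated rows (variant, none, user defined)
select * from S_QOS_DEF_SKIP_UNIT;

-- Hypothetical example: protect a custom unit from being overridden
insert into S_QOS_DEF_SKIP_UNIT (unit) values ('degrees F');
```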
Upgrading data_engine Takes Longer Than Expected
Symptom:
I am upgrading the data_engine probe from a previous version to the one available with CA UIM 9.0.2. However, the upgrade process is taking longer than usual.
Solution:
For data_engine released with versions prior to CA UIM 9.0.2, the S_QOS_DATA table contains the object data for each unique combination of QoS, source, and target attributes. It does not include the origin value in identifying the unique combination. However, the data_engine probe in CA UIM 9.0.2 has been enhanced to use the unique combination of QoS, source, target, and origin attributes. Therefore, when you upgrade from a previous version to CA UIM 9.0.2, the upgrade process might take some time because data_engine uses the origin value to recalculate the unique combination. This recalculation updates the existing values in the checksum column of the S_QOS_DATA table. The data_engine probe runs database scripts during the upgrade process to update the existing records. For example, for Microsoft SQL Server and MySQL, the respective scripts take approximately 20-30 seconds to update 700,000 records in the S_QOS_DATA table. In this case, the computer has 16 GB of memory and an 8-core processor.
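As a sketch of what the upgrade scripts do on Microsoft SQL Server (derived from the upgrade SQL that data_engine writes to its log), the recalculated checksum is a SHA-1 hash over the concatenated qos, source, target, and origin values:

```sql
-- New checksum: SHA-1 over 'qos#source#target#origin', stored as hex text.
-- CONCAT() requires SQL Server 2012 or later.
update qd
   set checksum = CONVERT(VARCHAR(41),
                          HashBytes('SHA1',
                                    CONVERT(VARCHAR(1024),
                                            CONCAT(qd.qos, '#', qd.source, '#',
                                                   qd.target, '#', qd.origin))),
                          2)
  from S_QOS_DATA qd;
```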
Out-of-date Information in the Quality of Service Type Status Table
Symptom:
I see out-of-date information in the Quality of Service Type Status Table.
Solution:
Table statistics are collected by database software to create the best execution plan for a query. Some examples of collected statistics include:
  • Rows that are stored in a table
  • Available indexes
  • How many pages store the rows
The data_engine probe uses these table statistics to generate the Quality of Service Type Status table. Table statistics can be manually updated if they become out of date.
The procedure for updating the table statistics is different for each database vendor:
Updating table statistics significantly impacts performance on all database platforms, especially on larger databases.
Table Statistics for Oracle
To receive correct statistics in Oracle, follow the instructions found at:
Table Statistics for MySQL
To receive correct statistics in MySQL, run the following statement on all RN_, HN_, BN_, and DN_ tables:
  • ANALYZE TABLE RN_QOS_DATA_XXXX; - The table's key distribution is analyzed and stored so that the optimizer can choose better execution plans. (MySQL does not support the Oracle-style ESTIMATE STATISTICS and COMPUTE STATISTICS clauses.)
Using the ANALYZE command in MySQL can be a time-consuming operation, especially for large databases. Only perform the command sporadically and do not use Automated Maintenance Tasks.
In MySQL, the ANALYZE command holds a read lock on tables, which can negatively impact database performance.
For more information, refer to the following MySQL documentation:
Table Statistics for Microsoft SQL Server
Normally, table statistics are automatically managed by SQL Server. However, this functionality does not work in some cases.
For example, if you are performing bulk inserts using the OLE DB FastLoad API, the statistics for data tables are not automatically updated. This can lead to poor performance and extra work for SQL Server.
The data_engine probe contains options that can automatically update statistics for Microsoft SQL Server. The code is in a stored procedure in the SLM database that is named ‘spn_de_UpdateStatistics’.
Updating tables statistics in Microsoft SQL server causes queries to recompile. For more information, refer to the following Microsoft SQL documentation:
The behavior can be controlled with the following keys in the raw configure menu:
  • statistics_age (default value: 24) - Time in hours. When the stored procedure is called, statistics that are older than this number of hours are updated. This value is used by the stored procedure, not by data_engine itself. If this number is set to 0 (zero), statistics are disabled and are not run at all by the data_engine.
  • statistics_pattern (default value: RN_QOS_DATA%) - String pattern that determines which data tables are updated.
  • statistics_loglevel (default value: 0) - A number from 0 to 5 that is used by the stored procedure when logging to tbnLogging.
  • statistics_time_pattern (default value: not set) - The scheduling string that determines when to run statistics. If this key is empty or not set, the schedule that is defined for data management is used; the statistics run when data_engine has finished index maintenance and data management. If this value specifies a different schedule, the statistics are updated independently of when data management is scheduled. The string is interpreted by the calendar scheduling library, which is used by various UIM components and supports RFC 2445.
Some string examples, copied from the library help file:
/****************************************************************************************
** nimCalCreate - Creates a handle to a nimCal structure
*****************************************************************************************
** PARAMETERS:
** char *pattern - RFC2445 ,'weekly' or 'dts'
** char *start - startdate: yyyy-mm-dd [hh:mm:ss] || NULL
** : weekly format 1 or 2
**
** start = 'yyyy-mm-dd [hh:mm:ss]' will expect the 'pattern' to comply with RFC2445.
** = NULL results in setting start to 'now'
** e.g.
** h = nimCalCreate("DTSTART:19970610T090000|RRULE:FREQ=YEARLY;COUNT=10",NULL);
** h = nimCalCreate("DTSTART:19970610T090000|RRULE:FREQ=YEARLY;COUNT=10","2007-07-25");
**
** pattern = 'weekly' handles two 'start' formats:
** 1. 0123,10:00,14:00 [,NOT] (old NAS format)
** 2. MO=12:00-14:00,15:30-17:00;TU=08:00-16:00 (new, allow 8-16)
**
** h = nimCalCreate("weekly","012,10:00,14:00");
** h = nimCalCreate("weekly","MO=12:00-14:00,15:30-17:00;TU=08:00-16:00");
** h = nimCalCreate("dts","2007-08-20 08:00,2007-08-27 08:00,2007-09-03 08:00,2007-09-10"
**
** Note: Free the handle using nimCalFree.
****************************************************************************************/
You can also create a schedule in nas and use the resulting string, or use the data_engine scheduler to create a string.
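Putting the keys together, a raw-configure fragment that runs statistics on a schedule of its own might look like the following. The section name and the RRULE value (daily at 02:00, starting 2024-01-01) are illustrative assumptions:

```
<setup>
   statistics_age = 24
   statistics_pattern = RN_QOS_DATA%
   statistics_loglevel = 0
   statistics_time_pattern = DTSTART:20240101T020000|RRULE:FREQ=DAILY
</setup>
```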
Check Microsoft SQL Server Partitioning Jobs
Use the following Microsoft SQL Server statement to determine which partitioning jobs are running:
SELECT sqltext.TEXT,
req.session_id,
req.status,
req.command,
req.cpu_time,
req.total_elapsed_time
FROM sys.dm_exec_requests req
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext;
If the results display multiple "CREATE nonclustered index" statements, you have more than one partitioning job running.
To stop a partitioning job:
KILL [session_id]
Manual Deployment of data_engine Released with CA UIM 9.0.2 Not Working
Symptom:
I manually deployed the data_engine probe released with CA UIM 9.0.2, but it is not working.
Solution:
If you manually deploy the data_engine probe released with CA UIM 9.0.2, ensure that you create a .pem file, place it in the <uim>\Nimsoft\security folder, and add the cryptkey=<path of the .pem file> parameter to the robot.cfg file.
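The resulting robot.cfg entry would look roughly like the following. The file name secret.pem and the controller section placement are assumptions; adjust both to your environment.

```
<controller>
   cryptkey = <uim>\Nimsoft\security\secret.pem
</controller>
```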
data_engine Errors After Upgrading to CA UIM 9.0.2
Symptom:
After upgrading to CA UIM 9.0.2 in an environment where the back-end database server is Microsoft SQL Server 2008 R2, data_engine may experience failures even though the upgrade appears successful.
The following log message is logged in the data_engine.log:
Oct 31 17:33:30:007 [1864] 0 de: ExecuteNoRecords - Query: exec spn_bas_SetVersion 'initialize', 'NIS_QOS_DATA',9.0200,5; go begin update qos_data set checksum = CONVERT(VARCHAR(41),HashBytes('SHA1', convert(varchar(1024), CONCAT(qos_data.qos,'#',qos_data.source,'#',qos_data.target,'#',qos_data.origin))),2) FROM S_QOS_DATA qos_data INNER JOIN S_QOS_DATA T on qos_data.table_id = T.table_id and T.checksum!=CONVERT(VARCHAR(41),HashBytes('SHA1', convert(varchar(1024), CONCAT(qos_data.qos,'#',qos_data.source,'#',qos_data.target,'#',qos_data.origin))),2); end; exec spn_bas_SetVersion 'finalize', 'NIS_QOS_DATA', null, 5;
Oct 31 17:33:30:007 [1864] 0 de: [main] ExecuteNoRecords - 2 errors
Oct 31 17:33:30:007 [1864] 0 de: (1) ExecuteNoRecords [Microsoft OLE DB Provider for SQL Server] Incorrect syntax near 'go'.
Oct 31 17:33:30:007 [1864] 0 de: (2) ExecuteNoRecords [Microsoft OLE DB Provider for SQL Server] 'CONCAT' is not a recognized built-in function name.
Oct 31 17:33:30:007 [1864] 0 de: COM Error [0x80040e14] IDispatch error #3092 - [Microsoft OLE DB Provider for SQL Server] Incorrect syntax near 'go'.
Solution:
Use the data_engine 9.02HF1 hotfix that is available at the CA Unified Infrastructure Management Hotfix Index website.
To apply the patch, follow these steps:
  1. Import the data_engine-9.02HF1.zip file into the local archive.
  2. Deploy the imported package to the primary hub robot system.
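For background, CONCAT() is available only in SQL Server 2012 and later, which is why the logged statement fails on 2008 R2. The hotfix is the supported fix; purely as a sketch, an equivalent expression on 2008 R2 would use '+' concatenation (note that, unlike CONCAT, '+' yields NULL if any operand is NULL):

```sql
-- 2008 R2-compatible form of the checksum expression from the log above
select CONVERT(VARCHAR(41),
               HashBytes('SHA1',
                         CONVERT(VARCHAR(1024),
                                 qos + '#' + source + '#' + target + '#' + origin)),
               2) as checksum
  from S_QOS_DATA;
```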
Data Maintenance Schedule Not Working Correctly
Symptom:
I scheduled data maintenance using Infrastructure Manager, but it is not working correctly.
Solution:
If you use Infrastructure Manager to schedule data maintenance, ensure that you deactivate and then activate the data_engine probe for the changes to take effect. Otherwise, the schedule might not be applied.
However, if you are using Admin Console, you do not need to deactivate and activate data_engine.