Monitoring Information

Every monitoring push contains the information as shown in the following table:
Monitoring Object Name: dxserver-monitor

| Name | Value Type | Format | Description |
|---|---|---|---|
| host-name | String | | Name of the host where the DSA that provides the information resides |
| dsa-name | String | | Name of the DSA that provides the information |
| time | String | CCYYMMDD.HHMMSS.mmm | Time when the DSA sent the message |
| message-id | Number | >=0 | Sequence number that uniquely identifies each message sent by the DSA |
| {alarm} | | | This information is explained in detail in the following sections. |
| {event} | | | This information is explained in detail in the following sections. |
| {log} | | | This information is explained in detail in the following sections. |
| {stats} | | | This information is explained in detail in the following sections. |
| {cache} | | | This information is explained in detail in the following sections. |
| {multiwrite} | | | This information is explained in detail in the following sections. |
| {dsastats} | | | This information is explained in detail in the following sections. |
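A receiving endpoint might unpack the common fields before looking at the payload object. The following is a minimal Python sketch, not part of the product. The function name `parse_push`, the returned dictionary layout, and the assumption that each push carries exactly one payload object (as in the examples in this section) are illustrative. The time zone of the `time` field is not stated here, so the sketch keeps it as a naive timestamp.

```python
import json
from datetime import datetime

# Payload object names taken from the table above.
PAYLOAD_KEYS = ("alarm", "event", "log", "stats", "cache", "multiwrite", "dsastats")


def parse_push(raw):
    """Split one monitoring push into its common fields and its payload object."""
    body = json.loads(raw)["dxserver-monitor"]
    # Assumption: one payload object per push; take the first one found.
    kind = next((key for key in PAYLOAD_KEYS if key in body), None)
    return {
        "host": body["host-name"],
        "dsa": body["dsa-name"],
        # CCYYMMDD.HHMMSS.mmm; %f accepts the three millisecond digits.
        "sent": datetime.strptime(body["time"], "%Y%m%d.%H%M%S.%f"),
        "message_id": body.get("message-id"),  # absent in the examples shown here
        "kind": kind,
        "payload": body.get(kind) if kind else None,
    }


EXAMPLE = (
    '{"dxserver-monitor": {"host-name": "hostname.com", "dsa-name": "data1", '
    '"time": "20141205.165412.212", "alarm": {"id": "DSA_I1240", '
    '"type": "information", "message": "DSA shutting down"}}}'
)
```

Calling `parse_push(EXAMPLE)` yields a dictionary whose `kind` is `alarm` and whose `payload` is the alarm object.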
Alarm Information
Alarm information is sent each time an alarm message is written to the alarm log depending on the configured type. The alarm log captures a large number of critical events in the lifecycle of a DSA.
  • Alarm Object Name:
     alarm
  • Trigger:
     Alarm message written to log file
The alarm message contains the following information as shown in this table:
| Name | Value Type | Format | Description |
|---|---|---|---|
| id | String | DSA_cnnnn | Unique alarm message identifier |
| type | String | Enumeration: critical, caution, information | Severity of alarm |
| message | String | | Text describing the alarm event that occurred |
Alarm Format Example
This example is an informational alarm message sent when you stop the DSA.
 { "dxserver-monitor": {
        "host-name": "hostname.com",
        "dsa-name": "data1",
        "time": "20141205.165412.212",
        "alarm": {
          "id": "DSA_I1240",
          "type": "information",
          "message": "DSA shutting down"
        }
      }
    }
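A consumer may want to forward alarms into its own logging. The sketch below maps the three alarm severities onto Python `logging` levels. The mapping itself, the logger name `dxserver`, and the function names are assumptions made for illustration and are not defined by the DSA.

```python
import logging

# Hypothetical mapping from alarm "type" to Python logging levels.
LEVELS = {
    "critical": logging.CRITICAL,
    "caution": logging.WARNING,
    "information": logging.INFO,
}


def alarm_level(push):
    """Return the logging level for an alarm push (WARNING if the type is unknown)."""
    alarm = push["dxserver-monitor"]["alarm"]
    return LEVELS.get(alarm["type"], logging.WARNING)


def log_alarm(push, logger=None):
    """Write one alarm push to a logger at the mapped level."""
    logger = logger or logging.getLogger("dxserver")
    body = push["dxserver-monitor"]
    alarm = body["alarm"]
    logger.log(
        alarm_level(push),
        "%s %s [%s] %s",
        body["host-name"], body["dsa-name"], alarm["id"], alarm["message"],
    )
```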
Event Information
While the DSA is running, a number of key events can be detected. These events are useful from an auditing perspective or to detect problems that require immediate attention.
  • Event Object Name:
     event
  • Trigger:
     Configured event detected by the DSA
The event message contains the following information as shown in this table:
| Name | Value Type | Format | Description |
|---|---|---|---|
| type | String | Enumeration: auth-failure, account-susp, op-error, mw-error | Type of event that occurred |
| message | String | | Text describing the event that occurred |
Event Format Example
This example shows the event sent when a bind to the DSA is attempted with invalid credentials. If this occurs frequently, it may indicate a dictionary-based attack.
    { "dxserver-monitor": {
        "host-name": "hostname.com",
        "dsa-name": "data1",
        "time": "20141205.165412.212",
        "event" : {
          "type": "auth-failure",
          "message": "cn=justin,ou=users,o=ca,c=au 123.123.123.123"
        }
      }
    }
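To spot repeated authentication failures, a consumer could count auth-failure events per source. The sketch below assumes, based only on the single example above, that the message ends with the client address after the last space. The function name and the input shape (a list of already parsed pushes) are illustrative.

```python
from collections import Counter


def failures_by_source(pushes):
    """Count auth-failure events per client address."""
    counts = Counter()
    for push in pushes:
        event = push["dxserver-monitor"].get("event")
        if event and event["type"] == "auth-failure":
            # Assumption: "<bind DN> <client address>", split at the last space.
            _dn, _sep, address = event["message"].rpartition(" ")
            counts[address] += 1
    return counts
```

A caller can then compare each count against a threshold of its own choosing within a time window to decide when to raise an alert.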
Logs Information
While the DSA is running, a number of logs are written to the file system. These logs can be directed to the monitoring address.
  • Log Object Name:
     logs
  • Trigger:
     When operations are processed by the DSA
The logs message contains the following information as shown in this table:
| Name | Value Type | Format | Description |
|---|---|---|---|
| type | String | Enumeration: query-log, update-log | Type of event being logged by the DSA |
| message | String | | Text describing the log event that occurred |
External event monitoring is independent of the CA Directory logging to the file system.
Log Format Example
This example is a successful bind request being logged.
   { "dxserver-monitor": {
        "host-name": "hostname.com",
        "dsa-name": "router",
        "time": "20150223.115515.695",
        "log" : {
          "type": "query-log",
          "message": "20150223.115515.695 0.11 BIND 10.129.174.81 dn=\"cn=justin,ou=users,o=ca,c=au\""
        }
      }
    }
 
    { "dxserver-monitor": {
        "host-name": "hostname.com",
        "dsa-name": "router",
        "time": "20150223.171428.225",
        "log" : {
          "type": "query-log",
          "message": "20150223.171428.225 0.11 RESULT success 1 entries 16 msecs"
        }
      }   
    }
Statistics Information
The DSA keeps running counts of various operations received and other information as shown in the following table. The counts are reset when the DSA is restarted. Otherwise, each count keeps increasing until it reaches the unsigned 32-bit MAX_INT and then wraps back to zero.
If periodic snapshots are taken, then this information provides a reasonable measure of how the DSA is performing over time. The delta of snapshots indicates the amount of work a DSA has performed in that period.
  • Statistics Object Name:
     stats
  • Trigger:
     If push-interval is configured, that interval is used; otherwise, a message is sent every 60 seconds.
| Name | Value Type | Format | Description |
|---|---|---|---|
| anonymous-binds | Number | >=0 | Anonymous binds processed |
| simple-binds | Number | >=0 | Username/password binds processed |
| strong-binds | Number | >=0 | Certificate authenticated binds processed |
| bind-security-errors | Number | >=0 | Binds refused due to invalid credentials |
| total-operations | Number | >=0 | Total count of operations processed |
| compare-entry-operations | Number | >=0 | Total number of compare operations processed |
| add-entry-operations | Number | >=0 | Total number of add entry operations processed |
| remove-entry-operations | Number | >=0 | Total number of remove entry operations processed |
| modify-entry-operations | Number | >=0 | Total number of modify entry operations processed |
| rename-entry-operations | Number | >=0 | Total number of rename entry operations processed |
| list-operations | Number | >=0 | Total number of list entry operations processed (one level searches for objectClass present) |
| search-operations | Number | >=0 | Total number of search operations processed |
| one-level-search-operations | Number | >=0 | Total number of search operations with a scope of one-level processed |
| whole-subtree-searches | Number | >=0 | Total number of whole subtree search operations processed |
| security-errors | Number | >=0 | Total number of security errors that have occurred |
| operation-errors | Number | >=0 | Total number of failed operations |
Statistics Format Example
{ "dxserver-monitor": {
        "host-name": "hostname.com",
        "dsa-name": "router",
        "time": "20150223.214700.014",
        "stats" : {
          "anonymous-binds": 0,
          "simple-binds": 1,
          "strong-binds": 0,
          "bind-security-errors": 0,
          "total-operations": 7,
          "compare-entry-operations": 0,
          "add-entry-operations": 0,
          "remove-entry-operations": 0,
          "modify-entry-operations": 1,
          "rename-entry-operations": 0,
          "list-operations": 5,
          "search-operations": 1,
          "one-level-searches": 1,
          "whole-subtree-searches": 0,
          "security-errors": 0,
          "operation-errors": 0
        }
      }
    }
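The delta between two snapshots has to allow for the counters wrapping at the unsigned 32-bit limit. The sketch below shows one way to compute it. The function names are illustrative, and the approach assumes at most one wrap between snapshots. A DSA restart also resets the counters to zero, which this arithmetic cannot tell apart from a wrap, so a consumer would need a separate signal (for example, the shutdown alarm) to detect restarts.

```python
WRAP = 2 ** 32  # counters wrap to zero after the unsigned 32-bit maximum


def counter_delta(previous, current):
    """Work done between two snapshots of one counter, tolerating a single wrap."""
    return (current - previous) % WRAP


def stats_delta(previous, current):
    """Per-counter deltas for two "stats" objects, using the names common to both."""
    return {
        name: counter_delta(previous[name], current[name])
        for name in current
        if name in previous
    }
```

For example, a counter that moves from 4294967290 to 5 has advanced by 11 operations, not by a negative amount.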
Cache Information
The DSA uses a high-speed cache to optimize performance. Since r12, all information is cached (DXgrid). This event provides a snapshot of the current state of the DXgrid DSA cache.
If set on router DSAs, all counters are set to zero and the status is disabled. For this reason, it is more efficient to set this option on data DSAs only.
  • Statistics Object Name:
     cache
  • Trigger:
     If push-interval is configured, that interval is used; otherwise, a message is sent every 60 seconds.
The cache message contains the following information as shown in this table:
| Name | Value Type | Format | Description |
|---|---|---|---|
| status | String | Enumeration: ok, building, disabled, insane | ok: Cache is functioning normally. building: Cache being loaded from DXgrid db file. disabled: Cache turned off. insane: Should not get here. |
| size | Number | >=0 | Memory used (to the nearest MB) to cache the entries and build indexes |
| search-hits | Number | >=0 | How many searches the cache has serviced |
| sequential-scans | Number | >=0 | How many searches caused the cache to sequentially scan all entries. These searches are inefficient and should be monitored. |
| entries | Number | >=0 | Number of entries serviced by the cache |
| file-size | Number | >=0 | The configured size of the DXgrid db file |
| used-bytes | Number | >=0 | Number of bytes used to store data in the DXgrid db file. The file utilization percentage can be calculated using this formula: used-bytes / file-size * 100. |
| reclaimable-bytes | Number | >=0 | Number of bytes from deleted entries or values that may be reclaimed by subsequent updates |
Cache Format Example
    { "dxserver-monitor": {
        "host-name": "hostname.com",
        "dsa-name": "data1",
        "time": "20150223.214700.014",
        "cache" : {
          "status": "ok",
          "size": 6,
          "search-hits": 0,
          "sequential-scans": 0,
          "entries": 7,
          "file-size": 10,
          "used-bytes": 1,
          "reclaimable-bytes": 1
        }
      }
    }
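The utilization formula from the table can be applied directly to a cache object. The sketch below assumes that `used-bytes` and `file-size` are reported in the same unit, which the formula implies but this section does not state outright. The function name and the choice to return `None` for a zero file size are illustrative.

```python
def file_utilization(cache):
    """Percentage of the DXgrid db file in use: used-bytes / file-size * 100."""
    size = cache["file-size"]
    if size == 0:
        return None  # nothing configured to measure against
    return 100.0 * cache["used-bytes"] / size
```

With the values in the example above (`used-bytes` of 1 and `file-size` of 10), the result is 10 percent.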
Multiwrite Information
When a DSA is part of a replicating set of DSAs, the multiwrite event keeps track of each multiwrite peer DSA it is servicing. A DSA may have one or more peers and the multiwrite event provides a separate status line for each peer.
No events are sent if a DSA is not replicating. If there are many multiwrite DSAs, then each DSA sends a multiwrite event for each peer in its replicating set. This may generate a large amount of traffic. For this reason, increase the push interval to reduce monitoring traffic.
  • Statistics Object Name:
     multiwrite
  • Trigger:
     If push-interval is configured, that interval is used; otherwise, a message is sent every 60 seconds.
| Name | Value Type | Format | Description |
|---|---|---|---|
| dsa-name | String | | Name of remote multiwrite peer DSA. |
| queue-length | Number | >=0 | Number of updates that have been applied locally and must be sent to the multiwrite peer DSA. This value increasing over time indicates a replication bottleneck that needs investigation, especially if MW-DISP is enabled. |
| status | String | Enumeration (see the status values below) | Replication status for the peer |
| pending-remote | Number | >=0 | Count of updates that have been sent to replicating peer DSA |
| confirmed-local | Number | >=0 | Count of updates in queue that clients are not waiting for confirmation on. The updates have either been confirmed or are replicating asynchronously. |

The status field takes one of the following values:

| Status | Description |
|---|---|
| unknown | |
| ok | Replicating normally to dsa-name |
| failed | dsa-name cannot be contacted, will try again in 60 seconds |
| failed-no-remote-dsa | Replication has failed as dsa-name is removed from configuration |
| serviced-by-hub | Not replicating to dsa-name as handled by hub in that DSA Multiwrite group |
| recovering | Replication to dsa-name has failed, will use MW-DISP for recovery |
| disp-failed | dsa-name is in the process of recovering using Multiwrite |
| waiting-for-disp | MW-DISP initialization in progress |
| queue-purged | Multiwrite queue size exceeded |
| failed-update-sent | Attempting to reconnect to dsa-name |
Multiwrite Format Example
DSA mw1 has two multiwrite peer DSAs (mw2 and mw3); therefore, two messages are sent.
    { "dxserver-monitor": {
        "host-name": "hostname.com",
        "dsa-name": "mw1",
        "time": "20150223.214700.014",
        "multiwrite" : {
          "dsa-name": "mw2",
          "queue-length": 0,
          "status": "ok",
          "pending-remote": 10,
          "confirmed-local": 10
        }
      }
    }
 
    { "dxserver-monitor": {
        "host-name": "hostname.com",
        "dsa-name": "mw1",
        "time": "20150223.214700.014",
        "multiwrite" : {
          "dsa-name": "mw3",
          "queue-length": 1000,
          "status": "failed",
          "pending-remote": 0,
          "confirmed-local": 0
 
        }
      }
    }
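Because a steadily growing queue-length points to a replication bottleneck, a consumer may want to track it per peer across pushes. The sketch below flags a peer once its queue has grown in every one of the last few samples. The class name, the window of three samples, and the strictly-increasing rule are illustrative choices, not product behavior.

```python
from collections import defaultdict, deque


class QueueTrend:
    """Track queue-length per (local DSA, peer DSA) and flag sustained growth."""

    def __init__(self, window=3):
        self.window = window
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, push):
        """Record one multiwrite push; return True if the queue grew in every recent sample."""
        body = push["dxserver-monitor"]
        peer = body["multiwrite"]
        samples = self.history[(body["dsa-name"], peer["dsa-name"])]
        samples.append(peer["queue-length"])
        values = list(samples)
        return len(values) == self.window and all(
            earlier < later for earlier, later in zip(values, values[1:])
        )
```

A larger window reduces false alarms from short bursts at the cost of slower detection.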
DSA Internal Statistics Information
The DSA keeps track of low-level internal statistical information that can be useful in tracking operational or performance problems.
Some information is reset only when a DSA is restarted and otherwise keeps increasing until it wraps back to zero when unsigned MAX_INT (32-bit) is reached. Other information is reset after each snapshot. For this reason, push-interval is not supported for this monitoring event.
  • Statistics Object Name:
     dsastats
  • Trigger:
     Message sent every 60 seconds
| Name | Value Type | Format | Description |
|---|---|---|---|
| associations | Number | >=0 | Number of active connections to the DSA |
| nil-credit | Number | >=0 | Number of times since the last message that operation processing was delayed because the credit limit was reached. This value is reset every 60 seconds. |
| queued-ops | Number | >=0 | Number of operations the DSA is processing |
| queued-remote-ops | Number | >=0 | Number of operations that have been sent to other DSAs for processing |
| ops-processed | Number | >=0 | Number of operations that have been processed since the last message. This value is reset every 60 seconds. |
| entries-returned | Number | >=0 | Number of entries returned by the DSA since the last message. This value is reset every 60 seconds. |
| mw-queue | Number | >=0 | The total number of queued updates in all multiwrite queues |
| mw-replicating | Number | >=0 | The total number of updates that are pending a response from remote peer DSAs |
| active | Number | >=0 | Approximate percentage of active threads over the last minute. This value is reset every 60 seconds. |
| memory-trees | Number | >=0 | The number of internal memory blocks in use |
| memory-usage | Number | >=0 | Total amount of memory the DSA has requested from the operating system |
| mallocs | Number | >=0 | The number of times malloc was called since the last message. The DSA reuses memory, so over time the number of malloc calls should reduce. If these calls do not reduce, the DSA is not performing at full efficiency, as requesting memory from the OS incurs a performance cost. This value is reset every 60 seconds. |
| buffers | Number | >=0 | The number of transport buffers in the DSA. These buffers are used when sending requests and responses to other DSAs or clients. If the number of transport buffers grows, the receiver may not be keeping up with the load generated, which may indicate a bottleneck. |
| free-buffers | Number | >=0 | The number of transport buffers that can be reused |
| selects | Number | >=0 | The number of times select() was called since the last message. select() is used to listen for network events (requests or disconnects). This value is reset every 60 seconds. |
| write-selects | Number | >=0 | The number of times select() returned a write event. These events occur when the DSA tries to send a request or response and the attempt is blocked by the receiver. The DSA needs to wait for the other end to become writable. If these are frequent, the target may have a performance issue. This value is reset every 60 seconds. |
| thread-count | Number | >=0 | Number of threads available in the pool to do work. The DSA no longer has backend threads, so the backend thread counters that appear in SNMP have been dropped. |
| thread-mean | Number | >=0 | Average number of active threads out of the available threads |
DSA Internal Statistics Format Example
    { "dxserver-monitor": {
        "host-name": "hostname.com",
        "dsa-name": "data",
        "time": "20150223.214700.014",
        "dsastats" : {
          "associations": 2,
          "nil-credit": 0,
          "queued-ops": 1,
          "queued-remote-ops": 1,
          "ops-processed": 10,
          "entries-returned": 34,
          "mw-queue": 0,
          "mw-not-sent": 0,
          "busy": 0,
          "memory-trees": 25,
          "memory-usage": 6,
          "mallocs": 0,
          "buffers": 10,
          "free-buffers": 0,
          "selects": 3850,
          "write-selects": 0,
          "thread-count": 8,
          "thread-mean": 0
        }
      }
    }
Response Format
When the DSA sends a request through REST, the server is expected to send a response back to the DSA. The DSA does not enforce this, but the response can be used to indicate an issue with processing the request, which aids in diagnosing server-side monitoring issues. It may be more efficient for the endpoint to not send responses.
The DSA currently expects the following HTTP responses; the body of the response is ignored.
Response Messages

| HTTP Status Code | Reason |
|---|---|
| 403 | Event failed the HTTP basic authorization check |
| 200 or 204 | Event was successfully processed |
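A receiving endpoint that returns these status codes can be sketched with the Python standard library. The following is an illustration under stated assumptions, not a reference implementation: the HTTP method used by the DSA is not given in this section, so POST is assumed, and the credentials, port, and names (`status_for`, `MonitorHandler`, `make_server`) are hypothetical.

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials for the HTTP basic authorization check.
EXPECTED_USER = "monitor"
EXPECTED_PASSWORD = "secret"


def status_for(auth_header):
    """Return 204 when the Basic credentials match, otherwise 403."""
    token = base64.b64encode(
        f"{EXPECTED_USER}:{EXPECTED_PASSWORD}".encode()
    ).decode()
    return 204 if auth_header == f"Basic {token}" else 403


class MonitorHandler(BaseHTTPRequestHandler):
    def do_POST(self):  # assumption: pushes arrive as POST requests
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the JSON push; processing is omitted here
        # The response body is ignored by the DSA, so only a status line is sent.
        self.send_response(status_for(self.headers.get("Authorization")))
        self.end_headers()


def make_server(port=8080):
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), MonitorHandler)
```

To try it locally, call `make_server().serve_forever()` and point a test client at the chosen port.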