Monitoring Information
Every monitoring push contains the information as shown in the following table:
- Monitoring Object Name: dxserver-monitor
Name | Value Type | Format | Description |
host-name | String | | Name of the host where the DSA that provides the information resides |
dsa-name | String | | Name of the DSA that provides the information |
time | String | CCYYMMDD.HHMMSS.mmm | Time when the DSA sent the message |
message-id | Number | >=0 | Sequence number that uniquely identifies each message sent by the DSA |
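As a minimal sketch of how a receiver might unpack this envelope, the following Python reads the common fields and identifies which payload object a push carries. The function name `parse_push` and the shape of the returned dictionary are illustrative assumptions, not part of CA Directory. The timestamp is parsed as a naive value because the table does not state a time zone.

```python
import json
from datetime import datetime

# Fields that the table above lists for every push.
COMMON_FIELDS = {"host-name", "dsa-name", "time", "message-id"}


def parse_push(raw):
    """Split one monitoring push into its common fields and its payload object."""
    body = json.loads(raw)["dxserver-monitor"]
    # CCYYMMDD.HHMMSS.mmm: %f accepts the three millisecond digits.
    sent = datetime.strptime(body["time"], "%Y%m%d.%H%M%S.%f")
    # Any remaining key names the payload object (alarm, event, stats, ...).
    payload = {k: v for k, v in body.items() if k not in COMMON_FIELDS}
    kind, data = next(iter(payload.items()), (None, None))
    return {
        "host": body["host-name"],
        "dsa": body["dsa-name"],
        "sent": sent,
        "message_id": body.get("message-id"),  # absent from the examples on this page
        "kind": kind,
        "data": data,
    }


# The alarm example from this page, used as sample input.
SAMPLE = (
    '{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "data1",'
    '"time": "20141205.165412.212","alarm": {"id": "DSA_I1240",'
    '"type": "information","message": "DSA shutting down"}}}'
)
parsed = parse_push(SAMPLE)
print(parsed["kind"], parsed["sent"].isoformat())
```

Returning the payload key as `kind` lets a receiver dispatch on the object name without hard-coding the list of monitoring objects.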
Alarm Information
Alarm information is sent each time an alarm message of a configured type is written to the alarm log. The alarm log captures a large number of critical events in the lifecycle of a DSA.
- Alarm Object Name: alarm
- Trigger: Alarm message written to log file
The alarm message contains the following information as shown in this table:
Name | Value Type | Format | Description |
id | String | DSA_cnnnn | Unique alarm message identifier |
type | String | Enumeration: critical, caution, information | Severity of alarm |
message | String | | Text describing the alarm event that occurred |
Alarm Format Example
This example is an informational alarm message sent when you stop the DSA.
{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "data1","time": "20141205.165412.212","alarm": {"id": "DSA_I1240","type": "information","message": "DSA shutting down"}}}
Event Information
While the DSA is running, a number of key events can be detected. These events are useful from an auditing perspective or to detect problems that require immediate attention.
- Event Object Name: event
- Trigger: Configured event detected by the DSA
The event message contains the following information as shown in this table:
Name | Value Type | Format | Description |
type | String | Enumeration: auth-failure, account-susp, op-error, mw-error | Type of event that occurred |
message | String | | Text describing the event that occurred |
Event Format Example
This example shows the event that is sent when a bind to the DSA is attempted with invalid credentials. If this occurs frequently, it may indicate a dictionary-based attack.
{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "data1","time": "20141205.165412.212","event" : {"type": "auth-failure","message": "cn=justin,ou=users,o=ca,c=au 123.123.123.123"}}}
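One way to act on frequent authentication failures is to count them per source, as in the hedged sketch below. It assumes that the event message has the form "bind DN, space, client address", which is inferred only from the single example on this page. The function names and the threshold are hypothetical.

```python
from collections import Counter


def source_of(event):
    """Return the last space-separated token of an event message."""
    # Assumption: the message ends with the client address, as in the example.
    return event["message"].rsplit(" ", 1)[-1]


def suspicious_sources(events, threshold):
    """List sources that produced at least `threshold` auth-failure events."""
    counts = Counter(
        source_of(e) for e in events if e["type"] == "auth-failure"
    )
    return sorted(src for src, n in counts.items() if n >= threshold)


# Sample "event" objects; only the first message is taken from this page.
events = [
    {"type": "auth-failure", "message": "cn=justin,ou=users,o=ca,c=au 123.123.123.123"},
    {"type": "auth-failure", "message": "cn=admin,ou=users,o=ca,c=au 123.123.123.123"},
    {"type": "auth-failure", "message": "cn=root,ou=users,o=ca,c=au 123.123.123.123"},
    {"type": "auth-failure", "message": "cn=justin,ou=users,o=ca,c=au 10.0.0.1"},
    {"type": "op-error", "message": "unrelated event"},
]
flagged = suspicious_sources(events, threshold=3)
print(flagged)
```

Splitting from the right keeps distinguished names that contain spaces intact.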
Logs Information
While the DSA is running, a number of logs are written to the file system. These logs can be directed to the monitoring address.
- Log Object Name: logs
- Trigger: When operations are processed by the DSA
The logs message contains the following information as shown in this table:
Name | Value Type | Format | Description |
type | String | Enumeration: query-log, update-log | Type of event being logged by the DSA |
message | String | | Text describing the log event that occurred |
External event monitoring is independent of the CA Directory logging to the file system.
Log Format Example
This example is a successful bind request being logged.
{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "router","time": "20150223.115515.695","log" : {"type": "query-log","message": "20150223.115515.695 0.11 BIND 10.129.174.81 dn=\"cn=justin,ou=users,o=ca,c=au\""}}}
{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "router","time": "20150223.171428.225","log" : {"type": "query-log","message": "20150223.171428.225 0.11 RESULT success 1 entries 16 msecs"}}}
Statistics Information
The DSA keeps running counts of the operations it receives, along with other information, as shown in the following table. The counts are reset when a DSA is restarted. Each count keeps increasing until it reaches the unsigned 32-bit MAX_INT value, at which point it wraps back to zero.
If periodic snapshots are taken, then this information provides a reasonable measure of how the DSA is performing over time. The delta of snapshots indicates the amount of work a DSA has performed in that period.
- Statistics Object Name: stats
- Trigger: If push-interval is configured, that interval is used; otherwise, a message is sent every 60 seconds.
Name | Value Type | Format | Description |
anonymous-binds | Number | >=0 | Anonymous binds processed |
simple-binds | Number | >=0 | Username/password binds processed |
strong-binds | Number | >=0 | Certificate authenticated binds processed |
bind-security-errors | Number | >=0 | Binds refused due to invalid credentials |
total-operations | Number | >=0 | Total count of operations processed |
compare-entry-operations | Number | >=0 | Total number of compare operations processed |
add-entry-operations | Number | >=0 | Total number of add entry operations processed |
remove-entry-operations | Number | >=0 | Total number of remove entry operations processed |
modify-entry-operations | Number | >=0 | Total number of modify entry operations processed |
rename-entry-operations | Number | >=0 | Total number of rename entry operations processed |
list-operations | Number | >=0 | Total number of list entry operations processed (one level searches for objectClass present) |
search-operations | Number | >=0 | Total number of search operations processed |
one-level-search-operations | Number | >=0 | Total number of search operations with a scope of one-level processed |
whole-subtree-searches | Number | >=0 | Total number of whole subtree search operations processed |
security-errors | Number | >=0 | Total number of security errors that have occurred |
operation-errors | Number | >=0 | Total number of failed operations |
Statistics Format Example
{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "router","time": "20150223.214700.014","stats" : {"anonymous-binds": 0,"simple-binds": 1,"strong-binds": 0,"bind-security-errors": 0,"total-operations": 7,"compare-entry-operations": 0,"add-entry-operations": 0,"remove-entry-operations": 0,"modify-entry-operations": 1,"rename-entry-operations": 0,"list-operations": 5,"search-operations": 1,"one-level-searches": 1,"whole-subtree-searches": 0,"security-errors": 0,"operation-errors": 0}}}
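The snapshot delta described earlier in this section can be sketched as follows. The function name `stats_delta` is hypothetical. Modular subtraction is used so that a counter that has wrapped past the unsigned 32-bit maximum once still yields the correct difference.

```python
# Counters wrap to zero after the unsigned 32-bit maximum (2**32 - 1).
COUNTER_MODULUS = 2 ** 32


def stats_delta(previous, current):
    """Work done per counter between two "stats" snapshots, allowing one wrap."""
    return {
        name: (current[name] - previous[name]) % COUNTER_MODULUS
        for name in current
        if name in previous
    }


# Two illustrative snapshots; the second search counter has wrapped.
earlier = {"total-operations": 7, "search-operations": 4294967290}
later = {"total-operations": 12, "search-operations": 5}
delta = stats_delta(earlier, later)
print(delta)
```

A DSA restart also resets the counters, and the counters alone cannot distinguish a restart from a wrap, so a receiver may want to treat an unexpectedly large delta with caution.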
Cache Information
The DSA uses a high-speed cache to optimize performance. Since r12, all information is cached (DXgrid), and this event provides a snapshot of the current state of the DXgrid DSA cache.
If set on router DSAs, all counters are set to zero and the status is disabled. For this reason, it is more efficient to set this option on data DSAs only.
- Statistics Object Name: cache
- Trigger: If push-interval is configured, that interval is used; otherwise, a message is sent every 60 seconds.
The cache message contains the following information as shown in this table:
Name | Value Type | Format | Description |
status | String | Enumeration: ok, building, disabled, insane | ok: the cache is functioning normally. building: the cache is being loaded from the DXgrid db file. disabled: the cache is turned off. insane: an unexpected state that should not occur. |
size | Number | >=0 | Memory used (to the nearest MB) to cache the entries and build indexes |
search-hits | Number | >=0 | How many searches the cache has serviced |
sequential-scans | Number | >=0 | How many searches caused the cache to sequentially scan all entries. These searches are inefficient and should be monitored. |
entries | Number | >=0 | Number of entries serviced by the cache |
file-size | Number | >=0 | The configured size of the DXgrid db file |
used-bytes | Number | >=0 | Number of bytes used to store data in the DXgrid db file. The file utilization percentage can be calculated using this formula: used-bytes / file-size * 100. |
reclaimable-bytes | Number | >=0 | Number of bytes from deleted entries or values that may be reclaimed by subsequent updates |
Cache Format Example
{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "data1","time": "20150223.214700.014","cache" : {"status": "ok","size": 6,"search-hits": 0,"sequential-scans": 0,"entries": 7,"file-size": 10,"used-bytes": 1,"reclaimable-bytes": 1}}}
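The file utilization formula from the table, used-bytes / file-size * 100, might be applied to a cache message as in this small sketch. The function name `file_utilization` is hypothetical, and the code assumes that both fields are reported in the units the formula expects. The zero check covers the case where all counters are zero, as happens on router DSAs.

```python
def file_utilization(cache):
    """Percentage of the DXgrid db file in use, or None when file-size is zero."""
    if cache["file-size"] == 0:
        return None  # nothing to divide by, for example on a router DSA
    return cache["used-bytes"] / cache["file-size"] * 100


# The "cache" object from the example on this page.
example = {
    "status": "ok", "size": 6, "search-hits": 0, "sequential-scans": 0,
    "entries": 7, "file-size": 10, "used-bytes": 1, "reclaimable-bytes": 1,
}
utilization = file_utilization(example)
print(utilization)
```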
Multiwrite Information
When a DSA is part of a replicating set of DSAs, the multiwrite event keeps track of each multiwrite peer DSA it is servicing. A DSA may have one or more peers and the multiwrite event provides a separate status line for each peer.
No events are sent if a DSA is not replicating. If there are many multiwrite DSAs, then each DSA sends a multiwrite event for each peer in its replicating set. This may generate a large amount of traffic. For this reason, increase the push interval to reduce monitoring traffic.
- Statistics Object Name: multiwrite
- Trigger: If push-interval is configured, that interval is used; otherwise, a message is sent every 60 seconds.
Name | Value Type | Format | Description |
dsa-name | String | | Name of the remote multiwrite peer DSA |
queue-length | Number | >=0 | Number of updates that have been applied locally and must be sent to the multiwrite peer DSA. A value that increases over time indicates a replication bottleneck that needs investigation, especially if MW-DISP is enabled. |
status | String | Enumeration: | |
| | unknown | |
| | ok | Replicating normally to dsa-name |
| | failed | dsa-name cannot be contacted, will try again in 60 seconds |
| | failed-no-remote-dsa | Replication has failed as dsa-name is removed from configuration |
| | serviced-by-hub | Not replicating to dsa-name as handled by hub in that DSA Multiwrite group |
| | recovering | Replication to dsa-name has failed, will use MW-DISP for recovery |
| | disp-failed | dsa-name is in the process of recovering using Multiwrite |
| | waiting-for-disp | MW-DISP initialization in progress |
| | queue-purged | Multiwrite queue size exceeded |
| | failed-update-sent | Attempting to reconnect to dsa-name |
pending-remote | Number | >=0 | Count of updates that have been sent to replicating peer DSA |
confirmed-local | Number | >=0 | Count of updates in queue that clients are not waiting for confirmation on. The updates have either been confirmed or are replicating asynchronously. |
Multiwrite Format Example
DSA mw1 has two multiwrite peer DSAs (mw2 and mw3); therefore, two messages are sent.
{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "mw1","time": "20150223.214700.014","multiwrite" : {"dsa-name": "mw2","queue-length": 0,"status": "ok","pending-remote": 10,"confirmed-local": 10}}}
{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "mw1","time": "20150223.214700.014","multiwrite" : {"dsa-name": "mw3","queue-length": 1000,"status": "failed","pending-remote": 0,"confirmed-local": 0}}}
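Because a queue-length that keeps increasing points to a replication bottleneck, a receiver could track the value per peer across pushes. The sketch below is one possible approach: the class name `QueueTrend` and the window of three consecutive samples are arbitrary assumptions, not values recommended by the product.

```python
from collections import defaultdict, deque


class QueueTrend:
    """Track queue-length per (local DSA, peer DSA) pair over recent pushes."""

    def __init__(self, window=3):
        self.window = window
        # Each pair keeps only its most recent `window` samples.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, push):
        """Record one multiwrite push; return True if the queue grew across the window."""
        body = push["dxserver-monitor"]
        mw = body["multiwrite"]
        samples = self.history[(body["dsa-name"], mw["dsa-name"])]
        samples.append(mw["queue-length"])
        values = list(samples)
        # Growing means every sample in a full window exceeds the one before it.
        return len(values) == self.window and all(
            a < b for a, b in zip(values, values[1:])
        )


def make_push(queue_length):
    """Build a multiwrite push shaped like the mw1-to-mw3 example on this page."""
    return {"dxserver-monitor": {"dsa-name": "mw1", "multiwrite": {
        "dsa-name": "mw3", "queue-length": queue_length, "status": "failed"}}}


trend = QueueTrend(window=3)
results = [trend.observe(make_push(q)) for q in (0, 10, 1000)]
print(results)
```

Keying the history on both DSA names keeps the trends separate when several DSAs report on the same peer.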
DSA Internal Statistics Information
The DSA keeps track of low-level internal statistical information that can be useful in tracking operational or performance problems.
Some information is reset when a DSA is restarted and keeps increasing until the values wrap back to zero when unsigned MAX_INT (32-bit) is reached. Other information is reset after each snapshot. For this reason, push-interval is not supported for this monitoring event.
- Statistics Object Name: dsastats
- Trigger: Message sent every 60 seconds
Name | Value Type | Format | Description |
associations | Number | >=0 | Number of active connections to the DSA |
nil-credit | Number | >=0 | Number of times since the last message that operation processing was delayed because the credit limit was reached. This value is reset every 60 seconds. |
queued-ops | Number | >=0 | Number of operations the DSA is processing |
queued-remote-ops | Number | >=0 | Number of operations that have been sent to other DSAs for processing |
ops-processed | Number | >=0 | Number of operations that have been processed since the last message. This value is reset every 60 seconds. |
entries-returned | Number | >=0 | Number of entries returned by the DSA since the last message. This value is reset every 60 seconds. |
mw-queue | Number | >=0 | The total number of queued updates in all multiwrite queues |
mw-replicating | Number | >=0 | The total number of updates that are pending a response from remote peer DSAs |
active | Number | >=0 | Approximate percentage of threads that were active over the last minute. This value is reset every 60 seconds. |
memory-trees | Number | >=0 | The number of internal memory blocks in use |
memory-usage | Number | >=0 | Total amount of memory DSA has requested from the operating system |
mallocs | Number | >=0 | The number of times malloc was called since the last message. The DSA reuses memory so over time the number of malloc calls should reduce. If these calls do not reduce, the DSA is not performing at full efficiency as requesting memory from the OS incurs a performance cost. This value is reset every 60 seconds. |
buffers | Number | >=0 | The number of transport buffers in the DSA. These buffers are used when sending requests and responses to other DSAs or clients. If the number of transport buffers grows, the receiver may not be keeping up with the load generated, which may indicate a bottleneck. |
free-buffers | Number | >=0 | The number of transport buffers that can be reused |
selects | Number | >=0 | The number of times select() was called since the last message. This call is used to listen for network events (requests or disconnects). This value is reset every 60 seconds. |
write-selects | Number | >=0 | The number of times select() returned a write event. These events occur when the DSA tries to send a request or response and the attempt is blocked by the receiver. The DSA needs to wait for the other end to become writable. If these are frequent then the target may have a performance issue. This value is reset every 60 seconds. |
thread-count | Number | >=0 | Number of threads available in the pool to do work. The DSA no longer has backend threads, so the backend thread counters that appear in SNMP have been dropped. |
thread-mean | Number | >=0 | Average number of threads active from the available threads |
DSA Internal Statistics Format Example
{ "dxserver-monitor": {"host-name": "hostname.com","dsa-name": "data","time": "20150223.214700.014","dsastats" : {"associations": 2,"nil-credit": 0,"queued-ops": 1,"queued-remote-ops": 1,"ops-processed": 10,"entries-returned": 34,"mw-queue": 0,"mw-not-sent": 0,"busy": 0,"memory-trees": 25,"memory-usage": 6,"mallocs": 0,"buffers": 10,"free-buffers": 0,"selects": 3850,"write-selects": 0,"thread-count": 8,"thread-mean": 0}}}
Response Format
When the DSA sends a request through REST, the server is expected to send a response back to the DSA. The DSA does not enforce this, but the response can indicate a problem with processing the request and helps in diagnosing server-side monitoring issues. It may be more efficient for the endpoint not to send responses.
The DSA currently expects the following HTTP responses; the body of the response is ignored.
Response Messages
HTTP Status Code | Reason |
|---|---|
403 | Event failed the HTTP basic authorization check |
200 or 204 | Event was successfully processed |
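A receiving endpoint that returns these status codes could look like the following sketch, built on Python's standard `http.server` module. Several details are assumptions rather than documented behavior: that pushes arrive as HTTP POST requests, the credentials in `EXPECTED_USER` and `EXPECTED_PASSWORD`, and the port. The response carries no body because the DSA ignores it.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, for illustration only.
EXPECTED_USER = "monitor"
EXPECTED_PASSWORD = "change-me"


def status_for(auth_header):
    """Return 200 when the Basic credentials match, otherwise 403."""
    expected = base64.b64encode(
        f"{EXPECTED_USER}:{EXPECTED_PASSWORD}".encode()
    )
    scheme, _, token = (auth_header or "").partition(" ")
    if scheme.lower() != "basic":
        return 403
    # Constant-time comparison of the encoded credentials.
    matches = hmac.compare_digest(token.strip().encode(), expected)
    return 200 if matches else 403


class PushHandler(BaseHTTPRequestHandler):
    def do_POST(self):  # assumption: the DSA pushes with HTTP POST
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the JSON body; processing is omitted
        self.send_response(status_for(self.headers.get("Authorization")))
        self.send_header("Content-Length", "0")  # empty body
        self.end_headers()


def serve(port=8080):
    """Run the endpoint until interrupted; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), PushHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping the status decision in `status_for` makes it possible to check the authorization logic without opening a socket.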