PingDirectory server gauges

The PingDirectory server provides a number of built-in gauges to monitor server performance. These gauges are listed in the following table:

Gauge Name	Enabled by default?	Description
Active Cleaner Threads (Percent)	true	Monitors the percentage of database cleaner threads that are active in a Berkeley DB environment. The resource identifier for this gauge is a backend ID. Use the `list-backends` command to see a list of backend IDs. A separate gauge monitor entry will be created for each monitored backend. Backends can be included in or excluded from monitoring by specifying their backend IDs in the include-resource and exclude-resource properties, respectively. To keep the database from growing on disk, database cleaner threads copy database information from older, mostly obsolete database files to new database files. At a single point in time, 100% of the cleaner threads might be active, but when averaged over time, the percentage of active cleaner threads should remain relatively low. Even environments that sustain a very high write load do not typically see an average cleaner percent busy over 50%. If the percentage exceeds 90% for over an hour, it is a sign that the database cleaner is not progressing and the act of cleaning is producing as much garbage as it cleans. This typically occurs in environments that are not fully cached and have not been given sufficient memory for the database cache. Either increase the memory available to the database cache (by increasing the JVM size or increasing the `db-cache-percent` setting on the backend) or reduce the `db-cleaner-min-utilization` setting, which reduces the burden on the cleaner threads.
Authentication Failure Rate	false	Rate of LDAP bind failures per second. A high rate of failures might indicate a misconfigured client application or a malicious attack.
Available File Descriptors	true	Monitors the number of file descriptors available to the server process. The server allows for an unlimited number of connections by default, but is restricted by the file descriptor limit on the operating system. The number of file descriptors that the server will use can be configured by either using a `NUM_FILE_DESCRIPTORS` environment variable, or by creating a `config/num-file-descriptors` file with a single line such as `NUM_FILE_DESCRIPTORS=12345`. If these are not set, the default of 65535 is used. Running out of available file descriptors can lead to unpredictable behavior and severe system instability.
Certificate Expiration (Days)	true	Monitors the expiration dates of key server certificates. A server certificate expiring can cause server unavailability, degradation, or loss of key server functionality. Certificates nearing the end of their validity should be replaced as soon as possible. See the status tool, or Status in the Administrative Console, for more information about server certificates and how they are managed.
Changelog Database Target Size (Percent)	true	Monitors the size of a changelog database on disk relative to the configured `target-database-size` value for the Replication Server changelog and the LDAP Changelog Backend (cn=changelog). The resource identifier indicates the changelog environment that is being monitored. The server aims to keep the disk usage of the changelog between 95% and 100% of the target-database-size value. The most common reason that the server exceeds this limit is that there are no more changes that are old enough to purge. Purging is controlled by the replication-purge-delay setting on the Replication Server configuration object and the `changelog-maximum-age` setting on the 'changelog' Backend configuration object. If this is the case, the effective-purge-delay monitor attribute will match the configured purge delay. To eliminate the alarm in this case, reduce the purge delay or increase the target-database-size value. Another reason that the disk usage could exceed this limit is that an export-ldif or online backup of the corresponding backend is running, because database files on disk cannot be deleted while this operation is in progress. In practice, this will not occur unless the purge delay and target-database-size settings are configured to very small values, or the backup is throttled with the `--maxMegabytesPerSecond` option so that it takes an especially long time to complete. The final reason that the limit could be exceeded is that the target-database-size setting is unreasonably small, such as less than one gigabyte.
Cleaner Backlog (Number Of Files)	true	Monitors the cleaner backlog in a Berkeley DB environment. The resource identifier for this gauge is a backend ID. Use the `list-backends` command to see a list of backend IDs. A separate gauge monitor entry will be created for each monitored backend. Backends can be included in or excluded from monitoring by specifying their backend IDs in the include-resource and exclude-resource properties, respectively. The backlog is the number of database files that need to be cleaned to reach the target level of utilization. The value over time should stay close to zero. If it remains high or continues to grow, consider updating the backend configuration to increase the `db-num-cleaner-threads` setting or reduce the `db-cleaner-min-utilization` setting. Temporary spikes in cleaner backlog are common during rebuild-index, export-ldif, and backup operations. The backlog should decrease automatically once these commands complete.
CPU Usage (Percent)	true	Monitors server CPU use and provides an averaged percentage for the interval defined. The monitored resource is the host system’s CPU, which does not include a resource identifier. If CPU use is high, check the server’s current workload and other processes on this system and make any needed adjustments. Reducing the load on the system will lead to better response times.
Database Cache Full (Percent)	true	Monitors the percentage of database cache capacity currently populated with entries, per backend. The resource identifier for this gauge is a backend ID. Use the `list-backends` command to see a list of backend IDs. A separate gauge monitor entry will be created for each monitored backend. Backends can be included in or excluded from monitoring by specifying their backend IDs in the include-resource and exclude-resource properties, respectively. Server performance can drop off dramatically when the database cache can no longer hold the entire data set. Most deployments are designed to keep the database cache usage comfortably within the configured limits. If this server is intentionally disk-bound, this gauge should be disabled.
Disk Busy (Percent)	true	Monitors disk busy percentage over the update interval. This gauge requires that the Host System Monitor Provider be enabled and that any monitored disks be registered using the disk-devices property of that Monitor Provider. The resource identifier for this gauge is the disk device name. Use the `iostat` command or a similar system utility to see a list of disk device names. A separate gauge monitor entry will be created for each monitored disk. High disk usage might be indicative of a directory server whose database cache and/or JVM heap size are not large enough to contain the entire data set.
HTTP Processing (Percent)	true	Monitors the percentage of time that request handler threads spend processing HTTP requests. This percentage represents the inverse of the server’s ability to handle new requests without queueing.
JVM Memory Usage (Percent)	true	Monitors the percentage of Java Virtual Machine memory that is in use. This value naturally fluctuates due to garbage collection, so the minimum value within an interval is reported since it is a better indication of overall memory growth. When the memory usage exceeds 90%, this should be reported to customer support since the server is either misconfigured or has a memory leak. As memory usage approaches 100%, the server is more and more likely to experience garbage collection pauses, which leave the server unresponsive for a long time. Restarting the server is likely the only remedy for this situation. Prior to restarting the server, please run collect-support-data and capture the output of 'jmap -histo ' to provide to customer support. The pid of the server can be found from `/logs/server.pid`.
LDAP Operation Average Response Time (Milliseconds)	false	Monitors the average response time across all LDAP operations processed by this server since it was started. There is no resource identifier associated with this gauge. The monitored resource is overall response time of all LDAP operations processed by this server since it was started. High response times can be indicative of a number of factors including a disk-bound server, network latency, or misconfiguration. Enabling the Stats Logger plugin might help isolate problems. See the administration guide for more information on common problems and solutions.
LDAP Operations Failed (Percent)	false	Monitors the percentage of all LDAP operations processed by this server that have failed since it was started. There is no resource identifier associated with this gauge. The monitored resource is overall number of failed LDAP operations processed by this server since it was started. A high percentage of failed operations might indicate misconfiguration of a client or server in a topology.
License Expiration (Days)	true	Monitors the expiration date of the product license. An expired license will cause warnings to appear in the server’s logs and in the status tool output. Request a license key through the Ping Identity licensing website https://www.pingidentity.com/en/account/request-license-key.html or contact sales@pingidentity.com. Use the dsconfig tool to update the License configuration’s license key property.
Memory Usage (Percent)	false	Monitors the percentage of memory use averaged over the update interval defined. The monitored resource is the host system’s memory use, which does not have a resource identifier. Some operating systems, including Linux, use the majority of memory for file system cache, which is freed as applications need it. If memory use is high, check the applications that are running on the server.
Purge Expired Data Backlog (Number of Entries)	true	Monitors the backlog of entries that need to be purged by a Purge Expired Data Plugin. The resource identifier for this gauge is the name of the configured plugin. Increasing the max-updates-per-second configuration property on the plugin can increase the rate that the plugin purges expired data. It might also be necessary to refine the search that the plugin performs to be more efficient.
Recent Changes Database-to-JVM Heap Size Ratio (Percent)	true	The recent changes database keeps a number of recent changes made by update operations in changelog change entry form. This database is used by replication and the changelog backends, and unexpected growth affects server start time, memory consumption, and space on disk. The ratio of the recent changes database to the overall heap size is used to track the impact on memory consumption. If you expect large changes and the server isn’t experiencing issues, increase the alarm threshold. After processing large changes, the alarm should clear itself. If it does not, it might be necessary to perform an export and re-import of the data to resolve the issue.
Replication Conflict Growth Rate	false	Growth rate of the number of unresolved conflicts per second over the update interval. The resource identifier for this gauge is the base DN of the replica. Use the `dsreplication status` command to see a list of replicated base DNs. A separate gauge monitor entry will be created for each monitored replica. Replicas can be included or excluded from monitoring by specifying a replication base DN in the include resource and exclude resource properties respectively. Updates to directory server entries in a replication topology can happen independently, since replication guarantees only eventual consistency, not strong consistency. The eventual consistency model means that conflicting changes can be applied at different directory server instances. In most cases, the directory server is able to resolve these conflicts automatically and in a consistent manner. However, in some scenarios, manual administrative action is required. Attention should be paid to the origin of client write requests to prevent conflicts.
Replication Connection Status	false	Indicates the connection status of remote replication servers in this server’s replication topology. The 'cn=schema' backend and the local replication server are excluded using the include-filter property. For all other remote servers in the topology, separate monitor entries will be created per server and replication base DN. The resource identifier for this gauge is a concatenated string containing the data set name, the host and port of the replication server, and the replication server ID. A replicated data set depends upon the availability of servers replicating the data. So long as there are other replicas available in a replication topology, a single replica being unavailable should not affect the overall performance of the replication topology. However, unavailable replicas can increase the likelihood of data loss or performance degradation.
Replication Latency (Milliseconds)	false	Average amount of time it takes a modification on one replica to propagate and commit on another replica, in milliseconds. The resource identifier for this gauge is the base DN of the replica. Use the `dsreplication status` command to see a list of replicated base DNs. A separate gauge monitor entry will be created for each monitored replica. Replicas can be included in or excluded from monitoring by specifying their base DNs in the include-resource and exclude-resource properties, respectively. Replication latency can be reported as high after the server starts if the global configuration property startup-min-replication-backlog-count is not set. That property limits the number of outstanding changes any replica can have before the server will complete the startup process and begin accepting connections. This limits how far out of sync the server is at startup.
Replication Purge Delay (Hours)	true	Indicates the effective purge delay for the replication server. To protect against missing changes, the effective purge delay should be large enough to accommodate servers that have been offline as well as the need to restore from backups.
Replication Servers Available	false	
Strong Encryption Not Available	true	The JVM does not appear to support strong encryption algorithms, like 256-bit AES. The server will fall back to using weaker algorithms, like 128-bit AES. To enable support for strong encryption, update your JVM to a newer version that supports it by default, or install or enable the unlimited encryption strength jurisdiction policy files in your Java installation.
Undeletable Database Files (Percent)	true	Monitors the percentage of undeletable database files in a Berkeley DB environment. The resource identifier for this gauge is a backend ID. Use the `list-backends` command to see a list of backend IDs. A separate gauge monitor entry will be created for each monitored backend. Backends can be included in or excluded from monitoring by specifying their backend IDs in the include-resource and exclude-resource properties, respectively. This gauge tracks the percentage of database files that have been cleaned but cannot be deleted because they are being used by a database maintenance operation. This could be a separate, offline process such as export-ldif or verify-index, or it could be a task running within the server such as a backup or replication initialization. A small percentage of undeletable files is expected when these commands are running, but a high percentage could indicate that one of these operations is having problems and should be canceled and restarted to avoid the database growing too large on disk.
Work Queue Size (Number Of Requests)	true	Number of requests in the server’s work queue waiting to be processed, averaged over the update interval. There is no resource identifier associated with this gauge. The monitored resource is the server’s work queue. If all worker threads are busy processing other client requests, then new requests that arrive will be forced to wait in the work queue until a worker thread becomes available.
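
The Active Cleaner Threads (Percent) description distinguishes a momentary spike (harmless) from a value that stays above 90% for over an hour (a sign that the cleaner is not progressing). The following Python sketch illustrates that kind of sustained-threshold check over externally collected samples. The function name `sustained_above` and the `(timestamp, percent)` sample format are hypothetical and are not part of the server or its tools.

```python
from datetime import datetime, timedelta


def sustained_above(samples, threshold=90.0, duration=timedelta(hours=1)):
    """Return True if the value stays above `threshold` for longer than `duration`.

    `samples` is an iterable of (timestamp, percent) pairs sorted by timestamp.
    This is an illustrative helper, not the server's own alarm logic.
    """
    run_start = None  # timestamp at which the current above-threshold run began
    for timestamp, percent in samples:
        if percent > threshold:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start > duration:
                return True
        else:
            # Any sample at or below the threshold ends the current run.
            run_start = None
    return False


# Hypothetical samples taken every 10 minutes over 70 minutes, all at 95% busy.
start = datetime(2024, 1, 1)
busy = [(start + timedelta(minutes=10 * i), 95.0) for i in range(8)]
print(sustained_above(busy))  # True
```

A single sample at or below the threshold resets the run, so brief periods at 100% do not trigger the check.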

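
The Available File Descriptors description names two ways to configure the descriptor count (a `NUM_FILE_DESCRIPTORS` environment variable or a `config/num-file-descriptors` file) and a default of 65535. The Python sketch below shows one way an administrator's own script might work out which value is in effect. It is not the server's startup logic: the function name and the `server_root` parameter are hypothetical, and checking the environment variable before the file is an assumption, since the source does not state which takes precedence.

```python
import os
from pathlib import Path

DEFAULT_NUM_FILE_DESCRIPTORS = 65535  # default stated in the gauge description


def resolve_num_file_descriptors(server_root, environ=None):
    """Return the configured descriptor count, or the documented default.

    Assumption: the environment variable is consulted before the file.
    The real precedence is not stated in the source.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("NUM_FILE_DESCRIPTORS")
    if value is None:
        config_file = Path(server_root) / "config" / "num-file-descriptors"
        if config_file.is_file():
            for line in config_file.read_text().splitlines():
                # Expect a single line such as NUM_FILE_DESCRIPTORS=12345
                key, sep, raw = line.strip().partition("=")
                if sep and key.strip() == "NUM_FILE_DESCRIPTORS":
                    value = raw
                    break
    return int(value) if value is not None else DEFAULT_NUM_FILE_DESCRIPTORS
```

Passing `environ` explicitly keeps the helper easy to exercise without changing the real process environment.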