PingAuthorize

Using the Monitor History plugin

The Monitor History plugin is a PingAuthorize Server component that periodically captures a snapshot of the server’s cn=monitor backend and writes it to a set of rotating log files.

While standard access and error logs record discrete events, the Monitor History plugin records the overall state of the server. This creates a general timeline of server health, resource usage, and internal processing, which is useful for analyzing performance issues.

Key benefits of using the plugin include:

  • Post-incident analysis: If the server crashes or is restarted to resolve an issue, real-time monitor data is typically lost. The plugin ensures that the server snapshots leading up to the crash are preserved on disk.

  • Thread analysis: The plugin captures full Java Virtual Machine (JVM) stack traces. This shows what every thread was doing at the moment of the snapshot, helping identify stuck threads without needing to manually trigger a thread dump.

  • Trend analysis: By comparing monitor snapshots generated during healthy and degraded periods, you can determine which system resources were under strain or fully exhausted.

The plugin runs on a timer specified by the log-interval property. At each interval, it reads the contents of the cn=monitor backend and writes them as LDAP entries to a timestamped log file.

Learn more about the monitor backend in the PingDirectory documentation.
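To see the same data the plugin snapshots, you can query the cn=monitor backend directly with the ldapsearch tool shipped with the server. The hostname, port, and credentials below are placeholders for your environment:

```shell
# Read the live cn=monitor backend (connection details are placeholders).
bin/ldapsearch --hostname localhost --port 1636 --useSSL --trustAll \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --baseDN "cn=monitor" --searchScope sub "(objectClass=*)"
```

The plugin writes essentially this output to its log files at each interval, which is what makes the archived snapshots comparable to a live query.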

Key troubleshooting data

When analyzing monitor history log files during an incident, the following entries can be especially useful:

JVM stack trace

DN: cn=JVM Stack Trace,cn=monitor

Lists every thread in the JVM, along with its call stack and execution state.

If the server becomes unresponsive, check the JVM stack trace entry corresponding to the time of the incident. Look for threads waiting on specific connections that might be timing out.

Example snapshot
"Worker Thread 4 for General Queue 0" #36 prio=10 os_prio=0 cpu=8123.44ms elapsed=22.17s tid=0x00007f9d3c013800 nid=0x52 BLOCKED  [0x00007f9d2d5f9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.unboundid.directory.server.backend.ID2Entry.readEntry(ID2Entry.java:241)
        - waiting to lock <0x000000071ab23450> (a com.unboundid.directory.server.backend.ID2Entry)
        at com.unboundid.directory.server.OperationRunner.process(OperationRunner.java:183)
"Worker Thread 5 for General Queue 0" #37 prio=10 os_prio=0 cpu=8010.11ms elapsed=21.98s tid=0x00007f9d3c015000 nid=0x53 BLOCKED  [0x00007f9d2d3f8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.unboundid.directory.server.backend.ID2Entry.readEntry(ID2Entry.java:241)
        - waiting to lock <0x000000071ab23450> (a com.unboundid.directory.server.backend.ID2Entry)
        at com.unboundid.directory.server.OperationRunner.process(OperationRunner.java:183)
"Worker Thread 6 for General Queue 0" #38 prio=10 os_prio=0 cpu=7998.22ms elapsed=21.70s tid=0x00007f9d3c017800 nid=0x54 RUNNABLE  [0x00007f9d2d1f7000]
   java.lang.Thread.State: RUNNABLE
        at com.unboundid.directory.server.backend.ID2Entry.readEntry(ID2Entry.java:241)
        - locked <0x000000071ab23450>
        at com.unboundid.directory.server.OperationRunner.process(OperationRunner.java:183)
  • Thread 6 is holding a backend lock.

  • Threads 4 and 5 are stuck in BLOCKED state waiting for Thread 6 to release the lock, preventing them from processing new operations.

    This can lead to slow responses, operation backlogs, or increasing work queue size.
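To quickly locate snapshots containing blocked threads, you can grep the archived files for the BLOCKED state. The demo below runs on an inline sample; in practice, point grep at the rotated files under logs/monitor-history/ (the plugin's default location):

```shell
# Demo: count BLOCKED threads in a sample stack trace snippet.
cat > /tmp/monitor-sample.txt <<'EOF'
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: BLOCKED (on object monitor)
EOF
grep -c 'Thread.State: BLOCKED' /tmp/monitor-sample.txt   # prints 2
```

A count that grows from one snapshot to the next is a strong signal of lock contention like the example above.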

Work queue

DN: cn=Work Queue,cn=monitor

Tracks operations waiting to be processed.

If current-queue-size is high or rejected-count is increasing, the server is receiving more traffic than it can process, or worker threads are blocked. High queue size combined with growing response times usually indicates either insufficient worker threads or a bottleneck in downstream services.

Example snapshot
dn: cn=Work Queue,cn=monitor
current-queue-size: 12
rejected-count: 5
stolen-count: 0
num-worker-threads: 16
num-busy-worker-threads: 14
current-worker-thread-percent-busy: 87
max-worker-thread-percent-busy: 100
recent-operation-queue-time-millis: 420
  • current-queue-size: 12

    Twelve operations are waiting in the work queue. A growing queue size indicates that the server is falling behind on processing incoming tasks.

  • num-busy-worker-threads: 14 / current-worker-thread-percent-busy: 87

    Most worker threads are actively processing tasks. High utilization indicates that the server is operating under significant load.

  • max-worker-thread-percent-busy: 100

    At peak load, all worker threads were busy. This indicates that the server has reached its processing capacity.

  • rejected-count: 5

    Five operations were rejected because no worker threads were available. Rejections occur when the server is temporarily overloaded.

  • recent-operation-queue-time-millis: 420

    Operations waited an average of 420 ms in the queue before processing. Increasing queue times indicate slower throughput.

  • stolen-count: 0

    No operations have been stolen from other threads' queues, suggesting work is evenly distributed. A climbing stolen-count can indicate uneven queue distribution, which can increase latency.
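Because each snapshot repeats the cn=Work Queue entry, you can extract current-queue-size from archived files to see whether the backlog is growing. The demo below runs on an inline sample; in practice, point it at logs/monitor-history/monitor* (the default path):

```shell
# Demo: pull current-queue-size out of a sample snapshot file.
cat > /tmp/work-queue-sample.ldif <<'EOF'
dn: cn=Work Queue,cn=monitor
current-queue-size: 12
rejected-count: 5
EOF
awk -F': ' '/^current-queue-size:/ { print $2 }' /tmp/work-queue-sample.ldif   # prints 12
```

Plotting these values over time shows whether a backlog built gradually or spiked at the moment of the incident.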

Gauges

Entries containing objectClass: ds-gauge-monitor-entry

Tracks resource utilization.

Check cn=Gauge JVM Memory Usage to determine whether memory consumption spiked before a crash (indicating a memory leak) or cn=Gauge CPU Usage to correlate high system load with specific traffic patterns.

Example snapshot
dn: cn=Gauge HTTP Processing (Percent) for HTTPS Connection Handler,cn=monitor
gauge-name: HTTP Processing (Percent)
resource: HTTPS Connection Handler
value: 95
value-minimum: 45
value-maximum: 100
severity: major
samples-this-interval: 5
update-time: 20251125231500.020Z
  • value: 95

    The HTTP connection handler is processing requests at 95% of its capacity.

  • value-minimum / value-maximum: 45 / 100

    Over the snapshot interval, the system load varied widely and occasionally reached maximum capacity.

  • severity: major

    The server is under stress, indicating a potential performance bottleneck.

  • samples-this-interval: 5

    Multiple samples confirm that high system load is sustained.
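Because all gauge entries share the ds-gauge-monitor-entry object class, you can list every gauge with one search. The attribute selection below is illustrative, and connection options (hostname, port, credentials) are omitted as placeholders:

```shell
# List every gauge with its current value and severity.
bin/ldapsearch --baseDN "cn=monitor" --searchScope sub \
  "(objectClass=ds-gauge-monitor-entry)" gauge-name value severity
```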

Client connections

DN: cn=Client Connections,cn=monitor

Tracks the client connections currently established with the server.

A sudden spike in established-connection-count might indicate a surge in client connection requests or a load balancer misconfiguration.

Example snapshot
dn: cn=Client Connections,cn=monitor
cn: Client Connections
established-connection-count: 102
max-concurrent-connection-count: 105
total-connection-count-since-startup: 5000
current-connection-counts-by-bind-dn: cn=Directory Manager,cn=Root DNs,cn=config: 50
current-connection-counts-by-ip-address: 192.168.1.100: 80
current-connection-counts-by-client-connection-policy: ds-cfg-policy-id=default,cn=Client Connection Policies,cn=config: 102
  • established-connection-count: 102

A high number of active connections can indicate unusually heavy load.

  • max-concurrent-connection-count: 105

    Shows the peak number of concurrent connections. If this value is close to the configured limit, the server might be at risk of throttling or refusing new connections.

  • current-connection-counts-by-bind-dn: cn=Directory Manager,cn=Root DNs,cn=config: 50

Indicates that a single bind DN or IP address is consuming a large share of the available connections. This can lead to uneven load or denial of service for other clients.

Configuring the plugin

You can use the PingAuthorize admin console or dsconfig to configure the plugin.

  • Admin console

  • dsconfig

Steps

  1. In the PingAuthorize admin console, go to Configuration > LDAP (Administration and Monitoring) > Plugin Root.

  2. In the Plugins section, click Monitor History.

  3. In the Log Interval field, enter the interval, in seconds, between monitor snapshots.

  4. In the Log File field, enter a filepath for the logs generated by the plugin.

    The path can be either relative to the server root or an absolute path.

    By default, the plugin writes to logs/monitor-history/monitor.

  5. In the Log File Permissions field, enter a UNIX mode string specifying the UNIX permissions of the generated log file.

  6. (Optional) In the Logging Error Behavior list, select one of the following error behaviors:

    • Standard Error: Writes a message to standard error.

    • Lockdown Mode: Places the server in lockdown mode.

  7. In the Retention Policy section, add a new log file retention policy or edit the existing policies.

    The Free Disk Space Retention and Monitor History Size Limit Retention policies are enabled by default. These policies ensure the plugin doesn’t consume excessive disk space over time.

  8. (Optional) To remove personally identifiable information (PII) from the monitor history log file, select the Enabled checkbox under Sanitize.

  9. Click Save.

Steps

  • To configure the Monitor History plugin, use the dsconfig set-plugin-prop command with the following arguments:

    Argument Required Description

    --plugin-name "<name>"

    Required

    Set to "Monitor History".

    --set description:"<description>"

    Optional

    Specifies a description for the plugin.

    --set enabled:true

    Required

    Determines whether the plugin is enabled.

    Allowed values are true and false. The plugin is disabled by default.

    --set log-interval:"<interval> s"

    Required

    Specifies the interval, in seconds, between monitor snapshots.

    The default value is 5 minutes.

    --set log-file:<file-path>

    Required

    Specifies the filepath for logs generated by the Monitor History plugin. The path can be either relative to the server root or an absolute path.

    You must disable and re-enable the plugin or restart the server for this setting to take effect.

    --set log-file-permissions:<unix-permission-string>

    Required

    Specifies the UNIX permissions to apply to the generated log files.

    --set logging-error-behavior:<error-behavior>

    Optional

    Specifies the server behavior when an error occurs during log processing. Allowed values are:

    • standard-error: Writes a message to standard error.

    • lockdown-mode: Places the server in lockdown mode.

    --add retention-policy:"<retention-policy>"

    Required

    Specifies the log retention policy for the plugin. When multiple policies are used, log files are cleared when any of the policies' conditions are met.

    --set retain-files-sparsely-by-age:true

    Optional

    Retains a smaller number of older files so you can still see long-term trends, even when newer files trigger retention limits.

    --set sanitize:true

    Optional

    Redacts personally identifiable information from the monitor files.

    This setting should only be enabled when necessary, since it reduces the information available in the archive and can increase the time to find the source of performance issues.
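Putting the arguments together, a complete invocation might look like the following. The connection options, 30-second interval, and retention policy name are illustrative; substitute values for your environment:

```shell
dsconfig set-plugin-prop \
  --hostname localhost --port 1636 --useSSL --trustAll \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --plugin-name "Monitor History" \
  --set enabled:true \
  --set "log-interval:30 s" \
  --set log-file:logs/monitor-history/monitor \
  --set log-file-permissions:640 \
  --add "retention-policy:Monitor History Size Limit Retention" \
  --no-prompt
```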

Deployment best practices

When deploying the Monitor History plugin in production environments, it’s important to ensure the plugin’s logs are preserved reliably and to minimize performance overhead.

Containerized environments

In containerized environments such as Kubernetes, the local filesystem of a pod or container is ephemeral. For example, if a pod restarts or is rescheduled, any monitor history logs stored locally are lost. Losing these logs during a critical event prevents the plugin from providing the diagnostic information it’s intended to capture.

To prevent this issue, configure the log-file path to write to a mounted persistent volume (PV). PVs persist data beyond the lifetime of any individual pod in a cluster.

Learn more in Persistent Volumes in the Kubernetes documentation.
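For example, if a persistent volume is mounted at /opt/out (an assumed mount point; your deployment may use a different path), you can point the plugin at it:

```shell
# Write monitor history to a persistent volume instead of the ephemeral
# container filesystem. The /opt/out mount point is an assumption;
# connection options are omitted as placeholders.
dsconfig set-plugin-prop \
  --plugin-name "Monitor History" \
  --set log-file:/opt/out/logs/monitor-history/monitor \
  --no-prompt
```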

Incident reporting

When creating a support ticket for a performance issue or outage, Ping Identity Support typically requests an archive generated by the collect-support-data tool. However, during severe incidents, the server might be too unresponsive to run the tool.

If you can’t run collect-support-data during an incident, provide the Monitor History logs covering the time of the incident to your support provider. The detailed view of server state and thread activity in these logs offers insight into what was happening at the time of the incident.