Testing alerts and alarms
After you configure alarms and alert handlers, you can manually increase the severity of a gauge to verify that the server takes the appropriate action when an alarm state changes.
You can then use the status tool to verify the alarms and alerts.
Testing alarms and alerts
Steps
- Use dsconfig to configure a gauge and set the override-severity property to critical. The following example configures the CPU Usage (Percent) gauge.
Example:
$ dsconfig set-gauge-prop \
  --gauge-name "CPU Usage (Percent)" \
  --set override-severity:critical
- Run the status tool to verify that an alarm was generated with corresponding alerts. The status tool provides a summary of the server’s current state with key metrics and a list of recent alerts and alarms.
Example:
The sample output has been shortened to show just the alarms and alerts information.
$ bin/status
--- Administrative Alerts ---
Severity : Time            : Message
---------:-----------------:------------------------------------------------------
Info     : 11/Aug/2014     : A configuration change has been made in the
         : 15:48:46 -0500  : Directory Server:
         :                 : [11/Aug/2014:15:48:46.054 -0500]
         :                 : conn=17 op=73 dn='cn=Directory Manager,cn=Root
         :                 : DNs,cn=config' authtype=[Simple] from=127.0.0.1
         :                 : to=127.0.0.1 command='dsconfig set-gauge-prop
         :                 : --gauge-name 'Cleaner Backlog (Number Of Files)'
         :                 : --set warning-value:-1'
Info     : 11/Aug/2014     : A configuration change has been made in the
         : 15:47:32 -0500  : Directory Server: [11/Aug/2014:15:47:32.547 -0500]
         :                 : conn=4 op=196 dn='cn=Directory Manager,cn=Root
         :                 : DNs,cn=config' authtype=[Simple] from=127.0.0.1
         :                 : to=127.0.0.1 command='dsconfig set-gauge-prop
         :                 : --gauge-name 'Cleaner Backlog (Number Of Files)'
         :                 : --set warning-value:0'
Error    : 11/Aug/2014     : Alarm [CPU Usage (Percent). Gauge CPU Usage (Percent)
         : 15:41:00 -0500  : for Host System has
         :                 : a current value of '18.583333333333332'.
         :                 : The severity is currently OVERRIDDEN in the
         :                 : Gauge's configuration to 'CRITICAL'.
         :                 : The actual severity is: The severity is
         :                 : currently 'NORMAL', having assumed this severity
         :                 : Mon Aug 11 15:41:00 CDT 2014. If CPU use is high,
         :                 : check the server's current workload and make any
         :                 : needed adjustments. Reducing the load on the system
         :                 : will lead to better response times.
         :                 : Resource='Host System']
         :                 : raised with critical severity
Shown are alerts of severity [Info,Warning,Error,Fatal] from the past 48 hours
Use the --maxAlerts and/or --alertSeverity options to filter this list
--- Alarms ---
Severity : Severity Start : Condition : Resource    : Details
         : Time           :           :             :
---------:----------------:-----------:-------------:------------------------------
Critical : 11/Aug/2014    : CPU Usage : Host System : Gauge CPU Usage (Percent) for
         : 15:41:00 -0500 : (Percent) :             : Host System
         :                :           :             : has a current value of
         :                :           :             : '18.785714285714285'.
         :                :           :             : The severity is currently
         :                :           :             : 'CRITICAL', having assumed
         :                :           :             : this severity Mon Aug 11
         :                :           :             : 15:49:00 CDT 2014. If CPU use
         :                :           :             : is high, check the server's
         :                :           :             : current workload and make any
         :                :           :             : needed adjustments. Reducing
         :                :           :             : the load on the system will
         :                :           :             : lead to better response times
Warning  : 11/Aug/2014    : Work Queue: Work Queue  : Gauge Work Queue Size (Number
         : 15:39:40 -0500 : Size      :             : of Requests) for Work Queue
         :                : (Number of:             : has a current value of '27'.
         :                : Requests) :             : The severity is currently
         :                :           :             : 'WARNING' having assumed this
         :                :           :             : severity Mon Aug 11 15:48:50
         :                :           :             : CDT 2014. If all worker
         :                :           :             : threads are busy processing
         :                :           :             : other client requests, then
         :                :           :             : new requests that arrive will
         :                :           :             : be forced to wait in the work
         :                :           :             : queue until a worker thread
         :                :           :             : becomes available
Shown are alarms of severity [Warning,Minor,Major,Critical]
Use the --alarmSeverity option to filter this list
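For a scripted check, the Alarms table can be filtered instead of read by eye. The following is a minimal sketch and not part of the product: alarm_severities and sample_status_output are hypothetical helper names, and the filter assumes the colon-delimited layout of the sample output, where the first row of each alarm carries the severity in the first column and the start of the condition name in the third.

```shell
#!/bin/sh
# alarm_severities (hypothetical helper): reads status output on stdin and
# prints "<severity>|<first line of condition>" for each alarm in the table.
# Assumes the colon-delimited layout shown in the sample output.
alarm_severities() {
  awk -F':' '
    /^[ ]*--- Alarms ---/ { in_alarms = 1; next }   # start of the Alarms table
    /^[ ]*--- /           { in_alarms = 0 }          # another section begins
    in_alarms && $1 ~ /^(Critical|Major|Minor|Warning|Indeterminate|Normal)[ ]*$/ {
      sev = $1; cond = $3
      gsub(/^[ ]+|[ ]+$/, "", sev)    # trim column padding
      gsub(/^[ ]+|[ ]+$/, "", cond)
      print sev "|" cond
    }
  '
}

# Shortened copy of the sample Alarms table, used here in place of a live server.
sample_status_output() {
cat <<'EOF'
--- Alarms ---
Severity : Severity Start : Condition : Resource    : Details
         : Time           :           :             :
---------:----------------:-----------:-------------:------------------------------
Critical : 11/Aug/2014    : CPU Usage : Host System : Gauge CPU Usage (Percent) for
         : 15:41:00 -0500 : (Percent) :             : Host System
Warning  : 11/Aug/2014    : Work Queue: Work Queue  : Gauge Work Queue Size (Number
         : 15:39:40 -0500 : Size      :             : of Requests) for Work Queue
Shown are alarms of severity [Warning,Minor,Major,Critical]
EOF
}

# Against a live server, the equivalent would be: bin/status | alarm_severities
sample_status_output | alarm_severities
```

Only the first line of a wrapped condition name is printed, so a condition such as Work Queue Size (Number of Requests) appears as Work Queue. Match on that prefix, or extend the filter to join continuation rows if the full name is needed.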
Indeterminate alarms
The server raises an indeterminate alarm for a server condition whose severity cannot be determined.
In most cases, these alarms are benign and do not issue alerts, nor do they appear in the output of the status tool or the Administrative Console by default.
These alarms are usually caused by an enabled gauge that is intended to measure an aspect of the server that is not currently enabled. For example, gauges intended to monitor metrics related to replication might produce indeterminate alarms if a server is not currently replicating data. The gauge can be disabled if needed.
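As a rough illustration of disabling such a gauge, the sketch below prints a dsconfig command for review instead of running it. It reuses the set-gauge-prop subcommand shown earlier. The enabled property is an assumption that is not confirmed by this page, and print_disable_gauge_command is a hypothetical helper name, so check the gauge's configuration properties for your release before running the printed command.

```shell
#!/bin/sh
# print_disable_gauge_command (hypothetical helper): prints, but does not run,
# a dsconfig command that would disable the named gauge.
# Assumption: the gauge exposes an "enabled" property; verify before use.
print_disable_gauge_command() {
  printf 'dsconfig set-gauge-prop --gauge-name "%s" --set enabled:false\n' "$1"
}

print_disable_gauge_command "Replication Latency (Milliseconds)"
```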
For more information about indeterminate alarms, view the gauge’s associated monitor entry. There might be messages that can help determine the issue.
The following is sample output from the status tool run with the --alarmSeverity=indeterminate option.
--- Alarms ---
Severity     : Severity Start : Condition      : Resource   : Details
             : Time           :                :            :
-------------:----------------:----------------:------------:------------------------
Normal       : 26/Aug/2014    : Startup Begun  : cn=config  : The Directory Server
             : 14:16:29 -0500 :                :            : is starting.
             :                :                :            :
Indeterminate: 26/Aug/2014    : Replication    : not        : The value of gauge
             : 14:16:40 -0500 : Latency        : available  : Replication Latency
             :                : (Milliseconds) :            : (Milliseconds) could not
             :                :                :            : be determined. The
             :                :                :            : severity is INDETERMINATE,
             :                :                :            : having assumed this
             :                :                :            : severity Tue Aug 26
             :                :                :            : 14:17:10 CDT 2014.
In this example, the indeterminate alarm is for the Replication Latency (Milliseconds) gauge. Searching the monitor backend for this gauge’s entry returns an error message that might explain the indeterminate severity.
# ldapsearch -w password --baseDN "cn=monitor" \
  -D"cn=directory manager" gauge-name="Replication Latency (Milliseconds)"
dn: cn=Gauge Replication Latency (Milliseconds),cn=monitor
objectClass: top
objectClass: ds-monitor-entry
objectClass: ds-numeric-gauge-monitor-entry
objectClass: ds-gauge-monitor-entry
objectClass: extensibleObject
cn: Gauge Replication Latency (Milliseconds)
gauge-name: Replication Latency (Milliseconds)
resource:
severity: indeterminate
summary: The value of gauge Replication Latency (Milliseconds) could not be determined. The severity is INDETERMINATE, having assumed this severity Tue Aug 26 15:42:40 CDT 2014
error-message: No entries were found under cn=monitor having object class ds-replica-monitor-entry
...
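When several gauges are indeterminate, it can help to reduce each monitor entry to the attributes that explain the problem. The following is a minimal sketch: gauge_diagnosis and sample_monitor_entry are hypothetical helper names, and the filter assumes that each attribute value in the search output sits on a single, unwrapped line.

```shell
#!/bin/sh
# gauge_diagnosis (hypothetical helper): reads LDIF on stdin and keeps only the
# gauge-name, severity, and error-message attribute lines.
# Assumes attribute values are not wrapped across lines.
gauge_diagnosis() {
  grep -E '^(gauge-name|severity|error-message): '
}

# Shortened copy of the sample monitor entry, used here in place of a live search.
sample_monitor_entry() {
cat <<'EOF'
dn: cn=Gauge Replication Latency (Milliseconds),cn=monitor
objectClass: top
objectClass: ds-gauge-monitor-entry
cn: Gauge Replication Latency (Milliseconds)
gauge-name: Replication Latency (Milliseconds)
resource:
severity: indeterminate
error-message: No entries were found under cn=monitor having object class ds-replica-monitor-entry
EOF
}

# Against a live server, pipe the ldapsearch output into gauge_diagnosis instead.
sample_monitor_entry | gauge_diagnosis
```

The empty resource: line is dropped because the pattern requires a value after the colon.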