PingDirectory

Testing alerts and alarms

About this task

After you configure alarms and alert handlers, verify that the server takes the appropriate action when an alarm state changes by manually raising the severity of a gauge. Use the status tool to confirm that the alarm and its corresponding alerts were generated.

Steps

  1. Use dsconfig to set the override-severity property of a gauge to critical. The following example uses the CPU Usage (Percent) gauge.

    $ dsconfig set-gauge-prop \
      --gauge-name "CPU Usage (Percent)" \
      --set override-severity:critical
  2. Run the status tool to verify that an alarm was generated with corresponding alerts. The status tool summarizes the server’s current state, including key metrics and a list of recent alerts and alarms. The sample output is abbreviated to show only the alerts and alarms.

    $ bin/status
                        --- Administrative Alerts ---
    Severity : Time           : Message
    ---------:----------------:-----------------------------------------------
    Error    : 11/Aug/2016    : Alarm [CPU Usage (Percent). Gauge CPU Usage
             : 15:41:00 -0500 : (Percent) for Host System has
             :                : a current value of '18.583333333333332'.
             :                : The severity is currently OVERRIDDEN in the
             :                : Gauge's configuration to 'CRITICAL'.
             :                : The actual severity is: The severity is
             :                : currently 'NORMAL', having assumed this severity
             :                : Mon Aug 11 15:41:00 CDT 2016. If CPU use is high,
             :                : check the server's current workload and make any
             :                : needed adjustments. Reducing the load on the system
             :                : will lead to better response times.
             :                : Resource='Host System']
             :                : raised with critical severity
    Shown are alerts of severity [Info,Warning,Error,Fatal] from the past 48 hours
    Use the --maxAlerts and/or --alertSeverity options to filter this list

                                  --- Alarms ---
    Severity : Severity   : Condition : Resource    : Details
             : Start Time :           :             :
    ---------:------------:-----------:-------------:-------------------------
    Critical : 11/Aug/2016: CPU Usage : Host System : Gauge CPU Usage (Percent) for
             : 15:41:00   : (Percent) :             : Host System
             :  -0500     :           :             : has a current value of
             :            :           :             : '18.785714285714285'.
             :            :           :             : The severity is currently
             :            :           :             : 'CRITICAL', having assumed
             :            :           :             : this severity Mon Aug 11
             :            :           :             : 15:49:00 CDT 2016. If CPU use
             :            :           :             : is high, check the server's
             :            :           :             : current workload and make any
             :            :           :             : needed adjustments. Reducing
             :            :           :             : the load on the system will
             :            :           :             : lead to better response times
    Shown are alarms of severity [Warning,Minor,Major,Critical]
    Use the --alarmSeverity option to filter this list
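
To run the check in step 2 from a script rather than by eye, the status output can be piped through a small filter. The following is a minimal sketch and not part of the product: has_critical_alarm is a hypothetical helper name, and its pattern assumes the Alarms table layout shown in the sample output above, which might differ between server versions.

```shell
#!/bin/sh
# Hypothetical helper (not a PingDirectory tool): reads `status` output on
# stdin and succeeds only if the Alarms section has a row whose first
# column is "Critical". Assumes the table layout from the sample output.
has_critical_alarm() {
  awk '
    # Rows are only inspected after the Alarms section header is seen,
    # so Administrative Alerts rows are never counted.
    /--- Alarms ---/ { in_alarms = 1; next }
    in_alarms && /^[ \t]*Critical[ \t]*:/ { found = 1 }
    # Exit status 0 when a critical row was found, 1 otherwise.
    END { exit !found }
  '
}
```

For example, `bin/status | has_critical_alarm && echo "critical alarm raised"` prints the message only when the overridden gauge appears as a critical alarm.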