Explanations and examples of the system alarms, alerts, and gauges in the PingData servers.
An alarm represents a stateful condition of the server or a resource that can indicate a problem, such as low disk space or external server unavailability. A gauge defines a set of threshold values, each with a specified severity, that cause the server to enter or exit an alarm state when crossed. Gauges monitor either continuous values, such as CPU load or free disk space (Numeric Gauge), or an enumerated set of values, such as 'server available' or 'server unavailable' (Indicator Gauge). A gauge generates an alarm when its severity changes because of a change in the monitored value. Like alerts, alarms have a severity (NORMAL, WARNING, MINOR, MAJOR, CRITICAL), a name, and a message. Alarms always have a Condition property and can have a Specific Problem or Resource property. If surfaced through SNMP, a Probable Cause property and an Alarm Type property are also listed. Alarms can be configured to generate alerts when the alarm's severity changes.
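The relationship between a numeric gauge's thresholds and its severity can be sketched as follows. This is a minimal illustration, not the server's implementation: the NumericGauge class, its field names, and the threshold values are hypothetical, and the sketch assumes that a higher monitored value is worse (as with CPU load). Only the severity names come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Severity names as listed in the text, ordered from least to most severe.
SEVERITIES = ["NORMAL", "WARNING", "MINOR", "MAJOR", "CRITICAL"]


@dataclass
class NumericGauge:
    """Hypothetical model of a numeric gauge (illustration only)."""
    name: str
    # Maps a severity to the lowest value that triggers it.
    # Severities without a threshold are never reported.
    thresholds: Dict[str, float]
    severity: str = "NORMAL"

    def classify(self, value: float) -> str:
        """Return the highest severity whose threshold the value has reached."""
        result = "NORMAL"
        for sev in SEVERITIES[1:]:
            limit = self.thresholds.get(sev)
            if limit is not None and value >= limit:
                result = sev
        return result

    def update(self, value: float) -> Optional[Tuple[str, str]]:
        """Record a new sample.

        Returns (old_severity, new_severity) when the severity changes,
        which is the point at which an alarm would be generated,
        and None when the severity is unchanged.
        """
        new = self.classify(value)
        if new == self.severity:
            return None
        old, self.severity = self.severity, new
        return (old, new)


# Example thresholds are made up for the illustration.
cpu = NumericGauge("cpu-usage", {"WARNING": 70, "MAJOR": 85, "CRITICAL": 95})
for sample in (50, 75, 78, 97, 10):
    change = cpu.update(sample)
    if change is not None:
        print(f"{cpu.name}: {change[0]} -> {change[1]} at value {sample}")
```

Samples that stay within the same severity band produce no change, which reflects the statement that a gauge generates an alarm only when its severity changes. A gauge for a value where lower is worse, such as free disk space, would invert the comparison.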
The server supports two alert types: standard and alarm-specific. The server constantly
monitors for conditions that might need attention by administrators, such as low disk
space. For this condition, the standard alert is low-disk-space-warning, and the
alarm-specific alert is alarm-warning. The server can be configured to generate
alarm-specific alerts instead of, or in addition to, standard alerts. By default,
standard alerts are generated for conditions internally monitored by the server.
However, gauges can only generate alarm alerts.
The server installs a set of gauges that are specific to the product and that can be
cloned or configured through the dsconfig tool. Existing gauges can be
tailored to fit each environment by adjusting the update interval and threshold values.
Configuration of system gauges determines the criteria by which alarms are triggered.
The Stats Logger can be used to view historical information about the value and severity
of all system gauges.
PingData servers are compliant with the International Telecommunication Union CCITT
Recommendation X.733 (1992) standard for generating and clearing alarms. If configured,
entering or exiting an alarm state can result in one or more alerts. An alarm state is
exited when the condition no longer applies. An alarm_cleared alert type is generated
by the system when an alarm's severity changes from a non-normal severity to any other
severity. An alarm_cleared alert correlates to a previous alarm when the Condition and
Resource properties are the same. The Alarm Manager, which governs the actions performed
when an alarm state is entered, is configurable through the dsconfig tool and the
administrative console.
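The correlation rule described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the Alarm Manager's actual logic: the AlarmTracker class, the "alarm_raised" event label, and the sample Condition and Resource values are hypothetical. Only the alarm_cleared name, the severity names, and the rule that alarms correlate on Condition and Resource come from the text.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]            # (Condition, Resource)
Event = Tuple[str, Key, str]     # (event label, key, severity)


class AlarmTracker:
    """Hypothetical tracker that correlates alarms by Condition and Resource."""

    def __init__(self) -> None:
        # Holds only alarms whose current severity is not NORMAL.
        self.active: Dict[Key, str] = {}

    def report(self, condition: str, resource: str, severity: str) -> List[Event]:
        key = (condition, resource)
        previous = self.active.get(key, "NORMAL")
        events: List[Event] = []
        if severity == previous:
            return events  # no severity change, so nothing to emit

        # Leaving a non-normal severity for any other severity clears
        # the earlier alarm that has the same Condition and Resource.
        if previous != "NORMAL":
            events.append(("alarm_cleared", key, previous))

        if severity != "NORMAL":
            self.active[key] = severity
            events.append(("alarm_raised", key, severity))  # hypothetical label
        else:
            self.active.pop(key, None)
        return events


tracker = AlarmTracker()
# Condition and Resource values below are made up for the example.
for sev in ("WARNING", "CRITICAL", "NORMAL"):
    for event in tracker.report("low-disk-space", "/data", sev):
        print(event)
```

Keying the active alarms on the (Condition, Resource) pair means that a report for a different resource never clears an alarm raised for another one, even when the condition is the same.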
Like the Alerts Backend, which stores information in cn=alerts, the Alarm Backend
stores information within the cn=alarms backend. Unlike alerts, alarms have a state
over time that can change in severity and be cleared when a monitored value returns to
normal. Alarms can be viewed with the status tool. As with other alert types, alert
handlers can be
configured to manage the alerts generated by alarms. A complete listing of system
alerts, alarms, and their severity is available in
<server-root>/docs/admin-alerts-list.csv.