Monitoring
The following sections describe monitoring endpoints exposed by PingGateway, and the metrics available at the endpoints.
For information about how to set up and maintain monitoring, refer to Monitor services.
Vert.x Metrics
Vert.x metrics for HTTP clients, TCP clients, and servers are available by default at the Prometheus Scrape Endpoint and the Common REST Monitoring Endpoint (deprecated). Vert.x metrics provide low-level information about requests and responses, such as the number of bytes, the duration, and the number of concurrent requests. The available metrics are based on those described in Vert.x core tools metrics.
For more information about Vert.x and PingGateway, refer to the vertx object in AdminHttpApplication (admin.json) and to Monitoring Vert.x Metrics.
Monitoring types
This section describes the data types used in monitoring:
- Counter: Cumulative metric for a numerical value that only increases.
- Gauge: Metric for a numerical value that can increase or decrease.
- Summary: Metric that samples observations, providing a count of observations, sum total of observed amounts, average rate of events, and moving average rates across a sliding time window.
  The Prometheus view doesn’t provide time-based statistics, because rates can be calculated from the time-series data. Instead, the Prometheus view includes summary metrics whose names have the following suffixes or labels:
  - _count: number of recorded events
  - _sum: total sum of recorded events
  - {quantile="0.5"}: 50% at or below this value
  - {quantile="0.75"}: 75% at or below this value
  - {quantile="0.95"}: 95% at or below this value
  - {quantile="0.98"}: 98% at or below this value
  - {quantile="0.99"}: 99% at or below this value
  - {quantile="0.999"}: 99.9% at or below this value
- Timer: Metric combining time-series summary statistics.
Common REST views show summaries as JSON objects. JSON summaries have the following fields:
{
  "max": number,            // maximum duration recorded
  "mean": number,           // total/count, or 0 if count is 0
  "min": number,            // minimum duration recorded for this metric
  "mean_rate": number,      // average rate
  "p50": number,            // 50% at or below this value
  "p75": number,            // 75% at or below this value
  "p95": number,            // 95% at or below this value
  "p98": number,            // 98% at or below this value
  "p99": number,            // 99% at or below this value
  "p999": number,           // 99.9% at or below this value
  "stddev": number,         // standard deviation of recorded durations
  "m15_rate": number,       // fifteen-minute average rate
  "m5_rate": number,        // five-minute average rate
  "m1_rate": number,        // one-minute average rate
  "duration_units": string, // time unit used in durations
  "rate_units": string,     // event count unit and time unit used in rate
  "seconds_count": number,  // events recorded for this metric
  "count": number,          // events recorded for this metric (deprecated)
  "seconds_total": number,  // sum of the durations of events recorded
  "total": number           // sum of the durations of events recorded (deprecated)
}
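The relationship between raw durations and some of these JSON fields can be sketched in Python. This is an illustrative approximation, not PingGateway's implementation. The hypothetical summarize function computes percentiles with the nearest-rank method over every recorded value. The metrics library behind PingGateway may instead sample observations, so real p50 to p999 values can differ. Durations are assumed to be in seconds.

```python
import math


def summarize(durations):
    """Return a subset of the JSON summary fields for a list of durations.

    Hypothetical helper for illustration: percentiles use the nearest-rank
    method over all recorded values, which is an assumption.
    """
    count = len(durations)
    total = sum(durations)
    ordered = sorted(durations)
    summary = {
        "seconds_count": count,
        "seconds_total": total,
        # As documented: total/count, or 0 if count is 0.
        "mean": total / count if count else 0,
        "min": ordered[0] if ordered else 0,
        "max": ordered[-1] if ordered else 0,
    }
    for field, q in (("p50", 0.5), ("p75", 0.75), ("p95", 0.95),
                     ("p98", 0.98), ("p99", 0.99), ("p999", 0.999)):
        # Nearest rank: the smallest recorded value with at least a
        # fraction q of all values at or below it.
        rank = max(1, math.ceil(q * count))
        summary[field] = ordered[rank - 1] if ordered else 0
    return summary


print(summarize([0.5, 1.5, 2.0, 4.0]))
```

With only a handful of observations, the upper percentiles all collapse onto the maximum. This is one reason high quantiles are only meaningful over many events.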
Metrics at the Prometheus Scrape Endpoint
PingGateway automatically exposes a monitoring endpoint where Prometheus can scrape metrics in the standard Prometheus format. Learn more from the Prometheus website.
When PingGateway is set up as described in the Quick install, the Prometheus Scrape Endpoint is available at the following URLs:
- http://ig.example.com:8080/openig/metrics/prometheus/0.0.4
- http://ig.example.com:8080/openig/metrics/prometheus (deprecated)
For an example that queries the Prometheus Scrape Endpoint, refer to Monitor the Prometheus Scrape Endpoint.
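The following minimal Python sketch reads the Prometheus Scrape Endpoint using only the standard library. It assumes the Quick install URL above and a simplified reading of the Prometheus text format:

- Comment lines are skipped.
- An optional trailing timestamp is ignored.
- Label values that contain escaped quotes are not handled.

The function names parse_samples and scrape are hypothetical.

```python
import re
import urllib.request

# URL from the Quick install setup; adjust host and port for your deployment.
SCRAPE_URL = "http://ig.example.com:8080/openig/metrics/prometheus/0.0.4"

# metric_name{labels} value [timestamp]; the label block is optional.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
# key="value" pairs; escaped quotes inside values are not supported.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples for every sample line in text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a line this simplified parser understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape(url=SCRAPE_URL):
    """Fetch the endpoint and parse the response body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_samples(response.read().decode("utf-8"))
```

The greedy match on the label block keeps label values such as ig.gateway.gateway.myRoute.{StaticResponseHandler}/handler intact, even though they contain braces. For production use, an established Prometheus client library parser is likely more robust than this sketch.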
Route metrics at the Prometheus Scrape Endpoint
Route metrics at the Prometheus Scrape Endpoint have the following labels:
- name: Route name, for example, My Route. If the router was declared with a default handler, its metrics are published through the route named default.
- route: Route identifier, for example, my-route.
- router: Fully qualified name of the router, for example, gateway.main-router.
The following table summarizes the recorded metrics:
Name | Monitoring type | Description |
---|---|---|
| | Gauge | Number of requests being processed. |
| | Counter | Number of requests processed by the router or route since it was deployed. |
| | Counter | Number of responses that threw an exception. |
| | Counter | Number of responses that were not handled by PingGateway. |
| | Counter | Number of responses by HTTP status code family. |
| | Summary | A summary of response time observations. |
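Every route metric carries the name, route, and router labels, so samples for one metric can be grouped per route. The following Python sketch assumes the samples have already been parsed into (name, labels, value) tuples. It groups on the router and name labels, because metrics from a default handler are published through the route named default. The metric name example_route_metric_total and all sample values are hypothetical placeholders, not documented PingGateway metrics.

```python
from collections import defaultdict


def totals_by_route(samples, metric_name):
    """Sum the values of one metric per (router, route name) pair.

    samples: iterable of (name, labels, value) tuples.
    """
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric_name:
            totals[(labels.get("router"), labels.get("name"))] += value
    return dict(totals)


# Hypothetical samples; the metric names and values are placeholders.
samples = [
    ("example_route_metric_total",
     {"name": "My Route", "route": "my-route", "router": "gateway.main-router"}, 5.0),
    ("example_route_metric_total",
     {"name": "default", "router": "gateway.main-router"}, 2.0),
    ("some_other_metric", {"name": "My Route", "router": "gateway.main-router"}, 9.0),
]
print(totals_by_route(samples, "example_route_metric_total"))
```

The default-handler sample deliberately omits a route label, because the value of that label for a default handler isn't stated here.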
Router metrics at the Prometheus Scrape Endpoint
Router metrics at the Prometheus Scrape Endpoint have the following labels:
- fully_qualified_name: Fully qualified name of the router, for example, gateway.main-router.
- heap: Name of the heap in which this router is declared, for example, gateway.
- name: Simple name declared in router configuration, for example, main-router.
The following table summarizes the recorded metrics:
Name | Monitoring type | Description |
---|---|---|
| | | Number of routes deployed in the configuration. |
Cache metrics at the Prometheus Scrape Endpoint
Cache metrics at the Prometheus Scrape Endpoint have the following meters and metrics:
ig_cache_gets_total
A counter monitoring type, incremented when a cache request hits or misses an entry.
Label | Possible values |
---|---|
| content | session, policy_decision, user_profile, access_token |
| result | hit, miss |
Example:
ig_cache_gets_total{content="session",...result="hit",...} 13.0
ig_cache_gets_total{content="session",...,result="miss"...} 1.0
ig_cache_gets_total{content="policy_decision",...,result="hit",...} 5.0
ig_cache_gets_total{content="policy_decision",...,result="miss",...} 2.0
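A common use of ig_cache_gets_total is a cache hit ratio per content type, computed as hits / (hits + misses). The following Python sketch uses the values from the example output above, keyed by (content, result). The helper name hit_ratios is hypothetical. The counters are cumulative since startup, so for a recent time window you would apply the same ratio to the differences between two scrapes.

```python
def hit_ratios(gets):
    """Return hits / (hits + misses) for each content label.

    gets: mapping of (content, result) -> ig_cache_gets_total value,
    where result is "hit" or "miss".
    """
    hits = {}
    lookups = {}
    for (content, result), value in gets.items():
        lookups[content] = lookups.get(content, 0.0) + value
        if result == "hit":
            hits[content] = hits.get(content, 0.0) + value
    # Skip content types with no lookups to avoid dividing by zero.
    return {content: hits.get(content, 0.0) / total
            for content, total in lookups.items() if total > 0}


# Values copied from the example output above.
gets = {
    ("session", "hit"): 13.0,
    ("session", "miss"): 1.0,
    ("policy_decision", "hit"): 5.0,
    ("policy_decision", "miss"): 2.0,
}
print(hit_ratios(gets))
```

With these values, the session cache hit ratio is 13/14 and the policy decision cache hit ratio is 5/7.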
ig_cache_loads
This meter exposes the following metrics:
ig_cache_loads_seconds
A timer monitoring type, measuring the time in seconds spent successfully or unsuccessfully loading entries in the cache.
Label | Possible values |
---|---|
| content | session, policy_decision, user_profile, access_token |
| result | success, failure |
| quantile | 0.5, 0.75, 0.95, 0.98, 0.99, 0.999 |
Example:
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.5",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.75",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.95",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.98",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.99",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.999",} 0.057710516
ig_cache_loads_seconds_sum / ig_cache_loads_seconds_total (deprecated)
A timer monitoring type, measuring the cumulative time in seconds spent successfully or unsuccessfully loading entries in the cache.
Label | Possible values |
---|---|
| content | session, policy_decision, user_profile, access_token |
| result | success, failure |
Example:
ig_cache_loads_seconds_sum{content="session",...result="failure",...} 0.0
ig_cache_loads_seconds_sum{content="session",...result="success",...} 0.057710516
ig_cache_loads_seconds_sum{content="policy_decision",...,result="failure",...} 0.0
ig_cache_loads_seconds_sum{content="policy_decision",...,result="success",...} 0.144314803
ig_cache_loads_seconds_count / ig_cache_loads_count (deprecated)
A counter monitoring type, incremented each time an entry is successfully or unsuccessfully loaded into the cache.
Label | Possible values |
---|---|
| content | session, policy_decision, user_profile, access_token |
| result | success, failure |
Example:
ig_cache_loads_count{content="session",...result="failure",...} 0.0
ig_cache_loads_count{content="session",...result="success",...} 1.0
ig_cache_loads_count{content="policy_decision",...,result="failure",...} 0.0
ig_cache_loads_count{content="policy_decision",...,result="success",...} 2.0
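Both the sum and the count of cache loads are cumulative. Dividing one by the other therefore gives the mean load time since startup for each (content, result) pair. The following Python sketch uses the values from the two examples above. The helper name average_load_seconds is hypothetical.

```python
def average_load_seconds(sums, counts):
    """Return ig_cache_loads_seconds_sum / ig_cache_loads_seconds_count per key.

    Keys are (content, result) pairs; keys with a zero count are skipped
    to avoid dividing by zero.
    """
    return {key: sums[key] / counts[key]
            for key in sums if counts.get(key, 0) > 0}


# Values copied from the _sum and _count examples above.
sums = {("session", "success"): 0.057710516,
        ("policy_decision", "success"): 0.144314803,
        ("session", "failure"): 0.0,
        ("policy_decision", "failure"): 0.0}
counts = {("session", "success"): 1.0,
          ("policy_decision", "success"): 2.0,
          ("session", "failure"): 0.0,
          ("policy_decision", "failure"): 0.0}
print(average_load_seconds(sums, counts))
```

Here the two successful policy decision loads took about 0.072 seconds each on average.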
ig_cache_evictions
This meter exposes the following metrics:
ig_cache_evictions_count
A counter monitoring type, incremented when an entry is evicted from the cache.
Label | Possible values |
---|---|
| content | session, policy_decision, user_profile, access_token |
| cause | COLLECTED, EXPIRED, EXPLICIT, REPLACED, SIZE |
Example:
ig_cache_evictions_count{cause="COLLECTED",content="session",...} 0.0
ig_cache_evictions_sum{cause="EXPIRED",content="session",...} 0.0
ig_cache_evictions_count{cause="EXPIRED",content="session",...} 0.0
ig_cache_evictions_sum{cause="EXPLICIT",content="session",...} 0.0
ig_cache_evictions_count{cause="EXPLICIT",content="session",...} 0.0
ig_cache_evictions_sum{cause="REPLACED",content="session",...} 0.0
ig_cache_evictions_count{cause="REPLACED",content="session",...} 0.0
ig_cache_evictions_sum{cause="SIZE",content="session",...} 0.0
ig_cache_evictions_count{cause="SIZE",content="session",...} 0.0
ig_cache_evictions_sum{cause="COLLECTED",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="COLLECTED",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="EXPIRED",content="policy_decision",...} 1.0
ig_cache_evictions_count{cause="EXPIRED",content="policy_decision",...} 1.0
ig_cache_evictions_sum{cause="EXPLICIT",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="EXPLICIT",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="REPLACED",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="REPLACED",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="SIZE",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="SIZE",content="policy_decision",...} 0.0
ig_cache_evictions_sum / ig_cache_evictions_total (deprecated)
A counter monitoring type, incremented when an entry is evicted from the cache. Each evicted entry has the weight 1, so this metric is equal to ig_cache_evictions_count.
Label | Possible values |
---|---|
| content | session, policy_decision, user_profile, access_token |
| cause | COLLECTED, EXPIRED, EXPLICIT, REPLACED, SIZE |
Example:
ig_cache_evictions_sum{cause="COLLECTED",content="session",...} 0.0
ig_cache_evictions_count{cause="COLLECTED",content="session",...} 0.0
ig_cache_evictions_sum{cause="EXPIRED",content="session",...} 0.0
ig_cache_evictions_count{cause="EXPIRED",content="session",...} 0.0
ig_cache_evictions_sum{cause="EXPLICIT",content="session",...} 0.0
ig_cache_evictions_count{cause="EXPLICIT",content="session",...} 0.0
ig_cache_evictions_sum{cause="REPLACED",content="session",...} 0.0
ig_cache_evictions_count{cause="REPLACED",content="session",...} 0.0
ig_cache_evictions_sum{cause="SIZE",content="session",...} 0.0
ig_cache_evictions_count{cause="SIZE",content="session",...} 0.0
ig_cache_evictions_sum{cause="COLLECTED",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="COLLECTED",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="EXPIRED",content="policy_decision",...} 1.0
ig_cache_evictions_count{cause="EXPIRED",content="policy_decision",...} 1.0
ig_cache_evictions_sum{cause="EXPLICIT",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="EXPLICIT",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="REPLACED",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="REPLACED",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="SIZE",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="SIZE",content="policy_decision",...} 0.0
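Eviction counters are split by cause. Summing across the cause label gives the total number of evictions per cache content type. The following Python sketch assumes samples have already been parsed into (name, labels, value) tuples and uses a subset of the example output above. The helper name evictions_by_content is hypothetical.

```python
from collections import defaultdict


def evictions_by_content(samples, metric="ig_cache_evictions_count"):
    """Total evictions per content label, summed over every cause label.

    samples: iterable of (name, labels, value) tuples. Because each evicted
    entry has weight 1, ig_cache_evictions_sum should yield the same totals
    when the full output is used.
    """
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels["content"]] += value
    return dict(totals)


# A subset of the example output above, as parsed tuples.
samples = [
    ("ig_cache_evictions_count", {"cause": "EXPIRED", "content": "session"}, 0.0),
    ("ig_cache_evictions_count", {"cause": "SIZE", "content": "session"}, 0.0),
    ("ig_cache_evictions_count", {"cause": "EXPIRED", "content": "policy_decision"}, 1.0),
    ("ig_cache_evictions_sum", {"cause": "EXPIRED", "content": "policy_decision"}, 1.0),
    ("ig_cache_evictions_count", {"cause": "SIZE", "content": "policy_decision"}, 0.0),
]
print(evictions_by_content(samples))
```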
Timer metrics at the Prometheus Scrape Endpoint
Timer metrics at the Prometheus Scrape Endpoint have the following labels:
- decorated_object
- heap
- name (decorator name)
- route
- router
Name | Monitoring type | Description |
---|---|---|
| | | Time to process the request and response in the decorated handler. |
| | | Time to process the request and response in the decorated filter and its downstream filters and handler. |
| | | Time to process the request and response in the decorated filter. |
| | | Time to process the request and response in filters and handlers that are downstream of the decorated filter. |
WebSocket metrics at the Prometheus Scrape Endpoint
WebSocket metrics at the Prometheus Scrape Endpoint have the following labels:
- frame_type
- fully_qualified_name
- heap
- local
- name
- remote
- route
- router
The following table summarizes the recorded metrics:
Name | Monitoring type | Description |
---|---|---|
| | | Number of client WebSockets currently open. |
| | | Number of server WebSockets currently open. |
| | | Number of application-side proxy errors. |
| | | Number of application-side proxy frames received. |
| | | Number of application-side proxy frames sent. |
| | | Number of client-side proxy errors. |
| | | Number of client-side proxy frames received. |
| | | Number of client-side proxy frames sent. |
| | | Number of active WebSocket proxy tunnels. |
| | | Number of WebSocket proxy tunnels created. |
1 The reverseproxyhandler in the metric name reflects the name of the ReverseProxyHandler component in the PingGateway configuration. The name shown in these examples derives from the route configuration shown in WebSocket traffic.
Startup metrics at the Prometheus Scrape Endpoint
Startup metrics at the Prometheus Scrape Endpoint are captured once at startup to record the time to load, build, and start PingGateway and the components in its heap.
ig_startup_seconds_count metric
A counter monitoring type that records the number of times the startup process has been measured for a component or instance. The value is always 1.
Example:
ig_startup_seconds_count{id="ig.logging",kind="setup",level="1",parentId="ig",parentKind="ig",} 1.0
ig_startup_seconds_count{id="ig.admin",kind="start",level="1",parentId="ig",parentKind="ig",} 1.0
ig_startup_seconds_count{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",} 1.0
ig_startup_seconds_sum / ig_startup_seconds_total (deprecated) metric
A timer monitoring type that records the total time to load, build, and start PingGateway and the components in its heap.
Example:
ig_startup_seconds_sum{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",} 5.59...
ig_startup_seconds_sum{id="ig.gateway.gateway._router.myroute1",kind="route",level="4",parentId="ig.gateway.gateway._router",parentKind="heaplet",} 0.01...
ig_startup_seconds_sum{class="StaticResponseHandler",id="ig.gateway.gateway.myRoute.{StaticResponseHandler}/handler",kind="heaplet",level="4",parentId="ig.gateway.gateway.myRoute",parentKind="heap",} 0.00...
ig_startup_seconds metric
A timer monitoring type that records quantiles for the time to load, build, and start PingGateway and the components in its heap.
Example:
ig_startup_seconds{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",quantile="0.95",} 4.8673300000000004E-4
ig_startup_seconds{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",quantile="0.98",} 4.8673300000000004E-4
ig_startup_seconds{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",quantile="0.99",} 4.8673300000000004E-4
ig_startup_seconds{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",quantile="0.999",} 4.8673300000000004E-4
Metric labels
Startup metrics can take the following labels:
Label | Description |
---|---|
| id | The instance component |
| kind | The instance type, for example, setup, start, heap, heaplet, or route |
| level | The position of the component in the system hierarchy |
| class | The component class |
| parentId | The component parent |
| parentKind | The parent component type |
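The id and parentId labels let you rebuild the startup hierarchy. This can help when tracing which components contribute most to startup time. The following Python sketch assumes the label sets have already been extracted from startup samples; the label values are copied from the ig_startup_seconds_count and ig_startup_seconds_sum examples above. The helper names startup_children and print_tree are hypothetical.

```python
from collections import defaultdict


def startup_children(label_sets):
    """Map each parentId to the ids of its direct children, in input order."""
    children = defaultdict(list)
    for labels in label_sets:
        children[labels["parentId"]].append(labels["id"])
    return dict(children)


def print_tree(children, node, depth=0):
    """Print node and its descendants, indenting two spaces per level."""
    print("  " * depth + node)
    for child in children.get(node, []):
        print_tree(children, child, depth + 1)


# Labels copied from the startup metric examples above.
label_sets = [
    {"id": "ig.logging", "kind": "setup", "parentId": "ig"},
    {"id": "ig.admin", "kind": "start", "parentId": "ig"},
    {"id": "ig.gateway.gateway.myRoute", "kind": "heap",
     "parentId": "ig.gateway.gateway"},
    {"id": "ig.gateway.gateway.myRoute.{StaticResponseHandler}/handler",
     "kind": "heaplet", "parentId": "ig.gateway.gateway.myRoute"},
]
children = startup_children(label_sets)
print_tree(children, "ig.gateway.gateway")
```

Pairing each id with its ig_startup_seconds_sum value would let you annotate the tree with timings. The sketch leaves that step out because the example timings above are truncated.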
Metrics at the Common REST Monitoring Endpoint (deprecated)
The Common REST Monitoring Endpoint exposes metrics as a JSON-formatted monitoring resource.
When PingGateway is set up as described in the documentation, the endpoint is http://ig.example.com:8080/openig/metrics/api?_queryFilter=true.
For an example that queries the Common REST Monitoring Endpoint, refer to Monitor the Common REST Monitoring Endpoint.
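The following minimal Python sketch queries the Common REST Monitoring Endpoint and indexes the returned metrics by _id. It assumes the documented URL above and the usual Common REST query response shape, in which the matching resources are listed in a result array. The helper names index_by_id and fetch_metrics are hypothetical.

```python
import json
import urllib.request

# Endpoint from the documented setup; adjust host and port as needed.
CREST_URL = "http://ig.example.com:8080/openig/metrics/api?_queryFilter=true"


def index_by_id(body):
    """Index a Common REST query response body by each resource's _id.

    Assumes the matching resources are listed in a "result" array.
    """
    return {item["_id"]: item for item in json.loads(body).get("result", [])}


def fetch_metrics(url=CREST_URL):
    """Query the endpoint and return the metrics indexed by _id."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return index_by_id(response.read().decode("utf-8"))
```

Indexing by _id makes it straightforward to pick out individual metrics, because each _id follows a fixed pattern.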
Route metrics at the Common REST Monitoring Endpoint (deprecated)
Route metrics at the Common REST Monitoring Endpoint are published with an _id in the following pattern:
- heap.router-name.route.route-name.metric
The following table summarizes the recorded metrics:
Name | Monitoring type | Description |
---|---|---|
| | Counter | Number of requests processed by the router or route since it was deployed. |
| | Gauge | Number of requests being processed by the router or route at this moment. |
| | Counter | Number of responses that threw an exception. |
| | Counter | Number of responses that were not handled by PingGateway. |
| | Counter | Number of responses with an HTTP status code |
| | Counter | Number of responses with an HTTP status code |
| | Counter | Number of responses with an HTTP status code |
| | Counter | Number of responses with an HTTP status code |
| | Counter | Number of responses with an HTTP status code |
| | Counter | Number of responses with an HTTP status code |
| | Timer | Time-series summary statistics. |
Learn more in Common REST Monitoring Endpoint.
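Route metric _id values follow the pattern heap.router-name.route.route-name.metric, so the metrics for a single route can be selected by prefix. The following Python sketch assumes the _id values have already been retrieved, for example as the keys of a Common REST query response. The helper name route_metric_ids and all heap, router, route, and metric names in the sample data are hypothetical.

```python
def route_metric_ids(ids, heap, router, route):
    """Return the sorted _id values that belong to one route.

    Follows the pattern heap.router-name.route.route-name.metric. The
    trailing dot in the prefix stops route "myroute" from also matching
    a route named "myroute2".
    """
    prefix = f"{heap}.{router}.route.{route}."
    return sorted(i for i in ids if i.startswith(prefix))


# Hypothetical _id values; all names are placeholders.
ids = [
    "gateway.main-router.route.myroute.request",
    "gateway.main-router.route.myroute2.request",
    "gateway.main-router.deployed-routes",
]
print(route_metric_ids(ids, "gateway", "main-router", "myroute"))
```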
Router metrics at the Common REST Monitoring Endpoint (deprecated)
Router metrics at the Common REST Monitoring Endpoint are JSON objects, published with an _id of the following form:
- [heap name].[router name].deployed-routes
The following table summarizes the recorded metrics:
Name | Monitoring type | Description |
---|---|---|
| | | Number of routes deployed in the configuration. |
For more information about the Common REST Monitoring Endpoint, refer to Common REST Monitoring Endpoint.
Timer metrics at the Common REST Monitoring Endpoint (deprecated)
This section describes the metrics recorded at the ForgeRock Common REST Monitoring Endpoint.
When PingGateway is set up as described in the documentation, the endpoint is http://ig.example.com:8080/openig/metrics/api?_queryFilter=true.
Metrics are published with an _id in the following pattern:
- heap.router-name.route-name.decorator-name.object
Name | Monitoring type | Description |
---|---|---|
| | | Time to process the request and response in the decorated handler, or in the decorated filter and its downstream filters and handler. |
| | | Time to process the request and response in the decorated filter. |
| | | Time to process the request and response in filters and handlers that are downstream of the decorated filter. |