PingGateway 2024.6

Monitoring

The following sections describe monitoring endpoints exposed by PingGateway, and the metrics available at the endpoints.

For information about how to set up and maintain monitoring, refer to Monitor services.

Vert.x Metrics

Vert.x metrics for HTTP clients, TCP clients, and servers are available by default at the Prometheus Scrape Endpoint and the Common REST Monitoring Endpoint (deprecated). Vert.x metrics provide low-level information about requests and responses, such as the number of bytes, the duration, and the number of concurrent requests. The available metrics are based on those described in Vert.x core tools metrics.

For more information about Vert.x and PingGateway, refer to the vertx object in AdminHttpApplication (admin.json), and Monitoring Vert.x Metrics.

Monitoring types

This section describes the data types used in monitoring:

Counter

Cumulative metric for a numerical value that only increases.

Gauge

Metric for a numerical value that can increase or decrease.

Summary

Metric that samples observations, providing a count of observations, sum total of observed amounts, average rate of events, and moving average rates across a sliding time window.

The Prometheus view doesn’t provide time-based statistics, because rates can be calculated from the time-series data. Instead, the Prometheus view includes summary metrics whose names have the following suffixes or labels:

  • _count: number of recorded events

  • _sum: total sum of recorded events

  • {quantile="0.5"}: 50% at or below this value

  • {quantile="0.75"}: 75% at or below this value

  • {quantile="0.95"}: 95% at or below this value

  • {quantile="0.98"}: 98% at or below this value

  • {quantile="0.99"}: 99% at or below this value

  • {quantile="0.999"}: 99.9% at or below this value
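
The suffixes and labels above can be read back from scraped text with a few lines of code. The following is a minimal sketch, not part of PingGateway: the function names and the sample values are hypothetical, and the parser assumes the text contains a single label set for the metric. It collects the _count, _sum, and quantile samples of one summary, from which a mean can be derived as sum divided by count.

```python
import re

# One sample line: a metric name, optional {labels}, then a numeric value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape output; the values are made up for illustration.
SAMPLE = """\
ig_cache_loads_seconds{content="session",result="success",quantile="0.5",} 0.25
ig_cache_loads_seconds{content="session",result="success",quantile="0.99",} 0.75
ig_cache_loads_seconds_count{content="session",result="success",} 4.0
ig_cache_loads_seconds_sum{content="session",result="success",} 1.5
"""


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def summarize(text, metric):
    """Collect the _count, _sum, and quantile samples of one summary metric.

    Assumes the text holds a single label set for the metric; with several
    label sets, later samples overwrite earlier ones.
    """
    summary = {"count": None, "sum": None, "quantiles": {}}
    for name, labels, value in parse_samples(text):
        if name == metric + "_count":
            summary["count"] = value
        elif name == metric + "_sum":
            summary["sum"] = value
        elif name == metric and "quantile" in labels:
            summary["quantiles"][labels["quantile"]] = value
    return summary
```

With the hypothetical sample, summarize(SAMPLE, "ig_cache_loads_seconds") returns a count of 4.0 and a sum of 1.5, so the mean load time is 1.5 / 4.0 seconds.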

Timer

Metric combining time-series summary statistics.

Common REST views show summaries as JSON objects. JSON summaries have the following fields:

{
 "max": number,             // maximum duration recorded
 "mean": number,            // total/count, or 0 if count is 0
 "min": number,             // minimum duration recorded for this metric
 "mean_rate": number,       // average rate
 "p50": number,             // 50% at or below this value
 "p75": number,             // 75% at or below this value
 "p95": number,             // 95% at or below this value
 "p98": number,             // 98% at or below this value
 "p99": number,             // 99% at or below this value
 "p999": number,            // 99.9% at or below this value
 "stddev": number,          // standard deviation of recorded durations
 "m15_rate": number,        // fifteen-minute average rate
 "m5_rate": number,         // five-minute average rate
 "m1_rate": number,         // one-minute average rate
 "duration_units": string,  // time unit used in durations
 "rate_units": string,      // event count unit and time unit used in rate
 "seconds_count": number,   // events recorded for this metric
 "count": number,           // events recorded for this metric (deprecated)
 "seconds_total": number,   // sum of the durations of events recorded
 "total": number            // sum of the durations of events recorded (deprecated)
}
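
As an illustration of how the count and total fields relate to mean, the following sketch recomputes total/count from a timer summary that has already been decoded into a Python dictionary. The function name is hypothetical. The sketch prefers seconds_count and seconds_total and falls back to the deprecated count and total fields; the result is in whatever unit the chosen total uses, which may differ from duration_units.

```python
def timer_mean(timer):
    """Return total/count for a timer summary, or 0 if count is 0.

    Uses seconds_count/seconds_total when both are present; otherwise
    falls back to the deprecated count/total pair.
    """
    if "seconds_count" in timer and "seconds_total" in timer:
        count, total = timer["seconds_count"], timer["seconds_total"]
    else:
        count, total = timer.get("count", 0), timer.get("total", 0)
    return total / count if count else 0
```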

Metrics at the Prometheus Scrape Endpoint

All products automatically expose a monitoring endpoint where Prometheus can scrape metrics in a standard Prometheus format. Learn more from the Prometheus website.

When PingGateway is set up as described in the Quick install, the Prometheus Scrape Endpoint is available at the following endpoints:

  • http://ig.example.com:8080/openig/metrics/prometheus/0.0.4

  • http://ig.example.com:8080/openig/metrics/prometheus (deprecated)

For an example that queries the Prometheus Scrape Endpoint, refer to Monitor the Prometheus Scrape Endpoint.

Route metrics at the Prometheus Scrape Endpoint

Route metrics at the Prometheus Scrape Endpoint have the following labels:

  • name: Route name, for example, My Route.

    If the router was declared with a default handler, then its metrics are published through the route named default.

  • route: Route identifier, for example, my-route.

  • router: Fully qualified name of the router, for example, gateway.main-router.

The following table summarizes the recorded metrics:

Name Monitoring type Description

ig_route_request_active

Gauge

Number of requests being processed.

ig_route_request_total

Counter

Number of requests processed by the router or route since it was deployed.

ig_route_response_error_total

Counter

Number of responses that threw an exception.

ig_route_response_null_total

Counter

Number of responses that were not handled by PingGateway.

ig_route_response_status_total

Counter

Number of responses by HTTP status code family. The family label depends on the HTTP status code:

  • Informational (1xx)

  • Successful (2xx)

  • Redirection (3xx)

  • Client_error (4xx)

  • Server_error (5xx)

  • Unknown (status code >= 600)

ig_route_response_time:

  • ig_route_response_time_seconds_count

  • ig_route_response_time_count (deprecated)

  • ig_route_response_time_seconds_sum

  • ig_route_response_time_seconds_total (deprecated)

Summary

A summary of response time observations.
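
The status code families listed for ig_route_response_status_total can be sketched as a simple lookup. The function name is hypothetical, the family names are written as they appear in the list above (the exact label strings in scraped output may differ), and treating status codes below 100 as Unknown is an assumption of this sketch.

```python
def status_family(status_code):
    """Return the status code family for an HTTP status code."""
    families = {
        1: "Informational",
        2: "Successful",
        3: "Redirection",
        4: "Client_error",
        5: "Server_error",
    }
    # Codes of 600 and above are Unknown; treating codes below 100 the
    # same way is an assumption of this sketch.
    return families.get(status_code // 100, "Unknown")
```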

Router metrics at the Prometheus Scrape Endpoint

Router metrics at the Prometheus Scrape Endpoint have the following labels:

  • fully_qualified_name: Fully qualified name of the router, for example, gateway.main-router.

  • heap: Name of the heap in which this router is declared, for example, gateway.

  • name: Simple name declared in router configuration, for example, main-router.

The following table summarizes the recorded metrics:

Name Monitoring type Description

ig_router_deployed_routes

Gauge

Number of routes deployed in the configuration.

Cache metrics at the Prometheus Scrape Endpoint

Cache metrics at the Prometheus Scrape Endpoint have the following meters and metrics:

ig_cache_gets_total

A counter monitoring type, incremented when a cache request hits or misses an entry.

Label Possible values

content

session,
policy_decision,
user_profile,
access_token

result

hit, miss

Example:

ig_cache_gets_total{content="session",...result="hit",...} 13.0
ig_cache_gets_total{content="session",...,result="miss"...} 1.0
ig_cache_gets_total{content="policy_decision",...,result="hit",...} 5.0
ig_cache_gets_total{content="policy_decision",...,result="miss",...} 2.0
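
Because ig_cache_gets_total is split by the result label, a cache hit ratio per content type can be derived from it. The following is a minimal sketch with hypothetical names; it takes samples that have already been parsed into (labels, value) pairs and uses the values from the example above. The counters are cumulative, so the ratio covers the whole period since the counters started.

```python
from collections import defaultdict

# Values taken from the example above, reduced to the two relevant labels.
GETS_SAMPLES = [
    ({"content": "session", "result": "hit"}, 13.0),
    ({"content": "session", "result": "miss"}, 1.0),
    ({"content": "policy_decision", "result": "hit"}, 5.0),
    ({"content": "policy_decision", "result": "miss"}, 2.0),
]


def hit_ratios(samples):
    """Return hits / (hits + misses) for each content type.

    The ratio is None for a content type with no recorded requests.
    """
    totals = defaultdict(lambda: {"hit": 0.0, "miss": 0.0})
    for labels, value in samples:
        totals[labels["content"]][labels["result"]] += value
    ratios = {}
    for content, counts in totals.items():
        requests = counts["hit"] + counts["miss"]
        ratios[content] = counts["hit"] / requests if requests else None
    return ratios
```

For the example values, the session cache has 13 hits out of 14 requests and the policy_decision cache has 5 hits out of 7.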

ig_cache_loads

This meter exposes the following metrics:

ig_cache_loads_seconds

A timer monitoring type, measuring the time in seconds spent successfully or unsuccessfully loading entries in the cache.

Label Possible values

content

session,
policy_decision,
user_profile,
access_token

result

success, failure

quantile

0.5, 0.75, 0.95, 0.98, 0.99, 0.999

Example:

ig_cache_loads_seconds{content="session",...result="success",...quantile="0.5",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.75",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.95",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.98",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.99",} 0.057710516
ig_cache_loads_seconds{content="session",...result="success",...quantile="0.999",} 0.057710516

ig_cache_loads_seconds_sum/ig_cache_loads_seconds_total (deprecated)

A timer monitoring type, measuring the cumulative time in seconds spent successfully or unsuccessfully loading entries in the cache.

Label Possible values

content

session,
policy_decision,
user_profile,
access_token

result

success, failure

Example:

ig_cache_loads_seconds_sum{content="session",...result="failure",...} 0.0
ig_cache_loads_seconds_sum{content="session",...result="success",...} 0.057710516
ig_cache_loads_seconds_sum{content="policy_decision",...,result="failure",...} 0.0
ig_cache_loads_seconds_sum{content="policy_decision",...,result="success",...} 0.144314803

ig_cache_loads_seconds_count/ig_cache_loads_count (deprecated)

A counter monitoring type, incremented when an entry is successfully or unsuccessfully loaded into the cache.

Label Possible values

content

session,
policy_decision,
user_profile,
access_token

result

success, failure

Example:

ig_cache_loads_count{content="session",...result="failure",...} 0.0
ig_cache_loads_count{content="session",...result="success",...} 1.0
ig_cache_loads_count{content="policy_decision",...,result="failure",...} 0.0
ig_cache_loads_count{content="policy_decision",...,result="success",...} 2.0

ig_cache_evictions

This meter exposes the following metrics:

ig_cache_evictions_count

A counter monitoring type, incremented when an entry is evicted from the cache.

Label Possible values

content

session,
policy_decision,
user_profile,
access_token

cause

COLLECTED,
EXPIRED,
EXPLICIT,
REPLACED,
SIZE

Example

ig_cache_evictions_count{cause="COLLECTED",content="session",...} 0.0
ig_cache_evictions_sum{cause="EXPIRED",content="session",...} 0.0
ig_cache_evictions_count{cause="EXPIRED",content="session",...} 0.0
ig_cache_evictions_sum{cause="EXPLICIT",content="session",...} 0.0
ig_cache_evictions_count{cause="EXPLICIT",content="session",...} 0.0
ig_cache_evictions_sum{cause="REPLACED",content="session",...} 0.0
ig_cache_evictions_count{cause="REPLACED",content="session",...} 0.0
ig_cache_evictions_sum{cause="SIZE",content="session",...} 0.0
ig_cache_evictions_count{cause="SIZE",content="session",...} 0.0
ig_cache_evictions_sum{cause="COLLECTED",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="COLLECTED",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="EXPIRED",content="policy_decision",...} 1.0
ig_cache_evictions_count{cause="EXPIRED",content="policy_decision",...} 1.0
ig_cache_evictions_sum{cause="EXPLICIT",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="EXPLICIT",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="REPLACED",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="REPLACED",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="SIZE",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="SIZE",content="policy_decision",...} 0.0

ig_cache_evictions_sum/ig_cache_evictions_total (deprecated)

A counter monitoring type, incremented when an entry is evicted from the cache. Each evicted entry has the weight 1, so this metric is equal to ig_cache_evictions_count.

Label Possible values

content

session,
policy_decision,
user_profile,
access_token

cause

COLLECTED,
EXPIRED,
EXPLICIT,
REPLACED,
SIZE

Example

ig_cache_evictions_sum{cause="COLLECTED",content="session",...} 0.0
ig_cache_evictions_count{cause="COLLECTED",content="session",...} 0.0
ig_cache_evictions_sum{cause="EXPIRED",content="session",...} 0.0
ig_cache_evictions_count{cause="EXPIRED",content="session",...} 0.0
ig_cache_evictions_sum{cause="EXPLICIT",content="session",...} 0.0
ig_cache_evictions_count{cause="EXPLICIT",content="session",...} 0.0
ig_cache_evictions_sum{cause="REPLACED",content="session",...} 0.0
ig_cache_evictions_count{cause="REPLACED",content="session",...} 0.0
ig_cache_evictions_sum{cause="SIZE",content="session",...} 0.0
ig_cache_evictions_count{cause="SIZE",content="session",...} 0.0
ig_cache_evictions_sum{cause="COLLECTED",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="COLLECTED",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="EXPIRED",content="policy_decision",...} 1.0
ig_cache_evictions_count{cause="EXPIRED",content="policy_decision",...} 1.0
ig_cache_evictions_sum{cause="EXPLICIT",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="EXPLICIT",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="REPLACED",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="REPLACED",content="policy_decision",...} 0.0
ig_cache_evictions_sum{cause="SIZE",content="policy_decision",...} 0.0
ig_cache_evictions_count{cause="SIZE",content="policy_decision",...} 0.0

Timer metrics at the Prometheus Scrape Endpoint

Timer metrics at the Prometheus Scrape Endpoint have the following labels:

  • decorated_object

  • heap

  • name (decorator name)

  • route

  • router

Name Monitoring type Description

ig_timerdecorator_handler_elapsed_seconds

Summary

Time to process the request and response in the decorated handler.

ig_timerdecorator_filter_elapsed_seconds

Summary

Time to process the request and response in the decorated filter and its downstream filters and handler.

ig_timerdecorator_filter_internal_seconds

Summary

Time to process the request and response in the decorated filter.

ig_timerdecorator_filter_downstream_seconds

Summary

Time to process the request and response in filters and handlers that are downstream of the decorated filter.

WebSocket metrics at the Prometheus Scrape Endpoint

WebSocket metrics at the Prometheus Scrape Endpoint have the following labels:

  • frame_type

  • fully_qualified_name

  • heap

  • local

  • name

  • remote

  • route

  • router

The following table summarizes the recorded metrics:

Name Monitoring type Description

ig_http_client_active_ws_connections

Gauge

Number of client websockets currently open.

ig_http_server_active_ws_connections

Gauge

Number of server websockets currently open.

ig_reverseproxyhandler_ws_proxy_application_side_errors_total [1]

Counter

Number of application-side proxy errors.

ig_reverseproxyhandler_ws_proxy_application_side_read_total [1]

Counter

Number of application-side proxy frames received.

ig_reverseproxyhandler_ws_proxy_application_side_write_total [1]

Counter

Number of application-side proxy frames sent.

ig_reverseproxyhandler_ws_proxy_client_side_errors_total [1]

Counter

Number of client-side proxy errors.

ig_reverseproxyhandler_ws_proxy_client_side_read_total [1]

Counter

Number of client-side proxy frames received.

ig_reverseproxyhandler_ws_proxy_client_side_write_total [1]

Counter

Number of client-side proxy frames sent.

ig_reverseproxyhandler_ws_proxy_tunnels_active [1]

Gauge

Number of active websocket proxy tunnels.

ig_reverseproxyhandler_ws_proxy_tunnels_created_total [1]

Counter

Number of websocket proxy tunnels created.

[1] The reverseproxyhandler in the metric name reflects the name of the ReverseProxyHandler component in the PingGateway configuration. The name shown in these examples derives from the route configuration shown in WebSocket traffic.

Startup metrics at the Prometheus Scrape Endpoint

Startup metrics at the Prometheus Scrape Endpoint are captured once at startup to record the time to load, build, and start PingGateway and the components in its heap.

ig_startup_seconds_count metric

A counter monitoring type to record the number of times the startup process has been measured for a component or instance. The value is always 1.

Example

ig_startup_seconds_count{id="ig.logging",kind="setup",level="1",parentId="ig",parentKind="ig",} 1.0
ig_startup_seconds_count{id="ig.admin",kind="start",level="1",parentId="ig",parentKind="ig",} 1.0
ig_startup_seconds_count{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",} 1.0

ig_startup_seconds_sum/ig_startup_seconds_total (deprecated) metric

A timer monitoring type to record the total time to load, build, and start PingGateway and the components in its heap.

Example

ig_startup_seconds_sum{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",} 5.59...
ig_startup_seconds_sum{id="ig.gateway.gateway._router.myroute1",kind="route",level="4",parentId="ig.gateway.gateway._router",parentKind="heaplet",} 0.01...
ig_startup_seconds_sum{class="StaticResponseHandler",id="ig.gateway.gateway.myRoute.{StaticResponseHandler}/handler",kind="heaplet",level="4",parentId="ig.gateway.gateway.myRoute",parentKind="heap",} 0.00...
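
One use of the startup sums is to rank components by the time they took to start. The following is a minimal sketch with hypothetical names; it takes ig_startup_seconds_sum samples that have already been parsed into (labels, seconds) pairs. The durations below are made up for illustration, because the values in the example above are truncated.

```python
# Hypothetical samples: the ids and kinds follow the examples in this
# section, but the durations are invented for illustration.
STARTUP_SAMPLES = [
    ({"id": "ig.gateway.gateway.myRoute", "kind": "heap"}, 5.5),
    ({"id": "ig.gateway.gateway._router.myroute1", "kind": "route"}, 0.01),
    ({"id": "ig.admin", "kind": "start"}, 1.25),
]


def slowest_components(samples, limit=3):
    """Return (id, kind, seconds) for the slowest components, slowest first."""
    ranked = sorted(samples, key=lambda sample: sample[1], reverse=True)
    return [
        (labels["id"], labels["kind"], seconds)
        for labels, seconds in ranked[:limit]
    ]
```

The level, parentId, and parentKind labels can be used in the same way to group components under their parent before ranking them.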

ig_startup_seconds metric

A timer monitoring type to record quantiles for the time to load, build, and start PingGateway and the components in its heap.

Example

ig_startup_seconds{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",quantile="0.95",} 4.8673300000000004E-4
ig_startup_seconds{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",quantile="0.98",} 4.8673300000000004E-4
ig_startup_seconds{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",quantile="0.99",} 4.8673300000000004E-4
ig_startup_seconds{id="ig.gateway.gateway.myRoute",kind="heap",level="3",parentId="ig.gateway.gateway",parentKind="heap",quantile="0.999",} 4.8673300000000004E-4

Metric labels

Startup metrics can take the following labels:

Label Description

id (string)

The instance component

kind (string)

The instance type. Can take the following values:

  • ig: Top-level component

  • start: Gateway and admin component

  • heap: Heap component

  • heaplet: Heaplet component

  • route: Route component

  • setup: Internal processing

  • quantile: Quantiles of the value given by ig_startup_seconds_sum

    For ig_startup_seconds only

level (integer)

The position of the component in the system hierarchy

class (string, optional)

The component class

parentId (string, optional)

The component parent

parentKind (string, optional)

The parent component type

Metrics at the Common REST Monitoring Endpoint (deprecated)

The Common REST Monitoring Endpoint exposes metrics as a JSON format monitoring resource.

When PingGateway is set up as described in the documentation, the endpoint is http://ig.example.com:8080/openig/metrics/api?_queryFilter=true.

For an example that queries the Common REST Monitoring Endpoint, refer to Monitor the Common REST Monitoring Endpoint.

Route metrics at the Common REST Monitoring Endpoint (deprecated)

Route metrics at the Common REST Monitoring Endpoint are published with an _id in the following pattern:

  • heap.router-name.route.route-name.metric
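
The _id pattern can be used to pick out the metrics of a single route from a list of monitoring resources. The following is a minimal sketch with hypothetical function names; it assumes the resources have already been decoded from JSON into dictionaries that each carry an _id field, and it makes no assumption about the other fields of a resource.

```python
def route_metric_prefix(heap, router_name, route_name):
    """Build the part of the _id that precedes the metric name."""
    return f"{heap}.{router_name}.route.{route_name}."


def metrics_for_route(resources, heap, router_name, route_name):
    """Map metric name to resource for the resources of one route."""
    prefix = route_metric_prefix(heap, router_name, route_name)
    return {
        resource["_id"][len(prefix):]: resource
        for resource in resources
        if resource.get("_id", "").startswith(prefix)
    }
```

For example, with the heap gateway, the router main-router, and the route my-route, a resource whose _id is gateway.main-router.route.my-route.request.active is returned under the key request.active.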

The following table summarizes the recorded metrics:

Name Monitoring type Description

request

Counter

Number of requests processed by the router or route since it was deployed.

request.active

Gauge

Number of requests being processed by the router or route at this moment.

response.error

Counter

Number of responses that threw an exception.

response.null

Counter

Number of responses that were not handled by PingGateway.

response.status.client_error

Counter

Number of responses with an HTTP status code 400-499, indicating client error.

response.status.informational

Counter

Number of responses with an HTTP status code 100-199, indicating that they are provisional responses.

response.status.redirection

Counter

Number of responses with an HTTP status code 300-399, indicating a redirect.

response.status.server_error

Counter

Number of responses with an HTTP status code 500-599, indicating server error.

response.status.successful

Counter

Number of responses with an HTTP status code 200-299, indicating success.

response.status.unknown

Counter

Number of responses with an HTTP status code 600-699, indicating that a request failed and was not executed.

response.time

Timer

Time-series summary statistics.

Router metrics at the Common REST Monitoring Endpoint (deprecated)

Router metrics at the Common REST Monitoring Endpoint are JSON objects, with the following form:

  • [heap name].[router name].deployed-routes

The following table summarizes the recorded metrics:

Name Monitoring type Description

deployed-routes

Gauge

Number of routes deployed in the configuration.

For more information about the Common REST Monitoring Endpoint, refer to Common REST Monitoring Endpoint.

Timer metrics at the Common REST Monitoring Endpoint (deprecated)

This section describes the metrics recorded at the ForgeRock Common REST Monitoring Endpoint.

When PingGateway is set up as described in the documentation, the endpoint is http://ig.example.com:8080/openig/metrics/api?_queryFilter=true.

Metrics are published with an _id in the following pattern:

heap.router-name.route-name.decorator-name.object

Timer metrics at the Common REST Monitoring Endpoint

Name Monitoring type Description

elapsed

Timer

Time to process the request and response in the decorated handler, or in the decorated filter and its downstream filters and handler.

internal

Timer

Time to process the request and response in the decorated filter.

downstream

Timer

Time to process the request and response in filters and handlers that are downstream of the decorated filter.