Distributed tracing

In a distributed system, requests pass through multiple services hosted on multiple servers. Without telemetry data, it can be difficult to identify the root cause of performance issues or errors.

Distributed tracing provides visibility into the full path a request takes through a distributed system. PingDirectory supports the OpenTelemetry framework for collecting distributed tracing data. You can send traces collected by PingDirectory to a backend service, such as Jaeger, for aggregation, storage, and visualization.

This feature is provided as a Preview, which means that it isn’t supported and should not be used in production environments. Learn more in Feature statuses.

Additionally, distributed tracing is only available for the PingDirectory server.

Why use distributed tracing?

Diagnosing an escalated production issue can take anywhere from hours to days. Multiple subject matter experts often have to correlate fragmented logs to reconstruct what happened, and the result is frequently more noise than clarity. The more services and instances involved, the harder troubleshooting becomes.

Distributed tracing addresses these challenges by supporting end-to-end request visibility and data correlation across multiple services and servers. As a result, you can troubleshoot performance issues and errors more quickly and effectively. You can also use distributed tracing to optimize system performance by identifying bottlenecks and inefficiencies in service interactions.

What is distributed tracing?

Distributed tracing shows you how an incoming request was processed across all servers and services in a distributed system, including:

Which servers and services the request went through.
How much time each service took to process its part of the request.
How the services are connected.
What the failure point was in case of a request failure.

A distributed trace provides a visual representation of a request’s journey. Spans show when an operation started, when it ended, and its duration. When one service calls another, these calls are linked within the trace, showing the flow and time spent in each service. The PingDirectory server uses the OpenTelemetry framework to create and manage these spans and traces.
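To make the timing idea concrete, the following minimal Python sketch models spans as start and end timestamps and picks the slowest one. The TimedSpan class, the service names, and the timings are all hypothetical and exist only for illustration. They are not part of PingDirectory or the OpenTelemetry API.

```python
from dataclasses import dataclass


@dataclass
class TimedSpan:
    service: str   # hypothetical service or operation name
    start_ms: int  # when the operation started
    end_ms: int    # when the operation ended

    @property
    def duration_ms(self) -> int:
        # Duration is derived from the start and end timestamps.
        return self.end_ms - self.start_ms


def slowest_span(spans):
    """Return the span with the longest duration."""
    return max(spans, key=lambda s: s.duration_ms)


# Made-up child spans for a single request.
child_spans = [
    TimedSpan("directory-bind", 5, 20),
    TimedSpan("directory-search", 20, 105),
    TimedSpan("directory-modify", 105, 118),
]

bottleneck = slowest_span(child_spans)
print(f"{bottleneck.service}: {bottleneck.duration_ms} ms")
```

A tracing backend performs this kind of comparison visually, typically by drawing each span as a bar on a shared timeline.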

Traces

A trace represents the path of a request through an application. A trace is made up of one or more spans. Learn more about traces in the OpenTelemetry documentation.

Spans

A span is a segment of a request journey. It represents a unit of work or an operation within a service. Each span includes the following elements:

traceId represents the trace that the span is a part of.
spanId is a unique ID for the span.
parentSpanId is the spanId of the parent span, which is the operation that triggered this span.

Servers add span attributes that follow the OpenTelemetry semantic conventions. The LDAP-specific attributes are modeled on the HTTP conventions.

Root span

The root span marks the start and end of an entire operation. The parentSpanId of the root span is null because the root span has no parent: it is the first span in a new trace. Every subsequent span in the trace has its own unique spanId and the same traceId as the root span. The parentSpanId of a span created directly under the root span matches the root span’s spanId, and more deeply nested spans reference the spanId of their immediate parent.
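The relationship between traceId, spanId, and parentSpanId can be sketched in a few lines of Python. This is an illustrative model only, not the OpenTelemetry SDK: the Span class, the helper functions, and the operation names are hypothetical. The ID lengths follow the OpenTelemetry convention of a 16-byte trace ID and an 8-byte span ID.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    name: str
    trace_id: str                  # shared by every span in the trace
    span_id: str                   # unique per span
    parent_span_id: Optional[str]  # None for the root span


def new_root(name: str) -> Span:
    # A root span starts a new trace, so it has no parent.
    return Span(name, secrets.token_hex(16), secrets.token_hex(8), None)


def new_child(parent: Span, name: str) -> Span:
    # A child inherits the trace ID and points back at its parent's span ID.
    return Span(name, parent.trace_id, secrets.token_hex(8), parent.span_id)


# Hypothetical operation names.
root = new_root("ldap-search")
child = new_child(root, "backend-lookup")
grandchild = new_child(child, "index-read")

for span in (root, child, grandchild):
    print(span.name, span.trace_id, span.span_id, span.parent_span_id)
```

All three spans print the same trace ID, while each parent_span_id value points one level up the chain.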

OpenTelemetry

OpenTelemetry is an open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data. It provides a standardized way to capture distributed traces across different services and platforms. It doesn’t provide a backend for storing or analyzing telemetry data. Learn more in the OpenTelemetry documentation.

Which requests are traced?

Tracing is supported for all incoming LDAP requests, including those from PingFederate. To continue a trace that an upstream client started, a request must include the W3C trace context LDAP request control, which propagates the trace information.

The W3C trace context allows for consistent correlation IDs and metadata across systems that support the W3C standard. If a request doesn’t include the W3C trace context control, a new trace starts for that request.
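For orientation, the W3C Trace Context standard defines a traceparent value with four dash-separated fields: version, trace ID, parent span ID, and trace flags. The sketch below parses that value for version 00 layouts. It assumes the text form used in HTTP headers. How PingDirectory encodes the value inside the LDAP request control is not covered here, and the parse_traceparent function and TraceContext type are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# version - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceContext(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceContext]:
    """Return the parsed context, or None if the value is invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The standard forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # Bit 0 of the flags field is the "sampled" flag.
    return TraceContext(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


# Example value taken from the W3C Trace Context specification.
ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx)
```

A receiver that gets None back from a check like this would start a new trace, which matches the behavior described above for requests without the control.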

Enable and configure tracing

Distributed tracing is disabled by default. To enable the feature, you need to enable the OpenTelemetry plugin, as follows:

bin/dsconfig set-plugin-prop \
    --plugin-name OpenTelemetry \
    --set enabled:true

Supply the following properties to configure how spans are sampled and where telemetry data gets exported:

Property: key-manager-provider
Description: The key manager provider to use if the OTLP/HTTP collector requires a client certificate.
Values: For example, JKS

Property: trust-manager-provider
Description: The trust manager provider used to validate the certificate presented by the OTLP/HTTP collector.
Values: For example, JKS

Property: ssl-cert-nickname
Description: The nickname in the associated key store for the certificate to present to the OTLP/HTTP collector. You can leave this undefined if no key manager provider is configured or if the JVM should select a certificate automatically.
Values: For example, server-cert

Property: tracer-exporter-otlp-endpoint
Description: Sets the OTLP/HTTP endpoint where the server exports sampled spans. If you don’t set this value, the spans won’t be exported.
Values: The endpoint must start with either http:// or https:// and include the full HTTP path.

Property: tracer-sampler
Description: Selects the sampling strategy used when new spans are created.
Values:
    always-on: Samples every span.
    always-off: Samples none of the spans and produces no telemetry data.
    trace-id-ratio: Samples spans according to the tracer-sampler-ratio value.
    parent-based-default-always-on: Samples according to the parent span configuration, defaulting to always-on when there is no parent span.
    parent-based-default-always-off: Samples according to the parent span configuration, defaulting to always-off when there is no parent span.
    parent-based-default-trace-id-ratio: Samples according to the parent span configuration, defaulting to trace-id-ratio when there is no parent span.

Property: tracer-sampler-ratio
Description: Specifies the sampling percentage used by ratio-based samplers. Higher values result in more spans being sampled but could impact performance. When the sampling strategy is either trace-id-ratio or parent-based-default-trace-id-ratio, this value determines the percentage of new spans that should be sampled.
Values: 0 - 100 (inclusive). The default value is 10.
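The following Python sketch illustrates how the tracer-sampler values relate to one another. It is a simplified model, not PingDirectory’s implementation: the should_sample and ratio_sample functions are hypothetical, and the ratio check (comparing the low 64 bits of the trace ID against a threshold) is one common way to make a deterministic ratio decision, assumed here for illustration.

```python
from typing import Optional


def ratio_sample(trace_id_hex: str, ratio_percent: float) -> bool:
    """Deterministic ratio decision derived from the trace ID (illustrative)."""
    bound = int((ratio_percent / 100.0) * (1 << 64))
    low64 = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low64 < bound


def should_sample(sampler: str, trace_id_hex: str,
                  ratio_percent: float = 10.0,
                  parent_sampled: Optional[bool] = None) -> bool:
    """parent_sampled is None when the span has no parent."""
    prefix = "parent-based-default-"
    if sampler.startswith(prefix):
        if parent_sampled is not None:
            # A parent exists, so follow its sampling decision.
            return parent_sampled
        # No parent: fall back to the sampler named after the prefix.
        sampler = sampler[len(prefix):]
    if sampler == "always-on":
        return True
    if sampler == "always-off":
        return False
    if sampler == "trace-id-ratio":
        return ratio_sample(trace_id_hex, ratio_percent)
    raise ValueError(f"unknown sampler: {sampler}")


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"  # example trace ID
print(should_sample("always-on", trace_id))
print(should_sample("parent-based-default-always-off", trace_id, parent_sampled=True))
print(should_sample("trace-id-ratio", trace_id, ratio_percent=100))
```

Because the ratio decision in this sketch depends only on the trace ID, every span in a trace gets the same answer, which keeps sampled traces complete.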

The following example enables the plugin, samples every span, and exports the spans to http://localhost:4318/v1/traces:

bin/dsconfig set-plugin-prop \
    --plugin-name OpenTelemetry \
    --set enabled:true \
    --set tracer-sampler:always-on \
    --set tracer-exporter-otlp-endpoint:http://localhost:4318/v1/traces \
    --hostname localhost \
    --port 4444 \
    --bindDN uid=admin \
    --bindPassword password \
    --no-prompt
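Before applying a configuration like the one above, it can help to sanity-check the endpoint value against the documented rule that it must start with http:// or https:// and include the full HTTP path. The Python sketch below is one way to do that. The is_valid_otlp_http_endpoint function is a hypothetical helper, and treating a bare "/" as a missing path is an assumption, not documented server behavior.

```python
from urllib.parse import urlparse


def is_valid_otlp_http_endpoint(endpoint: str) -> bool:
    """Check the scheme, host, and path of a candidate OTLP/HTTP endpoint."""
    parsed = urlparse(endpoint)
    has_scheme = parsed.scheme in ("http", "https")
    has_host = bool(parsed.netloc)
    # Assumption: an empty path or a bare "/" does not count as a full path.
    has_path = parsed.path not in ("", "/")
    return has_scheme and has_host and has_path


for candidate in (
    "http://localhost:4318/v1/traces",
    "http://localhost:4318",
    "localhost:4318/v1/traces",
):
    print(candidate, is_valid_otlp_http_endpoint(candidate))
```

Only the first candidate passes. The second lacks a path and the third lacks a scheme.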

How to view traces

PingDirectory can push traces to an OpenTelemetry Protocol (OTLP) endpoint over HTTP. Any backend that supports OTLP/HTTP can be used to collect and visualize the traces.

For example, you can run the Jaeger all-in-one Docker image to capture exported spans. By default, Jaeger stores the spans in memory, but you can configure it to send the spans to a persistent datastore outside the Docker container.