Distributed tracing with OpenTelemetry

In a distributed system, requests pass through multiple services hosted on multiple servers. Without telemetry data, it can be difficult to identify the root cause of performance issues or errors.

Distributed tracing provides visibility into the full path a request takes through a distributed system. DS supports the OpenTelemetry framework for collecting distributed tracing data. You can send traces collected by DS to a backend service, such as Jaeger, for aggregation, storage, and visualization.

The interface stability for OpenTelemetry support is Evolving.

The plugin configuration, the content of spans, the span name, and the span attributes are all subject to change without prior notice.

Why use distributed tracing?

Diagnosing an escalated production issue can take hours or even days. Multiple subject-matter experts have to correlate fragmented logs to work out what happened, and the result is often a lot of noise and little clarity. The more services and instances involved, the harder troubleshooting becomes.

Distributed tracing addresses these challenges by supporting end-to-end request visibility and data correlation across multiple services and servers. As a result, you can troubleshoot performance issues and errors more quickly and effectively. You can also use distributed tracing to optimize system performance by identifying bottlenecks and inefficiencies in service interactions.

What is distributed tracing?

Distributed tracing shows you how an incoming request was processed across all servers and services in a distributed system, including:

Which servers and services the request went through.
How much time each service took to process its part of the request.
How the services are connected.
Where the request failed, if it failed.

A distributed trace provides a visual representation of a request’s path through the system. Spans show when an operation started, when it ended, and its duration. When one service calls another, these calls are linked within the trace, showing the flow and time spent in each service. The DS server uses the OpenTelemetry framework to create and manage these spans and traces.

Traces

A trace represents the path of a request through an application. A trace is made up of one or more spans. Learn more about traces in the OpenTelemetry documentation.

Spans

A span is a segment of a request’s path through the system. It represents a unit of work or an operation within a service. Each span includes the following elements:

traceId identifies the trace that the span is a part of.
spanId is a unique ID for the span.
parentSpanId is the spanId of the parent span, the operation that caused this span.

Servers add span attributes following the OpenTelemetry semantic conventions. The LDAP-specific attributes are modeled on the HTTP conventions.

Root span

The root span indicates the start and end of an entire operation. The parentSpanId of the root span is null because the root span has no parent span. Subsequent spans in the trace have their own unique spanId. Their traceId is the same as that of the root span. The parentSpanId of a direct child of the root span matches the spanId of the root span; the parentSpanId of a more deeply nested span matches the spanId of its own parent.
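The relationship between these identifiers can be sketched in a few lines of shell. This is an illustration only, not DS output: rand_hex is a hypothetical helper, and the IDs are random values sized as OpenTelemetry defines them (16 bytes for a trace ID, 8 bytes for a span ID).

```shell
# Hypothetical helper: print N random bytes as lowercase hex.
rand_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

trace_id=$(rand_hex 16)       # shared by every span in the trace
root_span_id=$(rand_hex 8)    # the root span has no parent
child_span_id=$(rand_hex 8)   # a direct child of the root span

# The child reuses the trace ID and points back at the root span.
printf 'root:  traceId=%s spanId=%s parentSpanId=%s\n' \
  "$trace_id" "$root_span_id" "null"
printf 'child: traceId=%s spanId=%s parentSpanId=%s\n' \
  "$trace_id" "$child_span_id" "$root_span_id"
```

Both lines print the same traceId, and the parentSpanId on the child line equals the spanId on the root line.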

OpenTelemetry

OpenTelemetry is an open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data. It provides a standardized way to capture distributed traces across different services and platforms. It doesn’t provide a backend for storing or analyzing telemetry data. Learn more in the OpenTelemetry documentation.

Which requests are traced?

DS can trace all incoming LDAP requests. To continue a trace that the client application started, a request must include the W3C trace context LDAP request control, which propagates the trace information to the server.

The W3C trace context allows for consistent correlation IDs and metadata across systems that support the W3C standard. If a request doesn’t include the W3C trace context control, a new trace starts for that request.
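For orientation, the following sketch splits a W3C traceparent value into its four fields. The value is the example from the W3C Trace Context specification, which defines the field for HTTP headers. How DS encodes the trace context inside the LDAP request control isn't described here, so treat the string form as an assumption when working with the control.

```shell
# Example value from the W3C Trace Context specification.
traceparent='00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'

# Split on "-" into version, trace ID, parent span ID, and flags.
IFS=- read -r version trace_id parent_id flags <<EOF
$traceparent
EOF

echo "version:   $version"    # 2 hex digits
echo "trace-id:  $trace_id"   # 32 hex digits (16 bytes)
echo "parent-id: $parent_id"  # 16 hex digits (8 bytes)
echo "flags:     $flags"      # 01 means the caller sampled this trace
```

The trace-id field corresponds to the traceId of every span in the trace, and the parent-id field becomes the parentSpanId of the first span the receiving server creates.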

Enable and configure tracing

Distributed tracing is disabled by default. To enable the feature, enable the OpenTelemetry plugin and make sure it targets the endpoint of your tracing backend. You can limit what the plugin pushes using additional settings.

The following example enables the plugin. It pushes all traces to the default endpoint, http://localhost:4318/v1/traces. It samples all the spans, which is the default behavior. Adapt the configuration as necessary for your production deployment:

$ dsconfig \
 set-plugin-prop \
 --plugin-name OpenTelemetry \
 --set enabled:true \
 --set tracer-sampler:always-on \
 --set tracer-exporter-otlp-endpoint:http://localhost:4318/v1/traces \
 --hostname localhost \
 --port 4444 \
 --bindDN uid=admin \
 --bindPassword password \
 --trustStorePath /path/to/opendj/config/keystore \
 --trustStoreType PKCS12 \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --no-prompt

Learn about all the optional plugin settings in the OpenTelemetry Plugin reference.

How to view traces

DS can push traces to an OpenTelemetry Protocol (OTLP) endpoint over HTTP. You can use any backend that supports OTLP/HTTP to collect and visualize the traces.

Try the Jaeger tracing All-in-one Docker image to capture exported spans. By default, Jaeger stores the spans in memory, but you can configure Jaeger to send the spans to various persistent datastores external to the Docker image.
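As a starting point for a local test, the following command sketches one way to run the Jaeger all-in-one image so that it listens on the default endpoint used in the plugin example. The container name and the latest tag are assumptions; check the Jaeger documentation for the image and options that match the release you use.

```shell
# Run Jaeger all-in-one in the background for local testing only.
# 16686 is the Jaeger web UI; 4318 is the OTLP/HTTP receiver.
# COLLECTOR_OTLP_ENABLED is needed on older Jaeger v1 releases
# where the OTLP receiver is not enabled by default.
docker run --rm -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
```

With the container running and the plugin enabled, send an LDAP request to DS, then browse to http://localhost:16686 to look for the resulting trace.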
