Directory Services 7.2.5

Troubleshooting

Define the problem

To save time when solving a problem, clearly define it first. A problem statement compares observed behavior with expected behavior:

  • What exactly is the problem?

    What is the behavior you expected?

    What is the behavior you observed?

  • How do you reproduce the problem?

  • When did the problem begin?

    Under similar circumstances, when does the problem not occur?

  • Is the problem permanent?

    Intermittent?

    Is it getting worse? Getting better? Staying the same?

Performance

Before troubleshooting performance, make sure:

When directory operations take too long, meaning request latency is high, fix the problem first in your test or staging environment. Perform these steps in order and stop when you find a fix:

  1. Check for unindexed searches and prevent them when possible.

    Unindexed searches are expensive operations, particularly for large directories. When unindexed searches consume the server’s resources, performance suffers for concurrent operations and for later operations if an unindexed search causes widespread changes to database and file system caches.

  2. Check performance settings for the server including JVM heap size and DB cache size.

    Try adding more RAM if memory seems low.

  3. Read the request queue monitoring statistics over LDAP or over HTTP.

    If many requests are in the queue, the troubleshooting steps differ for read and write operations. Review the request statistics available over LDAP or over HTTP.

    If you persistently have many:

    • Pending read requests, such as unindexed searches or big searches, try adding CPUs.

    • Pending write requests, try adding IOPS, such as faster or higher throughput disks.

Installation problems

Use the logs

Installation and upgrade procedures result in a log file tracing the operation. Look for this in the command output:

See file for a detailed log of this operation.

Antivirus interference

Prevent antivirus and intrusion detection systems from interfering with DS software.

Before using DS software with antivirus or intrusion detection software, consider the following potential problems:

Interference with normal file access

Antivirus and intrusion detection systems that perform virus scanning, sweep scanning, or deep file inspection are not compatible with DS file access, particularly write access.

Antivirus and intrusion detection software have incorrectly flagged DS files as potentially infected, because they misinterpret normal DS processing.

Prevent antivirus and intrusion detection systems from scanning DS files, except these folders:

  • /path/to/opendj/bat/

    Windows command-line tools

  • /path/to/opendj/bin/

    UNIX/Linux command-line tools

  • /path/to/opendj/extlib/

    Optional additional .jar files used by custom plugins

  • /path/to/opendj/lib/

    Scripts and libraries shipped with DS servers

Port blocking

Antivirus and intrusion detection software can block ports that DS uses to provide directory services.

Make sure that your software does not block the ports that DS software uses. For details, see Administrative access.

Negative performance impact

Antivirus software consumes system resources, reducing resources available to other services including DS servers.

Running antivirus software can therefore have a significant negative impact on DS server performance. Make sure that you test and account for the performance impact of running antivirus software before deploying DS software on the same systems.

JE initialization

When starting a directory server on a Linux system, make sure the server user can watch enough files. If the server user cannot watch enough files, you might see an error message in the server log such as this:

InitializationException: The database environment could not be opened:
com.sleepycat.je.EnvironmentFailureException: (JE version) /path/to/opendj/db/userData
or its sub-directories to WatchService.
UNEXPECTED_EXCEPTION: Unexpected internal Exception, may have side effects.
Environment is invalid and must be closed.

File notification

A directory server backend database monitors file events. On Linux systems, backend databases use the inotify API for this purpose. The kernel tunable fs.inotify.max_user_watches indicates the maximum number of files a user can watch with the inotify API.

Make sure this tunable is set to at least 512K:

$ sysctl fs.inotify.max_user_watches

fs.inotify.max_user_watches = 524288

If this tunable is set lower than that, update the /etc/sysctl.conf file to change the setting permanently, and use the sysctl -p command to reload the settings:

$ echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf
[sudo] password for admin:

$ sudo sysctl -p
fs.inotify.max_user_watches = 524288
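
The check above could be scripted, for example to run before starting the server. The following is a minimal sketch, not a DS command: the function name check_inotify_watches and the variable REQUIRED_WATCHES are illustrative, and the sketch assumes a Linux system where the current value of the tunable is readable from /proc/sys/fs/inotify/max_user_watches.

```shell
# Minimum recommended on this page: 512K watches.
REQUIRED_WATCHES=524288

# check_inotify_watches [VALUE]
# Hypothetical helper, not part of DS. Compares VALUE (default: the
# current kernel setting) with the required minimum.
# Prints "ok" and returns 0 when the value is sufficient,
# prints "too low" and returns 1 when it is not,
# prints "unknown" and returns 2 when the value is missing or not a number.
check_inotify_watches() {
    current="${1:-$(cat /proc/sys/fs/inotify/max_user_watches 2>/dev/null)}"
    case "$current" in
        ''|*[!0-9]*) echo "unknown"; return 2 ;;
    esac
    if [ "$current" -ge "$REQUIRED_WATCHES" ]; then
        echo "ok"
        return 0
    fi
    echo "too low"
    return 1
}
```

Called without an argument, the function reads the live setting. Passing a number makes it possible to try the logic without changing the system.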

Forgotten superuser password

By default, DS servers store the entry for the directory superuser in an LDIF backend. Edit the file to reset the password:

  1. Generate the encoded version of the new password:

    $ encode-password --storageScheme PBKDF2-HMAC-SHA256 --clearPassword password
    
    {PBKDF2-HMAC-SHA256}10<hash>
  2. Stop the server while you edit the LDIF file for the backend:

    $ stop-ds
  3. Replace the existing password with the encoded version.

    In the db/rootUser/rootUser.ldif file, carefully replace the userPassword value with the new, encoded password:

    dn: uid=admin
    ...
    uid: admin
    userPassword: <encoded-password>

    Trailing whitespace is significant in LDIF. Take care not to add any trailing whitespace at the end of the line.

  4. Restart the server:

    $ start-ds
  5. Verify that you can use the directory superuser account with the new password:

    $ status \
     --bindDN uid=admin \
     --bindPassword password \
     --hostname localhost \
     --port 4444 \
     --usePkcs12TrustStore /path/to/opendj/config/keystore \
     --trustStorePassword:file /path/to/opendj/config/keystore.pin \
     --script-friendly
    ...
    "isRunning" : true,
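
Because trailing whitespace is significant in LDIF, it may help to check the edited file before restarting the server. The following is a minimal sketch, not a DS command: the function name ldif_trailing_whitespace is illustrative, and the sketch only detects lines that end in whitespace. It does not validate the LDIF in any other way.

```shell
# ldif_trailing_whitespace FILE
# Hypothetical helper, not part of DS. Prints each line of FILE that ends
# in whitespace, prefixed with its line number.
# Returns 0 if no such line exists, 1 if at least one does,
# and 2 if FILE cannot be read.
ldif_trailing_whitespace() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    if grep -n '[[:space:]]$' "$1"; then
        return 1
    fi
    return 0
}
```

For example, run ldif_trailing_whitespace /path/to/opendj/db/rootUser/rootUser.ldif after editing the file in step 3, and fix any lines it prints before starting the server.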

Debug logging

DS debug logging can generate a high volume of debug messages. Use debug logging very sparingly on production systems.

  1. Create one or more debug targets.

    No debug targets are enabled by default:

    $ dsconfig \
     list-debug-targets \
     --hostname localhost \
     --port 4444 \
     --bindDN uid=admin \
     --bindPassword password \
     --publisher-name "File-Based Debug Logger" \
     --usePkcs12TrustStore /path/to/opendj/config/keystore \
     --trustStorePassword:file /path/to/opendj/config/keystore.pin \
     --no-prompt
    
    Debug Target : enabled : debug-exceptions-only
    -------------:---------:----------------------

    A debug target specifies a fully qualified DS Java package, class, or method:

    $ dsconfig \
     create-debug-target \
     --hostname localhost \
     --port 4444 \
     --bindDN uid=admin \
     --bindPassword password \
     --publisher-name "File-Based Debug Logger" \
     --type generic \
     --target-name org.opends.server.api \
     --set enabled:true \
     --usePkcs12TrustStore /path/to/opendj/config/keystore \
     --trustStorePassword:file /path/to/opendj/config/keystore.pin \
     --no-prompt
  2. Enable the debug log, opendj/logs/debug:

    $ dsconfig \
     set-log-publisher-prop \
     --hostname localhost \
     --port 4444 \
     --bindDN uid=admin \
     --bindPassword password \
     --publisher-name "File-Based Debug Logger" \
     --set enabled:true \
     --usePkcs12TrustStore /path/to/opendj/config/keystore \
     --trustStorePassword:file /path/to/opendj/config/keystore.pin \
     --no-prompt

    The server immediately begins to write debug messages to the log file.

  3. Read messages in the debug log file:

    $ tail -f /path/to/opendj/logs/debug
  4. Disable the debug log as soon as it is no longer required.

Lockdown mode

Misconfiguration can put the DS server in a state where you must prevent users and applications from accessing the directory until you have fixed the problem.

DS servers support lockdown mode. Lockdown mode permits connections only on the loopback address, and permits only operations requested by superusers, such as uid=admin.

To put the DS server into lockdown mode, the server must be running. You cause the server to enter lockdown mode by starting a task. Notice that the modify operation is performed over the loopback address (accessing the DS server on the local host):

$ ldapmodify \
 --hostname localhost \
 --port 1636 \
 --useSsl \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --bindDN uid=admin \
 --bindPassword password << EOF
dn: ds-task-id=Enter Lockdown Mode,cn=Scheduled Tasks,cn=tasks
objectClass: top
objectClass: ds-task
ds-task-id: Enter Lockdown Mode
ds-task-class-name: org.opends.server.tasks.EnterLockdownModeTask
EOF

The DS server logs a notice message in logs/errors when lockdown mode takes effect:

...msg=Lockdown task Enter Lockdown Mode finished execution

Client applications that request operations get a message concerning lockdown mode:

$ ldapsearch \
 --hostname localhost \
 --port 1636 \
 --useSsl \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --baseDN "" \
 --searchScope base \
 "(objectclass=*)" \
 +

# The LDAP search request failed: 53 (Unwilling to Perform)
# Additional Information:  Rejecting the requested operation because the server is in lockdown mode and will only accept requests from root users over loopback connections

Leave lockdown mode by starting a task:

$ ldapmodify \
 --hostname localhost \
 --port 1636 \
 --useSsl \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --bindDN uid=admin \
 --bindPassword password << EOF
dn: ds-task-id=Leave Lockdown Mode,cn=Scheduled Tasks,cn=tasks
objectClass: top
objectClass: ds-task
ds-task-id: Leave Lockdown Mode
ds-task-class-name: org.opends.server.tasks.LeaveLockdownModeTask
EOF

The DS server logs a notice message when leaving lockdown mode:

...msg=Leave Lockdown task Leave Lockdown Mode finished execution
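
To see which of the two notices was logged most recently, you could scan the error log. The following is a minimal sketch, not a DS command: the function name lockdown_state is illustrative, and the sketch assumes the tasks use the task IDs shown in the examples above ("Enter Lockdown Mode" and "Leave Lockdown Mode"). It reports only what the log messages say and does not account for other events, such as a server restart.

```shell
# lockdown_state LOGFILE
# Hypothetical helper, not part of DS. Reads LOGFILE from top to bottom and
# remembers the last lockdown notice seen.
# Prints "lockdown" if the last notice was for entering lockdown mode,
# "normal" if it was for leaving lockdown mode,
# and "unknown" if the log contains neither notice.
lockdown_state() {
    awk '
        /Enter Lockdown Mode finished execution/ { state = "lockdown" }
        /Leave Lockdown Mode finished execution/ { state = "normal" }
        END { print (state == "" ? "unknown" : state) }
    ' "$1"
}
```

For example, lockdown_state /path/to/opendj/logs/errors prints the state implied by the most recent notice in the error log.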

LDIF import

  • By default, DS directory servers check that entries you import match the LDAP schema.

    You can temporarily bypass this check with the import-ldif --skipSchemaValidation option.

  • By default, DS servers ensure that entries have only one structural object class.

    You can relax this behavior with the advanced global configuration property, single-structural-objectclass-behavior.

    This can be useful when importing data exported from Sun Directory Server.

    For example, warn when entries have more than one structural object class, rather than rejecting them:

    $ dsconfig \
     set-global-configuration-prop \
     --hostname localhost \
     --port 4444 \
     --bindDN uid=admin \
     --bindPassword password \
     --set single-structural-objectclass-behavior:warn \
     --usePkcs12TrustStore /path/to/opendj/config/keystore \
     --trustStorePassword:file /path/to/opendj/config/keystore.pin \
     --no-prompt
  • By default, DS servers check syntax for several attribute types. Relax this behavior using the advanced global configuration property, invalid-attribute-syntax-behavior.

  • Use the import-ldif -R rejectFile --countRejects options to log rejected entries and to return the number of rejected entries as the command’s exit code.

Once you resolve the issues, reinstate the default behavior to avoid importing bad data.

Security problems

Incompatible Java versions

Due to a change in Java APIs, the same DS deployment ID generates different CA key pairs with Java 11 than with Java 17 and later. When running the dskeymgr and setup commands, use the same Java environment everywhere in the deployment.

Using different Java versions is a problem if you use deployment ID-based CA certificates. Replication breaks, for example, when you use the setup command for a new server with a more recent version of Java than was used to set up existing servers. The error log includes a message such as the following:

...category=SYNC severity=ERROR msgID=119 msg=Directory server DS(server_id)
encountered an unexpected error while connecting to replication server host:port for domain "base_dn":
ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException:
signature check failed

To work around the issue, follow these steps:

  1. Update all DS servers to use the same Java version.

    Make sure you have a required Java environment installed on the system.

    If your default Java environment is not appropriate, use one of the following solutions:

    • Edit the default.java-home setting in the opendj/config/java.properties file.

    • Set OPENDJ_JAVA_HOME to the path to the correct Java environment.

    • Set OPENDJ_JAVA_BIN to the absolute path of the java command.

  2. Export CA certificates generated with the different Java versions.

    1. Export the CA certificate from an old server:

      $ keytool \
       -exportcert \
       -alias ca-cert \
       -keystore /path/to/old-server/config/keystore \
       -storepass:file /path/to/old-server/config/keystore.pin \
       -file java11-ca-cert.pem
    2. Export the CA certificate from a new server:

      $ keytool \
       -exportcert \
       -alias ca-cert \
       -keystore /path/to/new-server/config/keystore \
       -storepass:file /path/to/new-server/config/keystore.pin \
       -file java17-ca-cert.pem
  3. On all existing DS servers, import the new CA certificate:

    $ keytool \
     -importcert \
     -trustcacerts \
     -alias alt-ca-cert \
     -keystore /path/to/old-server/config/keystore \
     -storepass:file /path/to/old-server/config/keystore.pin \
     -file java17-ca-cert.pem \
     -noprompt
  4. On all new DS servers, import the old CA certificate:

    $ keytool \
     -importcert \
     -trustcacerts \
     -alias alt-ca-cert \
     -keystore /path/to/new-server/config/keystore \
     -storepass:file /path/to/new-server/config/keystore.pin \
     -file java11-ca-cert.pem \
     -noprompt

The servers reload their keystores dynamically and replication works as expected.

Certificate-based authentication

Replication uses TLS to protect directory data on the network. Misconfiguration can cause replicas to fail to connect due to handshake errors. This leads to repeated error log messages in the replication log file such as the following:

...msg=Replication server accepted a connection from address
 to local address address but the SSL handshake failed.
 This is probably benign, but may indicate a transient network outage
 or a misconfigured client application connecting to this replication server.
 The error was: Received fatal alert: certificate_unknown

You can collect debug trace messages to help determine the problem. To see the TLS debug messages, start the server with javax.net.debug set:

$ OPENDJ_JAVA_ARGS="-Djavax.net.debug=all" start-ds

The debug trace settings produce a very large number of messages. To find the cause, review the output from server startup, looking in particular for handshake errors.

If the chain of trust for your PKI is broken somehow, consider renewing or replacing keys, as described in Key management. Make sure that trusted CA certificates are configured as expected.

FIPS and key wrapping

DS servers use shared asymmetric keys to protect shared symmetric secret keys for data encryption.

By default, DS uses direct encryption to protect the secret keys.

When using a FIPS-compliant security provider that doesn’t allow direct encryption, such as Bouncy Castle, change the Crypto Manager configuration to set the advanced property, key-wrapping-mode: WRAP. With this setting, DS uses wrap mode to protect the secret keys in a compliant way.

Compromised keys

How you handle the problem depends on which key was compromised:

  • For keys generated by the server, or with a deployment ID and password, see Retire secret keys.

  • For a private key whose certificate was signed by a CA, contact the CA for help. The CA might choose to publish a certificate revocation list (CRL) that identifies the certificate of the compromised key.

    Replace the key pair that has the compromised private key.

  • For a private key whose certificate was self-signed, replace the key pair that has the compromised private key.

    Make sure the clients remove the compromised certificate from their truststores. They must replace the certificate of the compromised key with the new certificate.

Client problems

Use the logs

By default, DS servers record messages for LDAP client operations in the logs/ldap-access.audit.json log file.

Example log messages:
[
{
  "eventName": "DJ-LDAP",
  "client": {
    "ip": "<clientIp>",
    "port": 12345
  },
  "server": {
    "ip": "<clientIp>",
    "port": 1389
  },
  "request": {
    "protocol": "LDAP",
    "operation": "CONNECT",
    "connId": 0
  },
  "transactionId": "0",
  "response": {
    "status": "SUCCESSFUL",
    "statusCode": "0",
    "elapsedTime": 0,
    "elapsedTimeUnits": "MILLISECONDS"
  },
  "timestamp": "<timestamp>",
  "_id": "<uuid>"
},
{
  "eventName": "DJ-LDAP",
  "client": {
    "ip": "<clientIp>",
    "port": 12345
  },
  "server": {
    "ip": "<clientIp>",
    "port": 1389
  },
  "request": {
    "protocol": "LDAP",
    "operation": "SEARCH",
    "connId": 0,
    "msgId": 1,
    "dn": "dc=example,dc=com",
    "scope": "sub",
    "filter": "(uid=bjensen)",
    "attrs": ["ALL"]
  },
  "transactionId": "0",
  "response": {
    "status": "SUCCESSFUL",
    "statusCode": "0",
    "elapsedTime": 9,
    "elapsedTimeUnits": "MILLISECONDS",
    "nentries": 1
  },
  "timestamp": "<timestamp>",
  "_id": "<uuid>"
},
{
  "eventName": "DJ-LDAP",
  "client": {
    "ip": "<clientIp>",
    "port": 12345
  },
  "server": {
    "ip": "<clientIp>",
    "port": 1389
  },
  "request": {
    "protocol": "LDAP",
    "operation": "UNBIND",
    "connId": 0,
    "msgId": 2
  },
  "transactionId": "0",
  "timestamp": "<timestamp>",
  "_id": "<uuid>"
},
{
  "eventName": "DJ-LDAP",
  "client": {
    "ip": "<clientIp>",
    "port": 12345
  },
  "server": {
    "ip": "<clientIp>",
    "port": 1389
  },
  "request": {
    "protocol": "LDAP",
    "operation": "DISCONNECT",
    "connId": 0
  },
  "transactionId": "0",
  "response": {
    "status": "SUCCESSFUL",
    "statusCode": "0",
    "elapsedTime": 0,
    "elapsedTimeUnits": "MILLISECONDS",
    "reason": "Client Unbind"
  },
  "timestamp": "<timestamp>",
  "_id": "<uuid>"
}
]

Each message specifies the operation performed, the client that requested the operation, and when it completed.

By default, the server does not log internal LDAP operations corresponding to HTTP requests. To match HTTP client operations to internal LDAP operations:

  1. Prevent the server from suppressing log messages for internal operations.

    Set suppress-internal-operations:false on the LDAP access log publisher.

  2. Match the request/connId field in the HTTP access log with the same field in the LDAP access log.

Client access

To help diagnose client errors due to access permissions, see Effective rights.

Simple paged results

For some versions of Linux, you see a message in the DS access logs such as the following:

The request control with Object Identifier (OID) "1.2.840.113556.1.4.319"
cannot be used due to insufficient access rights

This message means clients are trying to use the simple paged results control without authenticating. By default, a global ACI allows only authenticated users to use the control.

To grant anonymous (unauthenticated) user access to the control, add a global ACI for anonymous use of the simple paged results control:

$ dsconfig \
 set-access-control-handler-prop \
 --hostname localhost \
 --port 4444 \
 --bindDN uid=admin \
 --bindPassword "password" \
 --add global-aci:"(targetcontrol=\"SimplePagedResults\") \
 (version 3.0; acl \"Anonymous simple paged results access\"; allow(read) \
 userdn=\"ldap:///anyone\";)" \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --no-prompt

Replication problems

Replicas do not connect

If you set up servers with different deployment IDs, they cannot share encrypted data. By default, they also cannot trust each other’s secure connections. You may see messages like the following in the logs/replication log file:

msg=Replication server accepted a connection from /address:port
to local address /address:port but the SSL handshake failed.

Unless the servers use your own CA, make sure their keys are generated with the same deployment ID/password. Either set up the servers again with the same deployment ID, or see Replace deployment IDs.

Temporary delays

Replication can generally recover from conflicts and transient issues. Temporary delays are normal and expected while replicas converge, especially when the write load is heavy. This is a feature of eventual convergence, not a bug.

For more information, see Replication delay (LDAP).

Use the logs

Replication uses its own error log file, logs/replication. Error messages in the log file have category=SYNC.

Messages have the following form. This example is folded for readability:

...msg=Replication server accepted a connection from 10.10.0.10/10.10.0.10:52859
 to local address 0.0.0.0/0.0.0.0:8989 but the SSL handshake failed.
 This is probably benign, but may indicate a transient network outage
 or a misconfigured client application connecting to this replication server.
 The error was: Remote host closed connection during handshake
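
When handshake failures repeat, a summary by cause can show which error dominates. The following is a minimal sketch, not a DS command: the function name handshake_failures is illustrative, and the sketch assumes each message occupies a single line in the actual log file (the example above is folded only for readability) and ends with the "The error was:" text shown above.

```shell
# handshake_failures LOGFILE
# Hypothetical helper, not part of DS. Counts failed SSL handshakes in
# LOGFILE, grouped by the text that follows "The error was: ".
# Prints one line per distinct error as COUNT<TAB>ERROR, highest count first.
handshake_failures() {
    awk '
        /SSL handshake failed/ && /The error was: / {
            sub(/.*The error was: /, "")
            count[$0]++
        }
        END { for (err in count) print count[err] "\t" err }
    ' "$1" | sort -rn
}
```

For example, handshake_failures /path/to/opendj/logs/replication summarizes the handshake errors recorded in the replication log.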

Stale data

DS servers maintain historical information to bring replicas up to date, and to resolve conflicts. To prevent historical information from growing without limit, servers purge historical information after a configurable delay (replication-purge-delay, default: 3 days). A replica can become irrevocably out of sync if you restore it from a backup that is older than the purge delay, or if you stop it for longer than the purge delay. If this happens, reinitialize the replica from a recent backup or from a server that is up to date.
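
The comparison with the purge delay is simple arithmetic. The following is a minimal sketch, not a DS command: the function name backup_is_stale and the variable DEFAULT_PURGE_DELAY_DAYS are illustrative. The sketch assumes you supply the backup time and the current time as seconds since the epoch, and it uses the default of 3 days unless you pass the purge delay configured for your deployment.

```shell
# Default replication-purge-delay stated on this page, in days.
DEFAULT_PURGE_DELAY_DAYS=3

# backup_is_stale BACKUP_EPOCH NOW_EPOCH [PURGE_DELAY_DAYS]
# Hypothetical helper, not part of DS. Both times are seconds since the epoch.
# Prints "stale" and returns 1 if the backup is older than the purge delay,
# otherwise prints "ok" and returns 0.
backup_is_stale() {
    backup_epoch=$1
    now_epoch=$2
    purge_days=${3:-$DEFAULT_PURGE_DELAY_DAYS}
    age=$((now_epoch - backup_epoch))
    limit=$((purge_days * 86400))
    if [ "$age" -gt "$limit" ]; then
        echo "stale"
        return 1
    fi
    echo "ok"
    return 0
}
```

The same calculation applies to a replica that has been stopped: use the time it was stopped in place of the backup time. In practice, leave a safety margin rather than relying on a backup that is only just inside the purge delay.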

Incorrect configuration

When replication is configured incorrectly, fixing the problem can involve adjustments on multiple servers. For example, adding or removing a bootstrap replication server means updating the bootstrap-replication-server settings in the synchronization provider configuration of other servers. (The settings can be hard-coded in the configuration, or read from the environment at startup time, as described in Property value substitution. In either case, changing them involves at least restarting the other servers.)

For details, see sections in Replication.

Support

Sometimes you cannot resolve a problem yourself, and must ask for help or technical support. In such cases, identify the problem and how you reproduce it, and the version where you see the problem:

$ status --offline --version

ForgeRock Directory Services 7.2.5-20240524201627-49403efab3a3556b93d8ee62f263ca29a7e752ef
Build <datestamp>

Be prepared to provide the following additional information:

  • The Java home set in config/java.properties.

  • Access and error logs showing what the server was doing when the problem started occurring.

  • A copy of the server configuration file, config/config.ldif, in use when the problem started occurring.

  • Other relevant logs or output, such as those from client applications experiencing the problem.

  • A description of the environment where the server is running, including system characteristics, hostnames, IP addresses, Java versions, storage characteristics, and network characteristics. This helps when interpreting the logs and other information.

  • The .zip file generated using the supportextract command.

    For an example showing how to use the command, see supportextract.