PingDirectory

Troubleshooting installation and maintenance issues

The following topics include common installation and maintenance issues and possible solutions.

The setup program will not run

If the setup tool does not run properly, the most common reasons include the following:

A Java environment is not available

The server requires that Java be installed on the system before running the setup tool.

If there are multiple Java installations on the system, run the setup tool with the JAVA_HOME environment variable explicitly set to the path of the Java installation to use. For example:

$ env JAVA_HOME=/ds/java ./setup

The value of the JAVA_HOME environment variable can also be overridden by another environment variable, such as UNBOUNDID_JAVA_HOME or UNBOUNDID_JAVA_BIN. If that occurs, set those variables explicitly when running setup:

$ env UNBOUNDID_JAVA_HOME="/ds/java" UNBOUNDID_JAVA_BIN="" ./setup
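Before running setup, it can help to confirm that the path being passed is a usable Java installation by checking for an executable bin/java and asking it to report its version. The following is a minimal POSIX shell sketch; check_java_home is a hypothetical helper name, not a tool shipped with the server.

```shell
# Hypothetical helper: succeed only if <java-home> contains an executable
# bin/java that starts successfully.
check_java_home() {
  java_home=$1
  if [ -z "$java_home" ]; then
    echo "usage: check_java_home <java-home>" >&2
    return 2
  fi
  if [ ! -x "$java_home/bin/java" ]; then
    echo "no executable bin/java under $java_home" >&2
    return 1
  fi
  # Ask the JVM to identify itself; a broken installation fails here.
  if ! "$java_home/bin/java" -version >/dev/null 2>&1; then
    echo "$java_home/bin/java did not start" >&2
    return 1
  fi
  return 0
}
```

For example, check_java_home /ds/java && env UNBOUNDID_JAVA_HOME="/ds/java" UNBOUNDID_JAVA_BIN="" ./setup runs setup only if the check passes.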

Unexpected arguments provided to the JVM

If the setup tool attempts to launch Java with an invalid set of arguments, the Java virtual machine (JVM) might fail to start. By default, setup does not pass any special options to the JVM, but this might not be the case if either the JAVA_ARGS or UNBOUNDID_JAVA_ARGS environment variable is set. If the setup tool displays an error message indicating that the Java environment could not be started with the provided set of arguments, clear both variables and run setup again:

$ unset JAVA_ARGS UNBOUNDID_JAVA_ARGS
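Before clearing the variables, it can be useful to record what they contained, because their values usually point to the invalid option. A minimal POSIX shell sketch; report_set_vars is a hypothetical name, and the names passed to it must be valid shell variable names because they are expanded with eval.

```shell
# Hypothetical helper: print NAME=value for each named variable that is set
# (even if empty). Returns 0 if at least one was set, 1 otherwise.
report_set_vars() {
  found=1
  for name in "$@"; do
    # POSIX sh has no indirect expansion, so build it with eval.
    eval "is_set=\${$name+x}"
    if [ -n "$is_set" ]; then
      eval "value=\${$name}"
      printf '%s=%s\n' "$name" "$value"
      found=0
    fi
  done
  return $found
}
```

For example, report_set_vars JAVA_ARGS UNBOUNDID_JAVA_ARGS shows any arguments that setup would otherwise pass to the JVM.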

The server is already configured or started

The setup tool is only intended to provide the initial configuration for the server. It will not run if it detects that it has already been run.

A previous installation should be removed before installing a new one. However, if there is nothing of value in the existing installation, use the following steps to reset it so that the setup program can run again:

  1. Remove the config/config.ldif file and replace it with the config/update/config.ldif.{revision} file containing the initial configuration.

  2. If there are any files or subdirectories in the db directory, then remove them.

  3. If a config/java.properties file exists, then remove it.

  4. If a lib/set-java-home script (or lib\set-java-home.bat file on Microsoft Windows) exists, then remove it.
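The four steps above can be scripted on Unix-like systems as in the following sketch. It is not a supported tool: reset_for_setup is a hypothetical name, the revision is passed in because it varies by release, and the Unix script name lib/set-java-home is assumed from the name used with bin/dsjavaproperties. The function permanently deletes data, so run it only against an installation that holds nothing of value.

```shell
# Hypothetical sketch of the reset steps. <root> is the installation root and
# <revision> is the suffix of the config/update/config.ldif.<revision> file.
# WARNING: deletes the contents of <root>/db.
reset_for_setup() {
  root=$1
  revision=$2
  initial="$root/config/update/config.ldif.$revision"
  if [ ! -f "$initial" ]; then
    echo "initial configuration not found: $initial" >&2
    return 1
  fi
  # Step 1: replace the current configuration with a copy of the initial one.
  rm -f "$root/config/config.ldif"
  cp "$initial" "$root/config/config.ldif" || return 1
  # Step 2: remove everything inside db; the db directory itself is kept.
  if [ -d "$root/db" ]; then
    find "$root/db" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
  fi
  # Steps 3 and 4: remove the generated Java settings (assumed script name).
  rm -f "$root/config/java.properties" "$root/lib/set-java-home"
}
```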

The server will not start

If the server does not start, there are several potential causes.

The server or other administrative tool is already running

Only a single instance of the server can run at any time from the same installation root. Other administrative operations can prevent the server from being started. In such cases, the attempt to start the server should fail with a message like:

The <server> could not acquire an exclusive lock on file
/ds/PingData<server>/locks/server.lock:
The exclusive lock requested for file
/ds/PingData<server>/locks/server.lock
was not granted, which indicates that another
process already holds a shared or exclusive lock on
that file. This generally means that another instance
of this server is already running.

If the server is not running (and is not in the process of starting up or shutting down), and no other tools are running that could prevent it from starting, a previously held lock might not have been released properly. Try removing all of the files in the locks directory before attempting to start the server.
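The cleanup can be done with a small helper like the following sketch. clear_stale_locks is a hypothetical name; it does not check whether the server or any other tool is running, so confirm that first (for example, with bin/server-state).

```shell
# Hypothetical helper: print, then delete, every regular file directly inside
# <root>/locks. Assumes the server and its tools are confirmed to be stopped.
clear_stale_locks() {
  root=$1
  lock_dir="$root/locks"
  if [ ! -d "$lock_dir" ]; then
    echo "no locks directory under $root" >&2
    return 1
  fi
  # -print lists each file before the batched rm removes it.
  find "$lock_dir" -mindepth 1 -maxdepth 1 -type f -print -exec rm -f {} +
}
```

For example, run clear_stale_locks /ds/PingData<server> and then start the server.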

There is not enough memory available

When the server is started, the Java virtual machine (JVM) attempts to allocate all memory that it has been configured to use. If there is not enough free memory available on the system, the server generates an error message indicating that it could not be started.

There are several potential causes for this:

  • If the amount of memory in the underlying system has changed, the server might need to be reconfigured to use a smaller amount of memory.

  • Another process on the system is consuming memory and there is not enough memory to start the server. Either terminate the other process, or reconfigure the server to use a smaller amount of memory.

  • The server just shut down and an attempt was made to immediately restart it. If the server is configured to use a significant amount of memory, it can take a few seconds for all of the memory to be released back to the operating system. Run the vmstat command and wait until the amount of free memory stops growing before restarting the server.

  • If the system is configured with one or more memory-backed file systems (such as /tmp), determine whether any large files are consuming a significant amount of memory. If so, remove them or relocate them to a disk-based file system.
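To look for large files on a memory-backed file system such as /tmp, a sketch like the following can help. find_large_files is a hypothetical name, and the threshold is in kilobytes.

```shell
# Hypothetical helper: list files larger than <min-kb> kilobytes on the file
# system that contains <dir>, largest first, as "KB<TAB>path" lines.
# -xdev keeps find from descending into other mounted file systems.
find_large_files() {
  dir=$1
  min_kb=$2
  find "$dir" -xdev -type f -size +"${min_kb}"k -exec du -k {} + 2>/dev/null |
    sort -nr
}
```

For example, find_large_files /tmp 102400 lists files larger than 100 MB under /tmp.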

An invalid Java environment or JVM option was used

If an attempt to start the server fails with 'no valid Java environment could be found,' or 'the Java environment could not be started,' and memory is not the cause, other causes can include the following:

  • The Java installation that was previously used to run the server no longer exists. Update the config/java.properties file to reference the new Java installation and run the bin/dsjavaproperties command to apply that change.

  • The Java installation has been updated, and one or more of the options that worked with the previous Java version no longer work. Reconfigure the server to use the previous Java version, and investigate which options should be used with the new installation.

  • If an UNBOUNDID_JAVA_HOME or UNBOUNDID_JAVA_BIN environment variable is set, its value can override the path to the Java installation used to run the server (defined in the config/java.properties file). Similarly, if an UNBOUNDID_JAVA_ARGS environment variable is set, then its value might override the arguments provided to the JVM. If this is the case, explicitly unset the UNBOUNDID_JAVA_HOME, UNBOUNDID_JAVA_BIN, and UNBOUNDID_JAVA_ARGS environment variables before starting the server.
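Instead of unsetting the variables in the current shell, a single command can be run with them removed from its environment only. A minimal sketch; run_without_java_overrides is a hypothetical name, and it relies on the -u option of env, which GNU, BSD, and BusyBox env provide but POSIX does not require.

```shell
# Hypothetical wrapper: run a command without the UNBOUNDID_JAVA_HOME,
# UNBOUNDID_JAVA_BIN, and UNBOUNDID_JAVA_ARGS overrides. The calling shell
# keeps its own values.
run_without_java_overrides() {
  env -u UNBOUNDID_JAVA_HOME -u UNBOUNDID_JAVA_BIN -u UNBOUNDID_JAVA_ARGS "$@"
}
```

For example, run_without_java_overrides bin/start-server starts the server with the Java settings from config/java.properties.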

Any time the config/java.properties file is updated, run the bin/dsjavaproperties tool to apply the new configuration. If a problem with the previous Java configuration prevents bin/dsjavaproperties from running properly, remove the lib/set-java-home script (or lib\set-java-home.bat file on Microsoft Windows) and invoke bin/dsjavaproperties with an explicitly defined path to the Java environment, such as:

$ env UNBOUNDID_JAVA_HOME=/ds/java bin/dsjavaproperties
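The recovery path above can be wrapped so that the new Java path is checked before anything is removed. A minimal sketch; reapply_java_properties is a hypothetical name, and the Unix script name lib/set-java-home is taken from the paragraph above.

```shell
# Hypothetical sketch: verify <java-home>, remove the generated
# lib/set-java-home script, then rerun bin/dsjavaproperties with an explicit
# Java installation. <root> is the server installation directory.
reapply_java_properties() {
  root=$1
  java_home=$2
  if [ ! -x "$java_home/bin/java" ]; then
    echo "no executable bin/java under $java_home" >&2
    return 1
  fi
  rm -f "$root/lib/set-java-home"
  # Run from the installation root, matching the bin/dsjavaproperties example.
  (cd "$root" && env UNBOUNDID_JAVA_HOME="$java_home" bin/dsjavaproperties)
}
```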

An invalid command-line option was used

There are a small number of arguments that can be provided when running the bin/start-server command. If arguments were provided and are not valid, the server displays an error message. Correct or remove the invalid argument and try to start the server again.

The server has an invalid configuration

If a change is made to the server configuration using dsconfig or the Administrative Console, the server validates the change before applying it. However, a change that appears valid might still not work as expected when the server is restarted.

In most cases, the server displays a message that explains the problem and writes it to the error log. If the message does not provide enough information to identify the problem, review the logs/config-audit.log file, which records recent configuration changes, and the config/archivedconfigs directory, which contains configuration changes that were not made through a supported configuration interface. To start the server with the last valid configuration, use the --useLastKnownGoodConfig option:

$ bin/start-server --useLastKnownGoodConfig

To determine the set of configuration changes made to the server since installation, use the config-diff tool with the arguments --sourceLocal --targetLocal --sourceBaseline. To make configuration changes while the server is not running, use the dsconfig --offline command.
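When reviewing recent changes, a helper like the following sketch shows the end of logs/config-audit.log and the newest entries in config/archivedconfigs in one pass. show_recent_config_changes is a hypothetical name; it only reads files and makes no changes.

```shell
# Hypothetical helper: show the last <n> lines (default 20) of
# logs/config-audit.log and the five newest entries in config/archivedconfigs
# for the installation at <root>.
show_recent_config_changes() {
  root=$1
  n=${2:-20}
  echo "== last $n lines of logs/config-audit.log =="
  tail -n "$n" "$root/logs/config-audit.log" 2>/dev/null || echo "(not found)"
  echo "== newest entries in config/archivedconfigs =="
  ls -t "$root/config/archivedconfigs" 2>/dev/null | head -n 5
}
```

After reviewing the output, the full difference from the baseline can still be obtained with bin/config-diff --sourceLocal --targetLocal --sourceBaseline.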

Proper permissions are missing

The server should only be started by the user or role that was used to install it. If the server was installed as a non-root user and later started by the root account, any new files it creates are owned by root, and the server can no longer be started as the non-root user.

If the user account used to run the server needs to change, change ownership of all files in the installation to that new user. For example, if the server should be run as the "ds" user in the "other" group, run the following command as root:

$ chown -R ds:other /ds/PingData<server>
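To see which files need their ownership fixed, or to confirm afterward that chown covered everything, the installation can be searched for files not owned by the expected user. A minimal sketch; find_foreign_owned is a hypothetical name.

```shell
# Hypothetical check: list every file or directory under <root> that is not
# owned by <user>, such as files created while the server ran as root.
# Empty output means everything is owned by <user>.
find_foreign_owned() {
  root=$1
  user=$2
  find "$root" ! -user "$user" -print
}
```

For example, find_foreign_owned /ds/PingData<server> ds should print nothing after the chown command completes.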

The server has shut down

Check the current server state by using the bin/server-state command. If the server was previously running but is no longer active, potential reasons include the following:

  • Shut down by an administrator – Unless the server was forcefully terminated, messages stating the reason are written to the error and server logs.

  • Shut down when the underlying system crashed or was rebooted – Run the uptime command on the underlying system to determine whether, and when, it was recently restarted.

  • Process terminated by the underlying operating system – If this happens, a message is written to the system error log.

  • Shut down in response to a serious problem – This can occur if the server has detected that the amount of usable disk space is critically low, or if errors have been encountered during processing that left the server without worker threads. Messages are written to the error and server logs (if disk space is available).

  • Java virtual machine (JVM) has crashed – If this happens, then the JVM should provide a fatal error log (a hs_err_pid<processID>.log file), and potentially a core file.
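To check for JVM fatal error logs, a search like the following sketch can be used. find_jvm_crash_logs is a hypothetical name. By default, the JVM writes the hs_err_pid file to the working directory of the process, so searching the installation root is an assumption about where the server was started.

```shell
# Hypothetical helper: list JVM fatal error logs (hs_err_pid<processID>.log)
# anywhere under <dir>, newest first.
find_jvm_crash_logs() {
  dir=$1
  find "$dir" -type f -name 'hs_err_pid*.log' -exec ls -t {} + 2>/dev/null
}
```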

The server will not accept client connections

Check the current server state by running the bin/server-state command. If the server does not appear to be accepting connections from clients, reasons can include the following:

  • The server is not running.

  • The underlying system on which the server is installed is not running.

  • The server is running, but is not reachable as a result of a network or firewall configuration problem. If that is the case, connection attempts should time out rather than be rejected.

  • If the server is configured to allow secure communication through SSL or StartTLS, a problem with the key manager or trust manager configuration can cause connection rejections. Messages are written to the server access log for each failed connection attempt.

  • The server might have reached its maximum number of allowed connections. Messages should be written to the server access log for each rejected connection attempt.

  • If the server is configured to restrict access based on the address of the client, messages should be written to the server access log for each rejected connection attempt.

  • If a connection handler encounters a significant error, it can stop listening for new requests. A message should be written to the server error log with information about the problem. Restarting the server might resolve the issue. Another option is to create an LDIF file that disables and then re-enables the connection handler, create the config/auto-process-ldif directory if it does not already exist, and then copy the LDIF file into it.
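The LDIF option can be sketched as follows. This is an illustration under stated assumptions, not a documented procedure: queue_handler_restart is a hypothetical name, the handler's configuration entry DN is passed in by the caller, the enabled state is assumed to be held in the ds-cfg-enabled attribute, and the file name restart-handler.ldif is arbitrary. Check the DN and attribute against config/config.ldif before relying on it.

```shell
# Hypothetical helper: build an LDIF file that disables and then re-enables
# the connection handler at <dn>, then copy it into
# <root>/config/auto-process-ldif.
# ASSUMPTION: the handler's enabled state is stored in ds-cfg-enabled.
queue_handler_restart() {
  root=$1
  dn=$2
  dest="$root/config/auto-process-ldif"
  ldif=$(mktemp) || return 1
  # LDIF lines must start in column 0; a leading space marks a continuation.
  cat > "$ldif" <<EOF
dn: $dn
changetype: modify
replace: ds-cfg-enabled
ds-cfg-enabled: false
-

dn: $dn
changetype: modify
replace: ds-cfg-enabled
ds-cfg-enabled: true
-
EOF
  mkdir -p "$dest" || return 1
  cp "$ldif" "$dest/restart-handler.ldif" || return 1
  rm -f "$ldif"
}
```

For example, queue_handler_restart /ds/PingData<server> "cn=LDAP Connection Handler,cn=Connection Handlers,cn=config" targets a handler named "LDAP Connection Handler", if that is the name used in your configuration.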

The server is unresponsive

Check the current server state by using the bin/server-state command. If the server process is running and appears to be accepting connections but does not respond to requests received on those connections, potential reasons include the following:

  • If all worker threads are busy processing other client requests, new requests are forced to wait until a worker thread becomes available. A stack trace can be obtained using the jstack command to show the state of the worker threads and the waiting requests.

    If all worker threads are processing the same requests for a long time, the server sends an alert that it might be deadlocked. For example, all threads might be tied up processing unindexed searches.

  • If a request handler is busy with a client connection, other requests sent through that request handler are forced to wait until it is able to read data. If there is only one request handler, all connections are impacted. Stack traces obtained using the jstack command will show that a request handler thread is continuously blocked.

  • If the Java virtual machine (JVM) in which the server is running is not properly configured, it can spend too much time performing garbage collection. The effect on the server is similar to that of a network or firewall configuration problem. A stack trace obtained with the pstack utility will show that most threads are idle except the one performing garbage collection. It is also likely that a small number of CPUs are 100% busy while all other CPUs are idle. The server will also issue an alert with details after detecting a long JVM pause.

  • If the JVM in which the server is running has hung, the pstack utility should show that one or more threads are blocked and unable to make progress. In such cases, the system CPUs should be mostly idle.

  • If there is a network or firewall configuration problem, communication attempts with the server will fail. A network sniffer will show that packets sent to the system are not receiving TCP acknowledgment.

  • If the host system is hung or lost power without a graceful shutdown, the server will be unresponsive.

If it appears that the problem is with the server software or the JVM, work with a support provider to diagnose the problem and potential solutions.
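Because a single stack trace rarely shows whether threads are stuck or just busy, it often helps to give the support provider several jstack dumps taken a few seconds apart. A minimal sketch; collect_thread_dumps is a hypothetical name, the process ID of the server's Java process must be found separately (for example, with ps), and the JSTACK variable can point to a specific jstack binary such as "$JAVA_HOME/bin/jstack".

```shell
# Hypothetical helper: capture <count> thread dumps of process <pid>,
# <interval> seconds apart, into <outdir>/jstack-<pid>-<n>.txt.
collect_thread_dumps() {
  pid=$1
  count=$2
  interval=$3
  outdir=$4
  mkdir -p "$outdir" || return 1
  i=1
  while [ "$i" -le "$count" ]; do
    "${JSTACK:-jstack}" "$pid" > "$outdir/jstack-$pid-$i.txt" 2>&1
    # No need to wait after the last dump.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

For example, collect_thread_dumps 12345 5 10 /tmp/dumps takes five dumps ten seconds apart.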

Problems with the administrative console

If a problem occurs when trying to use the administrative console, reasons might include one of the following:

  • The web application container that hosts the console is not running. If an error occurs while trying to start it, consult the logs for the web application container.

  • If a problem occurs while trying to authenticate, make sure that the target server is online. If it is, the access log might provide information about the authentication failure.

  • If a problem occurs while interacting with the server instance using the administrative console, the access and error logs for that instance can provide additional information.

Troubleshooting problems with SSL communication

To troubleshoot SSL communication problems, enable TLS debugging and then restart the server to apply the changes.

Steps

  1. Enable TLS debugging in the server to troubleshoot SSL communication issues:

    Example:

    $ dsconfig create-debug-target \
      --publisher-name "File-Based Debug Logger" \
      --target-name com.unboundid.directory.server.extensions.TLSConnectionSecurityProvider \
      --set debug-level:verbose \
      --set include-throwable-cause:true
    $ dsconfig set-log-publisher-prop \
      --publisher-name "File-Based Debug Logger" \
      --set enabled:true \
      --set default-debug-level:disabled

  2. Enable Java's TLS debugging output so that it takes effect at the next scheduled server restart.

    1. In the config/java.properties file, add -Djavax.net.debug=ssl to the start-ds line.

    2. Run bin/dsjavaproperties to apply the change. The option takes effect the next time the server restarts.