Troubleshooting resources in the operating system
The underlying operating system also provides a significant amount of information that can help diagnose issues that impact the performance and stability of PingAuthorize Server.
In some cases, problems with the underlying system can be directly responsible for issues seen with the server, and in others, system tools can help narrow down the cause of the problem.
Identifying problems with the underlying system
If the underlying system itself is experiencing problems, it can adversely impact the function of applications running on it. To look for problems in the underlying system, review the system log file (/var/log/messages on Linux). Information about faulted or degraded devices or other unusual system conditions is written there.
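A first pass over the system log can be scripted. The following is a minimal sketch written as a bash function, assuming a Linux host that writes to /var/log/messages (some distributions use /var/log/syslog or the systemd journal instead). The function name scan_system_log and its keyword list are illustrative assumptions, not part of PingAuthorize Server.

```shell
# Hypothetical helper: print the most recent system log lines that mention
# common hardware or kernel problem keywords (illustrative list; adjust it).
# Usage: scan_system_log [log_file] [max_lines]
scan_system_log() {
  local log_file="${1:-/var/log/messages}"
  local max_lines="${2:-50}"

  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file (reading it may require root)" >&2
    return 1
  fi

  # Case-insensitive match, keeping only the newest matches.
  grep -iE 'error|fail|fault|degraded|oom|i/o' "$log_file" | tail -n "$max_lines"
}

# On systemd-based hosts, a comparable view of recent kernel warnings:
#   journalctl -k -p warning --since "1 hour ago"
```

Keyword matching is deliberately broad, so expect some noise; the goal is only to surface lines worth reading in full.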
Examining CPU utilization
Observing CPU utilization for the server process and the system as a whole provides clues as to the nature of the problem.
System-wide CPU utilization
To investigate CPU consumption of the system as a whole, use the vmstat command with a time interval in seconds. For example:
vmstat 5
The specific output of this command varies between operating systems, but it includes the percentage of time the CPU spent executing user-space code (user time), the percentage of time spent executing kernel-space code (system time), and the percentage of time not executing any code (idle time).
- If the CPUs are spending most of their time executing user-space code, the available processors are being well-utilized. If performance is still poor or the server is unresponsive, it can indicate that the server is not optimally tuned.
- If there is high system time, it can indicate that the system is performing excessive disk and/or network I/O, or in some cases, that there is some other system-wide problem, like an interrupt storm.
- If the system is mostly idle but the server is performing poorly or is unresponsive, there can be a resource constraint elsewhere (for example, waiting on disk or memory access, or excessive lock contention), or the JVM can be performing other tasks, like stop-the-world garbage collection, that cannot be run heavily in parallel.
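To compare user, system, idle, and I/O-wait time over several samples rather than by eye, the vmstat output can be averaged. This is a minimal sketch as a bash function; the name summarize_vmstat_cpu is hypothetical, and it assumes the Linux procps vmstat header with us, sy, id, and wa columns. Columns are located by header name, and the first data row is skipped because it reports averages since boot.

```shell
# Hypothetical helper: average the us, sy, id and wa columns of vmstat output
# read from stdin. Columns are found by header name; the first data row
# (averages since boot) is skipped.
# Usage: vmstat 5 7 | summarize_vmstat_cpu
summarize_vmstat_cpu() {
  awk '
    $1 == "r" {                              # column-header line
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $1 ~ /^[0-9]+$/ && ("us" in col) {       # data line
      if (!seen_first) { seen_first = 1; next }
      n++
      us += $(col["us"]); sy += $(col["sy"]); id += $(col["id"])
      if ("wa" in col) wa += $(col["wa"])
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "samples=%d us=%.1f sy=%.1f id=%.1f wa=%.1f\n", n, us/n, sy/n, id/n, wa/n
    }'
}
```

A high wa average alongside high id points toward the disk checks described later in this section rather than toward CPU saturation.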
Per-CPU utilization
To investigate CPU consumption on a per-CPU basis, use the mpstat command with a time interval in seconds. For example:
mpstat 5
On Linux systems, it might be necessary to add -P ALL to the command. For example:
mpstat -P ALL 5
Among other things, this command shows the percentage of time each CPU has spent in user time, system time, and idle time. If the overall CPU utilization is relatively low but mpstat reports that one CPU has a much higher utilization than the others, there might be a significant bottleneck within the server, or the JVM might be performing certain types of garbage collection that cannot be run in parallel. On the other hand, if CPU utilization is relatively even across all CPUs, there is likely no such bottleneck, and the issue might be elsewhere.
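The single-busy-CPU pattern described above can be checked from the mpstat summary rows. This is a minimal sketch as a bash function; the name cpu_imbalance is hypothetical, and it assumes sysstat's mpstat with an English "Average:" summary (running it with LC_ALL=C avoids localized labels). It computes each CPU's busy percentage as 100 minus %idle and reports the busiest CPU against the mean.

```shell
# Hypothetical helper: read "mpstat -P ALL" output from stdin, use only the
# "Average:" rows, and report busy% (100 - %idle) per CPU plus the busiest
# CPU and the mean across CPUs.
# Usage: LC_ALL=C mpstat -P ALL 5 3 | cpu_imbalance
cpu_imbalance() {
  awk '
    $1 == "Average:" && $2 == "CPU" {        # header row: locate %idle
      for (i = 1; i <= NF; i++) if ($i == "%idle") idle_col = i
      next
    }
    $1 == "Average:" && idle_col && $2 != "all" {
      busy = 100 - $(idle_col)
      printf "cpu%s busy=%.1f\n", $2, busy
      n++; total += busy
      if (n == 1 || busy > max) { max = busy; max_cpu = $2 }
    }
    END {
      if (n == 0) { print "no per-CPU rows found"; exit 1 }
      printf "busiest=cpu%s max=%.1f mean=%.1f\n", max_cpu, max, total / n
    }'
}
```

A busiest value far above the mean while the mean stays low is the pattern worth investigating further.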
Per-process utilization
To investigate CPU consumption on a per-process basis, use a utility such as top on Linux. If a process other than the Java process used to run PingAuthorize Server is consuming a significant amount of available CPU, it might be interfering with the ability of the server to run effectively.
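To spot other processes competing with the server, a ps listing can be filtered. This is a minimal sketch as a bash function; the name non_java_cpu_hogs and the default 20% threshold are assumptions. Note that ps reports %CPU averaged over each process's lifetime, so for a current snapshot top -b -n 1 is often more telling.

```shell
# Hypothetical helper: read "ps -eo pid,pcpu,comm" output from stdin and list
# processes other than java whose %CPU exceeds a threshold (default 20).
# Usage: ps -eo pid,pcpu,comm --sort=-pcpu | non_java_cpu_hogs 20
non_java_cpu_hogs() {
  local threshold="${1:-20}"
  awk -v t="$threshold" '
    NR == 1 { next }                         # skip the ps header line
    {
      cmd = $3                               # COMMAND may contain spaces
      for (i = 4; i <= NF; i++) cmd = cmd " " $i
      if (cmd != "java" && $2 + 0 > t + 0)
        printf "pid=%s cpu=%s%% cmd=%s\n", $1, $2, cmd
    }'
}
```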
Examining disk utilization
Very high disk utilization on the underlying system can adversely impact server performance. It can delay reading or writing database files and writing log files. It can also threaten server stability if excessive disk I/O prevents the cleaner threads from keeping the database size under control.
The iostat tool can be used to obtain information about the disk activity on the system. On Linux systems, invoke iostat with the -x argument. For example:
iostat -x 5
Several types of information are displayed, but to get an initial sense of how busy the underlying disks are, look at the %util column on Linux. This field shows the percentage of time that the underlying disks are actively servicing I/O requests. A system with high disk utilization is likely to exhibit poor server performance.
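When many devices are present, the busy ones can be filtered out of the iostat report. This is a minimal sketch as a bash function; the name busy_disks and the default 80% threshold are assumptions. It locates %util by header name because the column position differs between sysstat versions, and it accepts both "Device" and the older "Device:" header. As with vmstat, the first report covers the time since boot.

```shell
# Hypothetical helper: read "iostat -x" output from stdin and print devices
# whose %util exceeds a threshold (default 80).
# Usage: iostat -dx 5 3 | busy_disks 80
busy_disks() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    $1 ~ /^Device:?$/ {                      # header row of each report
      util_col = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") util_col = i
      next
    }
    util_col && NF >= util_col && $(util_col) + 0 > t + 0 {
      printf "%s %%util=%s\n", $1, $(util_col)
    }'
}
```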
If the high disk utilization is on one or more disks that are used to provide swap space for the system, the system might not have enough free memory to process requests. As a result, it might have started swapping blocks of memory that have not been used recently to disk. This can cause very poor server performance. It is important to ensure that the server is configured appropriately to avoid this condition.
If this problem occurs on a regular basis, the server is likely configured to use too much memory. If swapping is not normally a problem but it does arise, check whether any other running processes are consuming a significant amount of memory, and check for other potential causes of significant memory consumption (for example, large files in a tmpfs file system).
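Whether the system is actually using swap can be confirmed from /proc/meminfo. This is a minimal sketch as a bash function; the name swap_usage is hypothetical. The commented commands at the end are standard Linux tools for the related checks mentioned above: swap activity, swap devices, and tmpfs usage.

```shell
# Hypothetical helper: report swap usage from a /proc/meminfo-style file,
# whose values are in kB.
# Usage: swap_usage [/proc/meminfo]
swap_usage() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    $1 == "SwapTotal:" { total = $2 }
    $1 == "SwapFree:"  { free = $2 }
    END {
      used = total - free
      pct = (total > 0) ? 100 * used / total : 0
      printf "swap_used_mb=%d swap_used_pct=%.1f\n", used / 1024, pct
    }' "$meminfo"
}

# Related checks:
#   vmstat 5        # nonzero si/so columns mean pages are moving to/from swap
#   swapon --show   # which devices provide swap space
#   df -h -t tmpfs  # how much memory tmpfs file systems are holding
```

Swap in use is not by itself a problem; sustained nonzero si/so values in vmstat indicate active swapping.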
Examining process details
The operating system provides a number of tools that can help examine a process in detail.
ps
The standard ps tool can display a range of information about a particular process, including the process state, the name of the user running the process, its process ID and parent process ID, its priority and nice value, its resident and virtual memory sizes, its start time, its execution time, and the process name with arguments. For example:
ps -fly -p <processID>
Note that for a process with a large number of arguments, the standard ps command displays only a limited set of the arguments, based on the available space in the terminal window.
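On Linux, the procps ps accepts -w twice (for example, ps -ww -fly -p <processID>) for unlimited output width. Another option is reading /proc directly. The following minimal sketch is a bash function with the hypothetical name full_cmdline; it prints one argument per line, which keeps long JVM option lists readable.

```shell
# Hypothetical helper: print the complete command line of a process from
# /proc/<pid>/cmdline (Linux only), where arguments are NUL-separated.
# Usage: full_cmdline <processID>
full_cmdline() {
  local pid="$1"
  if [ ! -r "/proc/$pid/cmdline" ]; then
    echo "no such process or not readable: $pid" >&2
    return 1
  fi
  # Replace each NUL separator with a newline: one argument per line.
  tr '\0' '\n' < "/proc/$pid/cmdline"
}
```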
pstack
The pstack command can be used to obtain a native stack trace of all threads in a process. While a native stack trace might not be as user-friendly as a Java stack trace obtained using jstack, it includes threads that are not available in a Java stack trace, such as those used to perform garbage collection and other housekeeping tasks. The general usage for the pstack command is:
pstack <processID>
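A single stack trace is a point-in-time snapshot, so several snapshots taken a few seconds apart are often easier to interpret. This is a minimal sketch as a bash function; the name capture_stacks, the STACK_CMD variable, and the stack.<n>.txt file naming are assumptions. STACK_CMD defaults to pstack and can be set to jstack to collect Java stacks instead. Both tools generally need to run as the same user as the target process, or as root.

```shell
# Hypothetical helper: capture several stack snapshots of one process at a
# fixed interval, writing <outdir>/stack.<n>.txt for each one.
# STACK_CMD selects the tool (default pstack; e.g. STACK_CMD=jstack).
# Usage: capture_stacks <processID> [count] [interval_seconds] [outdir]
capture_stacks() {
  local pid="$1" count="${2:-3}" interval="${3:-5}" outdir="${4:-.}"
  local tool="${STACK_CMD:-pstack}"
  local i=1
  mkdir -p "$outdir" || return 1
  while [ "$i" -le "$count" ]; do
    # Save each snapshot, including any tool error output, to its own file.
    "$tool" "$pid" > "$outdir/stack.$i.txt" 2>&1
    # Pause between snapshots, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

Threads that show the same frames in every snapshot are good candidates for a hang or hot loop.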
dbx / gdb
A process debugger provides the ability to examine a process in detail. Like pstack, a debugger can obtain a stack trace for all threads in the process, but it also provides the ability to examine a process (or core file) in much greater detail, including observing the contents of memory at a specified address and the values of CPU registers in different frames of execution. The GNU debugger gdb is widely used on Linux systems.
Using a debugger against a live process interrupts that process and suspends its execution until the debugger detaches from it. In addition, when running against a live process, a debugger can alter the contents of the memory associated with that process, which can have adverse effects. As a result, restrict the use of a process debugger to core files, and examine a live process with one only under the direction of your authorized support provider.
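In keeping with that recommendation, gdb can be run non-interactively against a core file only. This is a minimal sketch as a bash function; the name core_backtrace is hypothetical. It assumes you know the path of the java executable that produced the core (for example, under $JAVA_HOME/bin). It uses gdb's -batch and -ex options to print a backtrace of every thread.

```shell
# Hypothetical helper: print a backtrace of every thread recorded in a core
# file using gdb in batch mode. It reads the core file only and never
# attaches to a running process.
# Usage: core_backtrace <path-to-java-binary> <path-to-core-file> [outfile]
core_backtrace() {
  local binary="$1" core="$2" out="${3:-core-backtrace.txt}"
  if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
    echo "binary or core file not readable" >&2
    return 1
  fi
  command -v gdb >/dev/null 2>&1 || { echo "gdb not installed" >&2; return 1; }
  gdb -batch -ex "thread apply all bt" "$binary" "$core" > "$out" 2>&1
}
```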
pfiles / lsof
To examine the set of files that a process is using (including special types of files, like sockets), you can use a tool such as lsof on Linux systems. For example:
lsof -p <processID>
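A related check is whether the process is close to its open-file limit, which can cause connection or file-open failures. This is a minimal sketch as a bash function; the name fd_usage is hypothetical. It reads /proc/<pid>/fd and /proc/<pid>/limits, so it is Linux only. The commented lsof invocation uses standard options to list only the network sockets.

```shell
# Hypothetical helper: compare a process's open file descriptor count with
# its "Max open files" soft limit, using /proc (Linux only).
# Usage: fd_usage <processID>
fd_usage() {
  local pid="$1"
  [ -d "/proc/$pid/fd" ] || { echo "no such process: $pid" >&2; return 1; }
  local open limit
  open="$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"
  # Line format: "Max open files  <soft>  <hard>  files"
  limit="$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")"
  echo "pid=$pid open_fds=$open soft_limit=$limit"
}

# List only network sockets (-a ANDs the selections; -n and -P skip
# host and port name lookups):
#   lsof -nP -a -p <processID> -i
```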
Tracing process execution
If a process is unresponsive but is consuming a nontrivial amount of CPU time, or if a process is consuming significantly more CPU time than is expected, it might be useful to examine the activity of that process in more detail than can be obtained using a point-in-time snapshot.
For example, if a process is performing a significant amount of disk reads or writes, it can be useful to see which files are being accessed. Similarly, if a process is consistently exiting abnormally, starting a trace for that process just before it exits can help provide additional information that cannot be captured in a core file (and if the process is exiting rather than being terminated for an illegal operation, then no core file may be available).
To perform this trace on Linux, use the strace tool. For example:
strace -f -p <processID>
Consult the strace manual page for additional information.
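To see which files a busy process touches most often, strace can record only file-related system calls to a file. For example, strace -f -tt -e trace=file -o /tmp/strace.out -p <processID> does this, where the output path is an example. The following minimal sketch is a bash function with the hypothetical name top_traced_paths. It tallies the first quoted path on each line of that output. Paths containing escaped quotes are not handled.

```shell
# Hypothetical helper: count how often each path appears as the first quoted
# argument in an strace output file, most frequent first.
# Usage: top_traced_paths <strace-output-file> [count]
top_traced_paths() {
  local trace_file="$1" count="${2:-20}"
  # Keep only lines with a quoted string and print the first one.
  sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' "$trace_file" \
    | sort | uniq -c | sort -rn | head -n "$count"
}
```

For an overall profile instead of per-file detail, strace -c -f -p <processID> prints a per-system-call summary when the trace is interrupted.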
Problems with SSL communication
Enable TLS debugging in the server to troubleshoot SSL communication issues. For example:
$ dsconfig create-debug-target \
--publisher-name "File-Based Debug Logger" \
--target-name com.unboundid.directory.server.extensions.TLSConnectionSecurityProvider \
--set debug-level:verbose \
--set include-throwable-cause:true
$ dsconfig set-log-publisher-prop \
--publisher-name "File-Based Debug Logger" \
--set enabled:true \
--set default-debug-level:disabled
In the java.properties file, add -Djavax.net.debug=ssl to the start-server line, and run bin/dsjavaproperties to make the option take effect on a scheduled server restart.
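The handshake can also be tested from the client side without restarting the server, using OpenSSL's s_client. The following minimal sketch is a bash function with the hypothetical name tls_handshake_summary. It trims s_client output to the negotiated protocol and cipher line and the certificate verification result. The matched phrases ("Cipher is", "Verify return code:") reflect OpenSSL 1.1.1 and 3.x output and may differ in other versions.

```shell
# Hypothetical helper: condense "openssl s_client" output (read from stdin)
# to the protocol/cipher line and the verification result.
# Usage:
#   openssl s_client -connect <host>:<port> -servername <host> </dev/null 2>&1 \
#     | tls_handshake_summary
tls_handshake_summary() {
  local output
  output="$(cat)"
  # First match of each line, with leading whitespace removed.
  printf '%s\n' "$output" | grep -m 1 'Cipher is' | sed 's/^[[:space:]]*//'
  printf '%s\n' "$output" | grep -m 1 'Verify return code:' | sed 's/^[[:space:]]*//'
}
```

Adding -tls1_2 or -tls1_3 to the s_client command restricts the attempt to one protocol version, which helps when a client and the server disagree on supported versions.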
Examining network communication
Because the server is a network-based application, it can be valuable to observe the network communication that it has with clients. The server itself can provide details about its interaction with clients when debugging is enabled for the protocol or data debug categories, but in a number of cases it is useful to view information at a much lower level. A network sniffer, like the tcpdump tool on Linux, can be used to accomplish this.
The tcpdump tool has many options, and its manual page provides a more thorough explanation of their use. However, to perform basic tracing that shows the full details of the packets received on, for example, port 389 with remote host 1.2.3.4, use the following command on Linux:
tcpdump -i <interface> -n -XX -s 0 host 1.2.3.4 and port 389
The tcpdump tool does not appear to support LDAP parsing. However, it can write capture data to a file rather than displaying it on the terminal (using -w <path>), so that the data can later be analyzed with a graphical tool like Wireshark, which can interpret LDAP communication on any port.
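The capture-to-file approach can be wrapped so that the same filter is reused consistently. This is a minimal sketch in bash. The names ldap_capture_filter and capture_ldap and the default file name ldap-capture.pcap are hypothetical. The commented tshark command is Wireshark's command-line reader, with port 1389 as an example of a non-standard LDAP port decoded with -d.

```shell
# Hypothetical helper: build the tcpdump filter expression for one remote
# host and port (default 389).
ldap_capture_filter() {
  local host="$1" port="${2:-389}"
  printf 'host %s and port %s\n' "$host" "$port"
}

# Hypothetical helper: capture full packets to a file instead of the terminal.
# -n: no name resolution; -s 0: full packets; -w: write raw packets.
# Usually requires root privileges.
# Usage: capture_ldap <interface> <remote-host> [port] [pcap-file]
capture_ldap() {
  local iface="$1" host="$2" port="${3:-389}" pcap="${4:-ldap-capture.pcap}"
  tcpdump -i "$iface" -n -s 0 -w "$pcap" "$(ldap_capture_filter "$host" "$port")"
}

# Later, decode the capture as LDAP (here for an example port of 1389):
#   tshark -r ldap-capture.pcap -d tcp.port==1389,ldap -Y ldap
```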
Enabling network tracing generally requires privileges that are not available to normal users and therefore can require root access.