The server is slow to respond to client requests
If the server is running and responds to clients, but clients wait a long time to receive responses, the delay can have several potential causes.
In these cases, use the Periodic Stats Logger, which is a valuable tool to get per-second monitoring information on the server. The Periodic Stats Logger can save the information in |
The potential problems that cause slow responses to client requests are as follows:
- The server is not optimally configured for the type of requests being processed, or clients are requesting inefficient operations.
If this is the case, the access log should show that operations take a long time to complete and are likely unindexed. Updating the server configuration to better suit the requests, or altering the requests to make them more efficient, could help alleviate the problem.
Review the expensive operations access log in logs/expensive-ops, which by default logs operations that take longer than 1 second. You can also run the bin/status command or view the status in the Administrative Console to see the server’s Work Queue information (also see the following case).
- The server is overwhelmed with client requests and has amassed a large backlog of requests in the work queue.
This can be the result of a configuration problem (for example, too few worker threads configured), or it might be necessary to provision more systems on which to run the server software. The symptoms resemble those seen when the server is asked to process inefficient requests, but the details of the requests in the access log show that they are not necessarily inefficient.
Run the bin/status command to view the Work Queue information. If everything is performing well, you should not see a large queue size or a server that is near 100% busy. The % Busy statistic is calculated as the percentage of worker threads that are busy processing operations. For example:

              --- Work Queue ---
               : Recent : Average : Maximum
    -----------:--------:---------:--------
    Queue Size : 10     : 1       : 10
    % Busy     : 17     : 14      : 100

You can also view the expensive operations access log in logs/expensive-ops, which by default logs operations that take longer than 1 second.
- The server is not configured to fully cache all of the data in the server, or the cache is not yet primed.
In this case, iostat reports very high disk utilization. This can be resolved by configuring the server to fully cache all data and to load database contents into memory on startup. If the underlying system does not have enough memory to fully cache the entire data set, then it might not be possible to achieve optimal performance for operations that need data which is not contained in the cache. For more information, see Tuning for disk-bound deployments.
- If the JVM is not properly configured, then it will need to perform frequent garbage collection and periodically pause execution of the Java code that it is running.
In that case, the server error log should report that the server has detected a number of pauses and can include tuning recommendations to help alleviate the problem.
- If the server is configured to use a large percentage of the memory in the system, then it is possible that the system has gotten low on available memory and has begun swapping.
In this case, iostat should report very high utilization for disks used to hold swap space, and commands like cat /proc/meminfo on Linux can report a large amount of swap memory in use. Swapping can also occur if swappiness is not set to 0 on Linux. For more information, see Disabling file system swapping.
- If another process on the system is consuming a significant amount of CPU time, then it can adversely impact the ability of the server to process requests efficiently.
Isolating the processes (for example, using processor sets) or separating them onto different systems can help eliminate this problem.
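
Two of the checks above (the Work Queue statistics and swap usage) lend themselves to a small script. The following is a minimal sketch in Python, not part of the server tooling. It assumes that the Work Queue section of the bin/status output is laid out as colon-separated rows like the example shown earlier, and that swap usage is read from the standard SwapTotal and SwapFree fields of /proc/meminfo on Linux. The function names and the sample text are hypothetical and used only for illustration.

```python
import re

# Sample text modeled on the Work Queue example in this section (assumed layout).
SAMPLE_STATUS = """\
          --- Work Queue ---
           : Recent : Average : Maximum
-----------:--------:---------:--------
Queue Size : 10     : 1       : 10
% Busy     : 17     : 14      : 100
"""


def parse_work_queue(status_text):
    """Extract Work Queue rows from bin/status-style text.

    Returns a dict such as {"% Busy": {"recent": 17, "average": 14, "maximum": 100}}.
    Only rows with a label followed by three integer cells are kept.
    """
    columns = ("recent", "average", "maximum")
    stats = {}
    for line in status_text.splitlines():
        parts = [part.strip() for part in line.split(":")]
        if len(parts) == 4 and all(part.isdigit() for part in parts[1:]):
            stats[parts[0]] = dict(zip(columns, map(int, parts[1:])))
    return stats


def percent_busy(busy_threads, total_threads):
    """% Busy as described above: the share of worker threads that are busy."""
    return 100.0 * busy_threads / total_threads


def swap_used_kb(meminfo_text):
    """Return swap in use (kB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields["SwapTotal"] - fields["SwapFree"]


if __name__ == "__main__":
    for name, values in parse_work_queue(SAMPLE_STATUS).items():
        print(name, values)
```

On a Linux host, the swap helper could be called with the contents of /proc/meminfo, for example `swap_used_kb(open("/proc/meminfo").read())`. What counts as a "large" queue size or swap figure depends on the deployment, so the sketch deliberately reports values rather than applying thresholds.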