General troubleshooting methodology
When you detect a problem, use the following general methodology to isolate it.
- Run the bin/status tool or look at the server status in the Administrative Console. The status tool provides a summary of the server’s current state with key metrics and a list of recent alerts.
- Look in the server logs. In particular, view the following logs:
  - logs/errors
  - logs/failed-ops
  - logs/expensive-ops
- Use system commands such as vmstat and iostat to determine if the server is bottlenecked on a system resource like CPU or disk throughput.
- For server performance issues (especially intermittent ones like spikes in response time), enable the periodic-stats-logger to help isolate problems, because it stores important server performance information on a per-second basis. The periodic-stats-logger can save the information in a .csv file that can be loaded into a spreadsheet. The information this logger makes available is very configurable. You can create multiple loggers for different types of information or a different frequency of logging (for example, hourly data in addition to per-second data). For more information, see Profiling server performance using the Stats Logger.
- For more advanced users, run the collect-support-data tool on the system, unzip the archive, and look through the collected information. This is often useful when administrators most familiar with the data platform do not have direct access to the systems where the production servers are running. They can examine the collect-support-data archive on a different server. For more information, see Working with the collect-support-data tool.
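
The vmstat step in the list above can be partly automated. The following is a minimal sketch, not part of the product, that parses captured output from Linux vmstat (for example, from running vmstat 1 5) and reports whether the samples look CPU-bound or I/O-bound. The sample text, the function names, and both thresholds are illustrative assumptions. Column names (us, sy, wa) follow the standard Linux vmstat layout and may differ on other platforms.

```python
# Illustrative capture of `vmstat 1 3` on Linux; the numbers are made up.
SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812340  10240 204800    0    0     5    12  100  200  5  2 90  3  0
 0  4      0 801200  10240 204900    0    0  9000  7000  900 1500  6  4 30 60  0
 0  5      0 800100  10240 205000    0    0  9500  8000  950 1600  4  6 20 70  0
"""


def parse_vmstat(text):
    """Return one {column: int} dict per numeric sample row."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The column-name row is the one that carries the CPU fields.
    header = next((cols for cols in rows if "us" in cols and "wa" in cols), None)
    if header is None:
        raise ValueError("no vmstat column header found")
    samples = []
    for cols in rows:
        # Keep only rows that line up with the header and are fully numeric.
        if len(cols) == len(header) and all(c.isdigit() for c in cols):
            samples.append(dict(zip(header, map(int, cols))))
    return samples


def summarize(samples, skip_first=True, cpu_threshold=90.0, iowait_threshold=20.0):
    """Average CPU use and I/O wait; thresholds are assumptions, tune them."""
    # The first vmstat row reports averages since boot, so it is skipped by default.
    if skip_first and len(samples) > 1:
        samples = samples[1:]
    cpu = sum(s["us"] + s["sy"] for s in samples) / len(samples)
    iowait = sum(s["wa"] for s in samples) / len(samples)
    return {
        "avg_cpu_busy": cpu,
        "avg_iowait": iowait,
        "cpu_bound": cpu >= cpu_threshold,
        "io_bound": iowait >= iowait_threshold,
    }


if __name__ == "__main__":
    print(summarize(parse_vmstat(SAMPLE)))
```

A high average in the wa column alongside low us and sy values suggests the server is waiting on disk rather than on CPU. iostat can then show which device is saturated.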
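
As an alternative to loading the periodic-stats-logger .csv file into a spreadsheet, a short script can flag the seconds where a metric spiked. The sketch below is only an illustration: the column names timestamp and avg-response-time-ms are hypothetical placeholders, not the logger's actual headers, and the "three times the median" rule is an arbitrary assumption. Substitute the column names that appear in your own file.

```python
import csv
import io
import statistics

# Hypothetical excerpt; real column names depend on the logger configuration.
SAMPLE_CSV = """\
timestamp,avg-response-time-ms
12:00:01,2.1
12:00:02,2.3
12:00:03,19.8
12:00:04,2.0
12:00:05,2.2
"""


def find_spikes(csv_file, column, factor=3.0):
    """Return rows whose value in `column` exceeds factor * median of that column."""
    values = []
    for row in csv.DictReader(csv_file):
        try:
            values.append((row, float(row[column])))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows where the column is missing or not numeric
    if not values:
        return []
    baseline = statistics.median(v for _, v in values)
    return [row for row, v in values if v > factor * baseline]


if __name__ == "__main__":
    # For a real file: with open(path, newline="") as f: find_spikes(f, "<column>")
    for row in find_spikes(io.StringIO(SAMPLE_CSV), "avg-response-time-ms"):
        print(row["timestamp"], row["avg-response-time-ms"])
```

The median is used as the baseline because a few large spikes barely move it, whereas they would inflate a mean.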