General troubleshooting methodology
When a problem is detected, the following general methodology is recommended to isolate it.
- Run the bin/status tool or look at the server status in the Administrative Console. The status tool provides a summary of the server’s current state with key metrics and a list of recent alerts.
- Look in the server logs. In particular, view the following logs:
  - logs/errors
  - logs/failed-ops
  - logs/expensive-ops
- Use system commands, such as vmstat and iostat, to determine if the server is bottlenecked on a system resource like CPU or disk throughput.
- For performance problems (especially intermittent ones like spikes in response time), enabling the periodic-stats-logger can help to isolate problems, because it stores important server performance information on a per-second basis. The periodic-stats-logger can save the information in a CSV-formatted file that can be loaded into a spreadsheet. The information this logger makes available is highly configurable. You can create multiple loggers for different types of information or different logging frequencies (for example, hourly data in addition to per-second data). For more information, see "Profiling Server Performance Using the Periodic Stats Logger".
- For replication problems, run dsreplication status and look at the logs/replication file.
- For more advanced users, run the collect-support-data tool on the system, unzip the archive somewhere, and look through the collected information. This is often useful when the administrators most familiar with the Data Platform do not have direct access to the systems where the production servers are running. They can examine the collect-support-data archive on a different server. For more information, see "Using the Collect Support Data Tool".
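The system-resource check in the list above can be partly automated. The following is a minimal Python sketch, not part of the server tooling, that parses captured output in the Linux procps vmstat layout and flags samples that look CPU-bound or I/O-bound. The function names, the threshold values, and the sample text are illustrative assumptions; other platforms print different columns, and the thresholds should be tuned for your hardware.

```python
"""Flag CPU- or I/O-bound samples in captured vmstat output (Linux procps layout assumed)."""

# Made-up sample input for illustration only; not a real measurement.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812344  10240 402112    0    0     4    12  310  540 12  3 84  1  0
 9  0      0 799120  10240 402200    0    0     2     8 2950 4100 88  9  3  0  0
 1  6      0 790004  10240 402900    0    0  9800 15200  900 1500  6  4 35 55  0
"""


def parse_vmstat(text):
    """Return one dict per sample line, keyed by the column names in the header row."""
    header = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name row (r b swpd free ...); vmstat may repeat it periodically.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # banner row or truncated line
        if not all(f.isdigit() for f in fields):
            continue  # not a numeric sample line
        samples.append(dict(zip(header, map(int, fields))))
    return samples


def classify(sample, max_iowait=20, min_idle=10):
    """Label a sample using hypothetical percentage thresholds.

    wa is the share of CPU time spent waiting for I/O; id is the idle share.
    """
    if sample["wa"] >= max_iowait:
        return "io-bound"
    if sample["id"] <= min_idle:
        return "cpu-bound"
    return "ok"


if __name__ == "__main__":
    for s in parse_vmstat(SAMPLE):
        print(classify(s), "us=%d sy=%d id=%d wa=%d" % (s["us"], s["sy"], s["id"], s["wa"]))
```

To use it on a live system, capture a few samples first (for example, `vmstat 5 12 > vmstat.txt`) and pass the file contents to `parse_vmstat`. The first sample vmstat prints reports averages since boot, so it is usually worth ignoring when looking for a current bottleneck.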
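Once the periodic-stats-logger has written a CSV file, intermittent spikes can also be located without a spreadsheet. The sketch below assumes the file has a header row and uses hypothetical column names (`timestamp`, `avg_response_ms`); the actual columns depend on how the logger is configured, so substitute the names that appear in your file. The "spike" rule (a value more than three times the column median) is likewise only an illustrative choice.

```python
import csv
import io
import statistics

# Made-up sample data with hypothetical column names; for illustration only.
SAMPLE_CSV = """\
timestamp,avg_response_ms
12:00:01,2.1
12:00:02,1.9
12:00:03,2.3
12:00:04,48.7
12:00:05,2.0
"""


def find_spikes(csv_file, column, factor=3.0):
    """Return the rows whose `column` value exceeds `factor` times the column median."""
    rows = list(csv.DictReader(csv_file))
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            values.append(None)  # missing column or non-numeric cell
    numeric = [v for v in values if v is not None]
    if not numeric:
        return []
    threshold = factor * statistics.median(numeric)
    return [row for row, v in zip(rows, values) if v is not None and v > threshold]


if __name__ == "__main__":
    for row in find_spikes(io.StringIO(SAMPLE_CSV), "avg_response_ms"):
        print(row["timestamp"], row["avg_response_ms"])
```

For a real file, open it with `open(path, newline="")` and pass the file object in place of the `io.StringIO` wrapper. The timestamps of the flagged rows can then be matched against entries in logs/errors and logs/expensive-ops for the same period.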