General troubleshooting methodology
When a problem is detected, the following general methodology is recommended to isolate it.
- Run the bin/status tool or look at the server status in the admin console. The status tool provides a summary of the server's current state with key metrics and a list of recent alerts.
- Look in the server logs. In particular, view the following logs:
  - logs/errors
  - logs/failed-ops
  - logs/expensive-ops
- Use system commands, such as vmstat and iostat, to determine whether the server is bottlenecked on a system resource such as CPU or disk throughput.
- For performance problems (especially intermittent ones such as spikes in response time), enabling the periodic-stats-logger can help isolate the cause, because it stores important server performance information on a per-second basis. The periodic-stats-logger can save the information in a CSV-formatted file that can be loaded into a spreadsheet. The information this logger makes available is highly configurable. You can create multiple loggers for different types of information or for different logging frequencies (for example, hourly data in addition to per-second data). For more information, see "Profiling Server Performance Using the Periodic Stats Logger".
- For replication problems, run dsreplication status and look at the logs/replication file.
- For more advanced users, run the collect-support-data tool on the system, unzip the archive somewhere, and look through the collected information. This is often useful when the administrators most familiar with the Data Platform do not have direct access to the systems where the production servers are running. They can examine the collect-support-data archive on a different server. For more information, see "Using the Collect Support Data Tool".
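
As an illustration of working with the CSV output of the periodic-stats-logger outside a spreadsheet, the following minimal Python sketch flags rows where a chosen column exceeds a threshold, which is one way to locate response-time spikes. The column names (Timestamp, Ops Per Second, Avg Response Time (ms)) and the sample values are hypothetical and used only for illustration. The actual header depends on how the logger is configured, so check the first line of your own file before adapting this.

```python
import csv
import io


def find_spikes(csv_file, column, threshold):
    """Return (row_number, value) pairs where `column` exceeds `threshold`.

    csv_file is any iterable of text lines that starts with a header row.
    Rows whose value is missing, blank, or non-numeric are skipped.
    """
    spikes = []
    reader = csv.DictReader(csv_file)
    # Number data rows from 1 so they are easy to find in a spreadsheet.
    for row_number, row in enumerate(reader, start=1):
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue
        if value > threshold:
            spikes.append((row_number, value))
    return spikes


# Hypothetical sample data; real column names depend on the logger configuration.
SAMPLE = """\
Timestamp,Ops Per Second,Avg Response Time (ms)
10:00:01,1200,1.8
10:00:02,1180,2.1
10:00:03,310,48.5
10:00:04,1210,1.9
"""

if __name__ == "__main__":
    column = "Avg Response Time (ms)"  # assumed column name
    for row_number, value in find_spikes(io.StringIO(SAMPLE), column, 10.0):
        print(f"row {row_number}: {value} ms")
```

To run this against a real file, open it with `open(path, newline="")` and pass the file object in place of the `io.StringIO` sample. Once a spike is located, its timestamp can be compared against entries in logs/errors and logs/expensive-ops from the same period.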