When a problem is detected, the following general methodology is recommended to isolate it.

  1. Run the bin/status tool or look at the server status in the Administrative Console. The status tool provides a summary of the server’s current state with key metrics and a list of recent alerts.
  2. Look in the server logs. In particular, view the following logs:
    • logs/errors
    • logs/failed-ops
    • logs/expensive-ops
  3. Use system commands, such as vmstat and iostat, to determine whether the server is bottlenecked on a system resource such as CPU or disk throughput.
  4. For performance problems (especially intermittent ones such as spikes in response time), enabling the periodic-stats-logger can help isolate the problem because it records important server performance information on a per-second basis. The periodic-stats-logger can save the information in a CSV-formatted file that can be loaded into a spreadsheet. The information that this logger makes available is highly configurable. You can create multiple loggers for different types of information or for different logging frequencies (for example, hourly data in addition to per-second data). For more information, see "Profiling Server Performance Using the Periodic Stats Logger".
  5. For replication problems, run dsreplication status and look at the logs/replication file.
  6. For more advanced users, run the collect-support-data tool on the system, extract the resulting archive, and look through the collected information. This is often useful when the administrators most familiar with the Data Platform do not have direct access to the systems where the production servers are running, because they can examine the collect-support-data archive on a different server. For more information, see "Using the Collect Support Data Tool".
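As an illustration of step 3, the following Python sketch shows one way to turn captured vmstat output into a rough "CPU-bound or I/O-bound" verdict. It is not part of the product. It assumes the Linux procps column layout (with "us", "id", and "wa" columns), and the function names and thresholds are hypothetical values chosen only for illustration.

```python
# Sample output of "vmstat 1 3" on Linux (illustrative numbers, not real measurements).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812344  10240 204800    0    0    12    30  150  300  5  2 92  1  0
 9  0      0 801200  10240 204900    0    0     8    25  900 2100 88  9  3  0  0
11  0      0 799872  10240 205000    0    0     4    20  950 2300 90  8  2  0  0
"""


def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    # The column-name row is the one containing the "us" and "id" CPU columns.
    header_idx = next(i for i, cols in enumerate(lines)
                      if "us" in cols and "id" in cols)
    header = lines[header_idx]
    samples = []
    for cols in lines[header_idx + 1:]:
        # Keep only fully numeric rows that line up with the header.
        if len(cols) == len(header) and all(c.isdigit() for c in cols):
            samples.append(dict(zip(header, map(int, cols))))
    return samples


def classify(samples, idle_floor=10, iowait_ceiling=20):
    """Label the samples using average CPU idle and I/O wait percentages.

    The thresholds are assumptions for this sketch, not vendor guidance.
    """
    if not samples:
        return "no-data"
    avg_idle = sum(s["id"] for s in samples) / len(samples)
    avg_wait = sum(s["wa"] for s in samples) / len(samples)
    if avg_wait >= iowait_ceiling:
        return "io-bound"
    if avg_idle <= idle_floor:
        return "cpu-bound"
    return "ok"


if __name__ == "__main__":
    samples = parse_vmstat(SAMPLE)
    # The first vmstat row reports averages since boot, so skip it.
    print(classify(samples[1:]))
```

The first row that vmstat prints reports averages since boot, which is why the usage example discards it before classifying. A result of "io-bound" suggests following up with iostat to see which device is saturated.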
Important:

Run the collect-support-data tool whenever there is a problem whose cause is not easily identified, so that this information can be passed back to your authorized support provider before corrective action is taken.
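For the CSV output described in step 4, a spreadsheet is not the only option. The Python sketch below shows one possible way to flag response-time spikes in such a file. The column names "timestamp" and "avg_response_ms" and the sample rows are hypothetical; substitute the headers that your configured periodic-stats-logger actually writes.

```python
import csv
import io
import statistics

# Hypothetical excerpt of a per-second stats file; real column names will differ.
SAMPLE_CSV = """\
timestamp,avg_response_ms
10:00:01,2.1
10:00:02,2.4
10:00:03,19.8
10:00:04,2.2
10:00:05,2.0
"""


def find_spikes(csv_text, column, factor=3.0):
    """Return the rows whose value in `column` exceeds `factor` times the median."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    median = statistics.median(float(r[column]) for r in rows)
    return [r for r in rows if float(r[column]) > factor * median]


if __name__ == "__main__":
    for row in find_spikes(SAMPLE_CSV, "avg_response_ms"):
        print(row["timestamp"], row["avg_response_ms"])
```

Comparing against the median rather than the mean keeps a single large spike from raising the threshold that is used to detect it. The timestamps of the flagged rows can then be matched against entries in logs/errors and logs/expensive-ops.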