  1. Run the bin/status tool or look at the server status in the Administrative Console. The status tool provides a summary of the server’s current state with key metrics and a list of recent alerts.
  2. Look in the server logs. In particular, view the following logs:
    • logs/errors
    • logs/failed-ops
    • logs/expensive-ops
  3. Use system commands such as vmstat and iostat to determine whether the server is bottlenecked on a system resource such as CPU or disk throughput.
  4. For server performance issues, especially intermittent ones such as spikes in response time, enable the periodic-stats-logger to help isolate the problem. It records important server performance information on a per-second basis and can save that information in a .csv file that you can load into a spreadsheet.

    The information this logger makes available is highly configurable. You can create multiple loggers for different types of information or for different logging frequencies (for example, hourly data in addition to per-second data). For more information, see Profiling server performance using the Stats Logger.

  5. If you are an advanced user, run the collect-support-data tool on the system, unzip the archive, and look through the collected information. This is often useful when the administrators most familiar with the data platform do not have direct access to the systems where the production servers are running, because they can examine the collect-support-data archive on a different server. For more information, see Working with the collect-support-data tool.
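
As an alternative to a spreadsheet, the .csv file described in step 4 can be scanned with a short script. The following is a minimal sketch, not a product feature: the column names `timestamp` and `avg_response_ms` are hypothetical, because the actual columns depend on how the logger is configured, and the "three times the median" threshold is an arbitrary assumption chosen only to illustrate spike detection.

```python
import csv
import io
from statistics import median


def find_spikes(csv_text, column, factor=3.0):
    """Return (row_number, value) pairs where `column` exceeds
    `factor` times the median of that column.

    Row numbers are 1-based and count data rows only (the header is skipped).
    Cells that are empty or non-numeric are ignored.
    """
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            values.append(float(row[column]))
        except (TypeError, ValueError):
            values.append(None)  # missing or non-numeric cell

    numeric = [v for v in values if v is not None]
    if not numeric:
        return []

    threshold = factor * median(numeric)
    return [
        (i, v)
        for i, v in enumerate(values, start=1)
        if v is not None and v > threshold
    ]


# Hypothetical sample data; real column names come from the logger configuration.
SAMPLE = """timestamp,avg_response_ms
12:00:01,2
12:00:02,2
12:00:03,3
12:00:04,40
12:00:05,2
"""

spikes = find_spikes(SAMPLE, "avg_response_ms")
```

With the sample above, `spikes` contains the single entry `(4, 40.0)`. To analyze a real file, pass its contents instead, for example `find_spikes(open(path).read(), column_name)`, using a column name that actually appears in the file's header row.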

Run the collect-support-data tool whenever you can't easily identify the cause of a problem. You can pass the resulting archive to your authorized support provider for assistance in identifying and addressing the root cause of the issue.
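
When you examine a collect-support-data archive on a different server, it can help to extract it and list what it contains before reading individual files. The following is a minimal sketch under two assumptions: that the archive is an ordinary, unencrypted zip file, and that it comes from a trusted source. The function name and any paths passed to it are hypothetical, and the sketch assumes nothing about the file names inside the archive.

```python
import zipfile
from pathlib import Path


def extract_and_list(archive_path, dest_dir):
    """Extract a zip archive into dest_dir and return the sorted
    relative paths (POSIX style) of every file found under dest_dir."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)  # only do this with archives from a trusted source
    return sorted(
        p.relative_to(dest).as_posix()
        for p in dest.rglob("*")
        if p.is_file()
    )
```

Extract into an empty directory so that the returned list reflects only the archive's contents. From there, you can open the files of interest, such as the collected server logs, in the same way you would on the production system.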