Best Practices: Performance Testing for PingFederate

This document provides an overview of, and general guidelines for, the performance testing methodology used to test a PingFederate server before it enters a customer production environment.

Many PingFederate deployments are still rolled out to production without any performance or scalability testing. This document shows how Ping Professional Services approaches performance testing and can help you identify and think through potential bottlenecks so that you can deploy your servers with confidence.

Audience

This is intended as a starting point for anyone interested in learning about or expanding their knowledge of PingFederate performance testing. Performance testing should always be undertaken with a consistent methodology and well-known tooling. This article will not make you a seasoned performance tester, because that takes years of experience. Instead, the basic skills explained here will give you some familiarity with the testing techniques and approaches Ping Identity uses when tackling this important topic.

Why do performance testing?

Low-performing PingFederate applications do not deliver their intended benefits to an enterprise. Slow authentication performance or limited scalability results in a net loss of time and money. If PingFederate is not delivering in a highly performant manner, it reflects poorly on all of the architects and consultants who delivered the solution to the customer.

Lesson 1: Load generation

Before you embark on your performance testing journey, you must ensure there is sufficient hardware for load/traffic generation.

A common misconception in performance testing efforts is that a load generator is a piece of software that can generate massive amounts of load on very little hardware. Always keep in mind that the load generator must have access to sufficient hardware to generate the load required to test an enterprise-scale system.

A customer could purchase a load generator such as LoadRunner and incorrectly assume that they can install it on a couple of modestly sized instances and expect it to generate thousands of concurrent users.

Remember, load generators are software, and they should be treated as such. All software is bound by resource use and design. Everything has performance limitations.

For anyone doing performance testing, size your client hardware as you would your target system. When testing PingFederate, the load-generation hardware should almost always overpower the PingFederate servers because PingFederate transactions are very fast. If your load-testing system cannot generate requests at a sufficiently high rate, you can mistakenly assume that you’ve hit the peak of PingFederate’s capability, when in reality your load generator is simply unable to stress PingFederate sufficiently.

Make sure that if you or your customer is starting a performance and load testing endeavor, you are using adequate testing hardware.

Use a large system to test a small system. For example, use a 12-core load generator to performance test a quad-core PingFederate system, and make sure your load generator has at least two to three times the capacity of the test system. You need power to test power: the faster the response times in your load tests, the more powerful the hardware you need to push those requests through.

Ask yourself:

  • Are you connected to a network segment that can handle that load/amount of network traffic?

  • What about proxy servers that can introduce delay and have scalability issues themselves? Could that be interfering with your test?

Lesson 2: Results validation

Don’t use intrusive validation in your test cases. Keep validation of each actual result as lightweight as possible, and pick the simplest and fastest way to perform it.

For example, if a request returns a sign-on page and you want to validate that the sign-on page is there, don’t compare the entire HTML body against the previous one that came in. Instead, look for specific key information in the response that comes back. If you are testing identity provider (IdP)-initiated single sign-on (SSO), use a simple, lightweight adapter on the service provider (SP) side that returns a small, static HTML page.
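
As a rough illustration, the following sketch checks only for a known marker string on the returned page; the host name, partner SP ID, and marker text are hypothetical, so substitute your own. In JMeter, the equivalent lightweight approach is a simple Response Assertion that checks for a substring rather than the full page body.

    # Lightweight check: look for one known marker instead of diffing the whole HTML body.
    curl -sk "https://pingfed.example.com:9031/idp/startSSO.ping?PartnerSpId=demo-sp" \
      | grep -q "Sign On" && echo "PASS" || echo "FAIL"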

If your validation approach is too heavy, it slows the whole test down, which in turn means you might not really understand where the bottleneck is located.

Lesson 3: Warm up your server

As far as the back end goes, tune and warm up the system. When you deploy PingFederate out of the box on a multi-core server with 8 GB of RAM, understand that it is not fully tuned. Make sure that you go through some sort of tuning process so that PingFederate can use the resources that are available to it.

Don’t test a cold system, because cold systems don’t yield typical results. Don’t just restart a PingFederate server and immediately hit it with load; this is not a valid approach. Java by nature improves over time through just-in-time (JIT) compilation: after a certain amount of time, hot spots in the code are compiled rather than interpreted, and if they are really hot, they can be inlined. You want to make sure that you warm the system up.

A production server is typically running all the time, so it is reasonable to assume that JIT compilation has already occurred. Because you want to test a system that would otherwise be in that state, warm it up before you test it.
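
A minimal warm-up sketch is shown below; the host name and partner SP ID are hypothetical, and in practice you might instead run a short, low-intensity version of your actual JMeter test plan before the measured run.

    # Drive a few thousand requests through the server before the measured run so the
    # JIT compiler has a chance to compile (and possibly inline) the hot code paths.
    for i in $(seq 1 2000); do
      curl -sk -o /dev/null "https://pingfed.example.com:9031/idp/startSSO.ping?PartnerSpId=demo-sp"
    done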

Lesson 4: Things to keep in mind when monitoring performance

Performance monitoring, or resource monitoring, is the act of non-intrusively collecting or observing performance data from a running application. As with any Java application, enterprise applications such as PingFederate are affected by garbage collection performance.

In contrast, performance profiling is the act of collecting performance data from a running application in a way that might be intrusive on application throughput or responsiveness.

Profiling is rarely done in production environments. Avoid the use of profiling tools during performance tests, as they will affect your results.

Do not use excessive logging or heavyweight monitoring tools that affect the overall performance of the product. Just attaching JConsole to the system can affect runtime performance and give you results that are not usable, because the system spends part of its time responding to requests from the monitoring tool. Doing this places additional load on the system that you are not accounting for.

CPU

For an application to reach its highest performance or scalability, it needs not only to take full advantage of the CPU cycles available to it, but also to use them in a non-wasteful manner. Making efficient use of CPU cycles can be challenging for multithreaded applications running on multiprocessor and multicore systems. Additionally, the fact that an application can saturate CPU resources does not necessarily mean it has reached its maximum performance or scalability. To identify how an application is using CPU cycles, monitor CPU utilization at the operating system level, for example with Performance Monitor (perfmon) or the typeperf command-line tool on Windows, or with a graphical system monitor or the top command on Linux.

Linux also has vmstat, which shows combined CPU utilization across all virtual processors. The vmstat command can optionally take a reporting interval, in seconds, as a command-line argument. If no reporting interval is given, the reported output is a summary of all CPU use data collected since the system was last booted. When a reporting interval is specified, the first row of statistics is still a summary of all data collected since the system was last booted.
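
For example, you might sample vmstat at a fixed interval for the duration of a test run:

    # Report CPU (and virtual memory) statistics every 5 seconds, 12 times.
    # Remember that the first row is an average since boot; ignore it when analyzing the run.
    vmstat 5 12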

Disk

PingFederate disk operations/disk I/O should be monitored for possible performance issues. Application logs write important information about the state or behavior of the application as various events occur. Disk I/O utilization is the most useful monitoring statistic for understanding application disk usage because it is a measure of active disk I/O time. Disk I/O utilization along with system or kernel CPU utilization can be monitored using iostat.
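
For example, a common way to watch disk utilization during a test run is:

    # Extended per-device I/O statistics (plus CPU utilization) every 5 seconds.
    # Watch %util and await for the device that holds the PingFederate log directory.
    iostat -x 5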

When monitoring applications for an extended period of time, such as several hours or days, or in a production environment, many performance engineers and system administrators of Linux systems use sar to collect performance statistics. With sar, you can select which data to collect, such as user CPU utilization, system or kernel CPU utilization, number of system calls, memory paging, and disk I/O statistics. Data collected from sar is usually looked at after the fact, as opposed to while it is being collected.

Observing data collected over a longer period of time can help identify trends that may provide early indications of pending performance concerns. You can find additional information on what performance data can be collected and reported with sar in the Linux sar man pages.
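
A typical collection run might look like the following sketch; the sampling interval, count, and output file are illustrative only.

    # Collect CPU (-u), memory (-r), and I/O (-b) statistics every 60 seconds, 60 times,
    # saving the binary data to a file that can be examined later with "sar -f".
    sar -u -r -b -o /var/tmp/sar.perftest 60 60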

In summary, just remember the “observer effect” and that seeing is changing.

Lesson 5: Did the test return expected results?

Do you get the expected results when running your tests? A performance test is built on a functional test, so the scenario must work functionally first. The key metrics for performance and reliability analysis are response time and throughput. Generally speaking, these two metrics will always be the most important, and they are weighed against resource use.

You want to look at:

  1. What is the response time for a given request?

  2. How much data can you push through the system?

  3. How many requests can you process of a given type?

After you have this information, pass and fail criteria can be handed down from product managers. If there are specific response-time and resource-use targets that must be met, the performance tester must take them into account; these are usually the criteria that have to be satisfied before you can consider something ready for production.
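
As a rough sketch of how these metrics fall out of a test run, the following assumes JMeter’s default CSV .jtl output, where the first column is each request’s epoch timestamp in milliseconds and the second is its elapsed time.

    # Approximate average response time and overall throughput for a completed run.
    awk -F',' 'NR > 1 { sum += $2; n++
                        if (min == "" || $1 < min) min = $1
                        if ($1 > max) max = $1 }
               END   { dur = (max - min) / 1000
                       printf "requests: %d  avg response: %.1f ms", n, sum / n
                       if (dur > 0) printf "  throughput: %.1f req/s", n / dur
                       print "" }' results.jtl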

Lesson 6: Tools of the trade

A useful HTTP(S)-based tool is Apache JMeter, which you can use to send HTTPS requests to any given PingFederate server.

Tune JMeter for optimal performance where appropriate. JMeter is just software and is subject to the same resource constraints as any application. The PingFederate Capacity Planning Guide has recommendations on this, so be sure to review it.
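
For example, a typical non-GUI run on the load-generator host might look like the sketch below; the heap size and test-plan name are hypothetical, so adjust them to your own plan and hardware.

    # Run the test plan in non-GUI mode (the GUI is for building plans, not generating load),
    # log raw samples to results.jtl, and generate an HTML report when the run completes.
    HEAP="-Xms4g -Xmx4g" jmeter -n -t pingfed_sso.jmx -l results.jtl -e -o report/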

Lesson 7: Understand and tune the infrastructure

PingFederate is only one part of the data center infrastructure, and it relies heavily on other systems performing properly.

Ensure PingFederate is the only application running on the test systems, or at least be aware that other applications might be running on the PingFederate server. Network latency between test systems can affect results, so ensure that the network infrastructure is robust, and don’t take bandwidth or latency for granted. Be aware of any firewalls or proxy servers, because these can cause issues with data transfer latencies between systems. Before conducting performance tests, make sure that you have a uniform configuration across all PingFederate servers. A non-uniform configuration often manifests itself as a single cluster member that has higher CPU use and higher latency because it is garbage collecting more frequently.
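
One lightweight way to spot that kind of outlier is to compare garbage collection activity across the cluster members; in the sketch below, the process ID is a placeholder for the PingFederate JVM on each node.

    # Report JVM garbage collection utilization every 5 seconds on each cluster member
    # and compare the results; a node that collects far more often likely has a
    # configuration or load imbalance.
    jstat -gcutil <pid> 5000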

Lastly, don’t forget disk latency. Make sure that if you must log information for audit purposes or other reasons, you are writing to a fast disk.

Summary

This document has given general guidelines related to performance testing methodologies for a PingFederate service before it runs in a real production environment. It has shown some of the common reasons why failing to do effective performance testing can lead to a system that does not perform up to expectations.

Our next installment of this series will move into a discussion of how to set up an actual performance test and give some common examples of scripts and tools used for that.