Policy Performance Optimization

How a policy is constructed plays a large role in performance. A prime example is a policy during development vs. the same one deployed into production. You may implement verbose logging and extensive auditing during development to help you troubleshoot. But in production, such logging can obscure the real error and make it more difficult to determine the root cause of issues. Also, extensive auditing will impact Gateway performance and consume disk space rapidly.   
Performance optimization means accomplishing a task in the most efficient way. Often, this means focusing on reducing latency:
  • Latency slows individual API calls, returning slower responses to client software.
  • Policies with high latency limit throughput, because message-processing threads remain occupied until each request completes. Sometimes more concurrency can alleviate this, but other times additional concurrency only increases CPU load through task-switching overhead.
Latency has two costs:
    Wait time:
     The Gateway idles while waiting for external resources to complete their work.
    Local CPU usage:
      If a policy's latency is caused by an overtaxed CPU, this limits the number of requests the Gateway can process over time. External latency is often unavoidable, but local CPU usage is usually within the control of the policy author.
Logging and Auditing
Auditing is Very Expensive
Use auditing sparingly, as it can impact Gateway performance, sometimes significantly. During the development and testing phase, detailed audits can help you identify and resolve problem points. However, when the policy is moved to production, remove all unnecessary audits to maximize Gateway performance. CA Technologies strongly recommends against using the audit subsystem unless mandated by regulatory agencies.
Training materials often recommend using the Add Audit Detail Assertion to help develop your policy. The default action of this assertion is to audit at the INFO level.
This is a useful tool for policy development, but it is a detriment in production because the INFO level generates many audits. Delete all instances of the Audit Messages in Policy Assertion from production policies, except where they note errors or debug situations. They should never be part of the normal policy flow. Ideally, a production policy produces no audits at all during successful execution.
A more technical look at auditing:
Auditing requires additional writes to disk. In high-concurrency production environments this introduces "external latency", as the Gateway waits for the audit subsystem to complete. This in turn prevents the working thread from being returned to the thread pool. A typical audit may take between 50 and 500 milliseconds to complete, and each audit creates a multi-table insert transaction in the database. Thus, the maximum insert rate of the auditing database effectively becomes the maximum auditing rate, which in turn is limited by disk I/O performance. For standard hard drives, this is measured in hundreds per second; solid-state drives offer better performance, but it is still measured in the thousands per second. In clustered environments, write rates become even more crucial, as they limit cluster scaling against the cluster-wide database.
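As a back-of-envelope sketch of that ceiling (the IOPS figures and the writes-per-audit ratio below are illustrative assumptions, not measured Gateway values):

```python
def max_audit_rate(disk_iops, inserts_per_audit=3):
    # Each audit is a multi-table insert transaction; assume ~3 disk
    # writes per audit record (an assumption for illustration only).
    return disk_iops // inserts_per_audit

hdd_rate = max_audit_rate(300)     # spinning disk: a few hundred IOPS
ssd_rate = max_audit_rate(9000)    # SSD: thousands of IOPS
```

Whatever the exact figures, the achievable audit rate is capped well below typical Gateway throughput targets, which is why production policies should audit only errors.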
If auditing is unavoidable, audit to an external database, as it may offer better performance and scaling compared to the on-device database. The Logging subsystem is preferred for audit-like actions.
Logging is Cheaper, But Not Free
In the Add Audit Details Properties, you can opt to log transactions rather than audit them. This causes the Gateway to capture the data in a log file rather than initiating a database transaction. This is less "expensive" than auditing because it requires less overhead.
Logs lack the structure and viewing experience of the audit viewer, but they do not have the database transaction limitations. The Gateway still flushes to the hard disk at intervals, so the IOPS rating of the disk remains important, but it is much less limiting than with audits.
It is important to understand the consequences of logging: if you log 30 messages per policy at a desired 1000 transactions per second, then you are attempting to write 30,000 messages to disk per second. Even if these are short messages, there are two effects:
  1. The disk space used.
  2. The "noise level" of these informational messages. 
There is no need to trace the operation of every successful request. During production debugging, all the successful requests make it difficult to find the useful error messages; think of this as the "signal-to-noise ratio".
Large and frequent logging messages might also reach disk write bandwidth limitations. This is much less common but has occurred in several field deployments.
As with auditing, CA Technologies recommends that a production policy log nothing at all in the successful case. This improves the signal-to-noise ratio and wastes neither disk space nor time. If you must log success cases, keep the logging data to a single line per successful request.
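The disk-space arithmetic can be made concrete. A minimal sketch, assuming the 30-messages-at-1000-TPS scenario above and short (~100-byte) log lines:

```python
msgs_per_policy = 30
tps = 1000
bytes_per_msg = 100        # assumption: short, single-line log messages

msgs_per_second = msgs_per_policy * tps                  # 30,000 messages/s
gib_per_day = msgs_per_second * bytes_per_msg * 86_400 / 1024**3
# Roughly 240 GiB of log data per day, before any compression or rotation.
```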
Back-End Latency
Back-end systems play a large part of the overall performance of an API and back-end latency is often the cause of perceived Gateway performance issues.
Latency vs. TPS vs. Concurrency
Transactions per second (TPS) is defined as the number of whole policy executions that can complete in one second. For a single requestor, it is the inverse of latency. For example, if an API call (including network and Gateway time) to the back end takes 100 milliseconds to complete, then a single requestor could execute 10 of those calls in one second. Examining the metrics data, you discover:
  • The policy execution took 10 milliseconds
  • The network required another 10 milliseconds
  • Waiting for the back-end response took the remaining 80 milliseconds
If your expectation is that the API must provide 1000 transactions per second, then 100 such requests must be performed in parallel to reach 1000 TPS. In reality, the back end may not sustain 80 ms response times under load; they could be longer. This increased back-end latency drives the need for even more concurrency to sustain the 1000 TPS expectation. Large back-end latency creates the perception that the API is slow and that the Gateway is the bottleneck. It is important to consider latency during any performance evaluation.
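The relationship in this example follows Little's law (concurrency = throughput × latency); a quick sketch:

```python
def required_concurrency(target_tps, latency_seconds):
    # Little's law: concurrent in-flight requests = throughput * latency.
    return target_tps * latency_seconds

# The worked example above: 100 ms total latency at a 1000 TPS target.
needed = required_concurrency(1000, 0.100)        # 100 parallel requests

# If back-end latency doubles, the concurrency needed for the same TPS doubles.
needed_slow = required_concurrency(1000, 0.200)   # 200 parallel requests
```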
Network Performance
Network (wide area, metropolitan area) performance can also contribute to overall latency. Bandwidth could be a factor during certain times of day, especially when large messages are involved. Or it could simply be distance: a round trip between Los Angeles and New York City requires about 30 ms even at the speed of light.
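That distance figure is easy to verify with a sketch; the great-circle distance and the fiber refractive-index factor below are approximations:

```python
SPEED_OF_LIGHT_KM_S = 299_792
one_way_km = 3_940          # approximate great-circle LA-to-NYC distance

rt_vacuum_ms = 2 * one_way_km / SPEED_OF_LIGHT_KM_S * 1000
rt_fiber_ms = rt_vacuum_ms * 1.5   # light in fiber travels at roughly 2/3 c
# Vacuum round trip is ~26 ms; in fiber it is closer to ~40 ms, so real-world
# coast-to-coast latency cannot drop below a few tens of milliseconds.
```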
CPU-Intensive Operations
As mentioned earlier, CPU usage can cause latency. You can avoid this with careful choices around policy composition, an understanding of the relative costs, and careful balancing of business requirements against the reality of scaling.
Per-Request SSL Session Initiation
Negotiating SSL sessions is very expensive. For this reason, we recommend SSL session reuse, via either SSL session affinity at the load balancer or other techniques.
Cryptographic Assertions
In general, cryptography is an expensive operation. Some use cases absolutely require it, but you can avoid the most expensive operations in most others.  
For example, validating a signature requires less overhead than creating one, and signature validation is good security practice. In some token exchange use cases, you can avoid creating token signatures by caching outbound tokens on a per-identity basis. This reduces the overhead of signing or encrypting data.
Create or Sign SAML Assertion
SAML signatures are asymmetric cryptographic operations with high CPU processing requirements. As a result, they can incur significant latency and slow down policy execution. Delays of up to hundreds of milliseconds are not uncommon, depending on the algorithms involved. Use outbound token caching for improved performance.
Issue JSON Web Token
JSON Web Tokens are the direct equivalents of SAML in the JSON world, with exactly the same issues surrounding CPU consumption.
Data Transformation
Mode Switching
The Gateway treats XML data using DOM parsing and reduces XML overhead by preserving the DOM data structure for the duration of policy execution. For policies that manipulate XML using XSL, schema validation, or XPath, the Gateway avoids re-parsing data by operating on the preserved DOM structure. This methodology helps maintain performance.
By comparison, you force the Gateway into needless mode switching by doing something like the following:
  1. You first manipulate the ${request} message (or another Message-type variable) with XML operators.
  2. Then you manipulate it with string operators such as the Set Context Variable Assertion.
  3. Later in the policy, you operate on the same message data again with XSL, XML Schema, or XPath manipulation or inspection.
Because regular expressions and other string operators cannot work on the DOM structure, step 2 forces the Gateway to render the DOM as a string representation. Step 3 then forces the Gateway to re-parse that string back into a DOM.
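A rough analogue in Python's standard library illustrates the cost pattern (the Gateway's internals differ; this is only a sketch of why mixing string and tree operations forces re-parsing):

```python
import xml.etree.ElementTree as ET

xml_text = "<order><id>42</id><status>new</status></order>"

# Efficient: parse once and keep doing tree-level (XPath-style) work on the
# same parsed structure, rendering back to text only once at the end.
tree = ET.fromstring(xml_text)
tree.find("status").text = "approved"
rendered = ET.tostring(tree, encoding="unicode")

# Mode switching: a string-level operation discards the parsed structure,
# so the next tree-level operation must re-parse the whole document.
text = rendered.replace("approved", "shipped")   # string operator
reparsed = ET.fromstring(text)                   # forced re-parse
```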
XSL Transforms
XSL transformations are less of a concern than cryptography, but they can still cost 9 ms per request for a 10 Kbyte message. This scales up with larger messages and puts additional pressure on the Gateway's "garbage collection" subsystem.
Regular Expressions
Though lighter weight in terms of memory consumption than XSL, regular expressions still can have CPU-related performance issues.
Some relatively common regex patterns are surprisingly "expensive". In general, avoid backreferences (such as '\1') and lookaround constructs (such as '(?=...)'), as these disable several standard regex-engine optimizations and cause much higher CPU usage.
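As an illustrative sketch (Python shown for brevity; the Gateway's regex engine is Java-based), here is a backreference pattern alongside a cheaper non-regex alternative for the same repeated-word check:

```python
import re

subject = "the the quick brown fox"

# Backreference \1 matches a repeated word, but it disables many regex-engine
# optimizations because the pattern can no longer be matched as a plain
# finite automaton.
backref = re.compile(r"\b(\w+)\s+\1\b")
match = backref.search(subject)

# Cheaper alternative: split once and compare adjacent tokens.
words = subject.split()
has_repeat = any(a == b for a, b in zip(words, words[1:]))
```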
JSON Transform
This is currently not quantified, but it involves the same heavy string-data manipulation as regular expressions.
Caching
Caching can help in many situations where there is a bottleneck on a crucial resource.
You can leverage in-policy caching to relieve many external dependencies. Be aware that pure round-robin inbound load balancing can defeat certain types of caching, so plan inbound load balancing carefully.
LDAP concurrency issues can occur in certain policies; the Query LDAP Assertion is often used for non-standard LDAP validation. The built-in Query LDAP Assertion uses its own cache; you can configure a number of LDAP cache settings from within the assertion's properties.
Caching can even help avoid CPU bottlenecks in certain instances. For example, consider a use case involving token transformation. Without caching, the policy issues a new JSON Web Token (JWT) with every request, even though the inbound user credential token did not change. The CPU overhead of the JWT was significant enough to cause a slowdown. Caching the fully created and signed JWT increases performance dramatically. This is because the overhead of creating and maintaining cache entries is still lower than the cryptographic signature cost.
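A minimal sketch of that per-identity token cache (the names, the TTL, and the stand-in "signing" function are hypothetical; a real policy would sign an actual JWT):

```python
import hashlib
import time

_token_cache = {}           # identity -> (token, expiry epoch seconds)
TOKEN_TTL_SECONDS = 300     # assumption: issued tokens stay valid this long

def expensive_sign(identity):
    # Stand-in for a CPU-heavy asymmetric JWT signature (e.g. RSA/ECDSA).
    return hashlib.sha256(identity.encode()).hexdigest()

def get_token(identity):
    now = time.time()
    cached = _token_cache.get(identity)
    if cached and cached[1] > now:
        return cached[0]                  # cache hit: no signing cost
    token = expensive_sign(identity)      # cache miss: sign once per TTL
    _token_cache[identity] = (token, now + TOKEN_TTL_SECONDS)
    return token
```

The cache pays off whenever lookup-and-compare is cheaper than the signature itself, which is essentially always the case for asymmetric signing.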
In general, look for expensive operations that happen on a per-request basis without providing new information as prime candidates for caching.
Gateway versions 9.2 and later show an increased LDAP authentication bind time for each user (approximately 100 ms more per user than the same process in Gateway version 9.1). To mitigate this increase, tune the authorization cache size and the storage of successful binds:
  • authCache.successCacheSize (default is 2000)
  • authCache.maxSuccessTime (default is 60000 ms)
Both properties can be configured as Gateway cluster-wide properties; see Credential Caching Cluster Properties for more information. For more background on this issue, see the related Known Issue.
Bottlenecks in Common Policy Elements
This section describes common bottlenecks involving single points of dependency. Any policy with a crucial external single point of dependency on every request is constrained by the performance of that single point. Caching and similar mitigation strategies are the common way to avoid this.
Database Operations
Using the Perform JDBC Query Assertion has transaction-rate limitations similar to auditing, because of the transactional nature of the connection. As each policy cannot have knowledge of other concurrent operations, the Gateway cannot easily perform multi-statement transactions. Avoid per-request database operations wherever possible.
On a related note, the API Management OAuth Toolkit has a corollary feature: its default configuration uses a database for token storage. These tokens have longer lifetimes, so fewer tokens are issued over time, leading to lower token insert rates.
LDAP Operations
The Gateway's Query LDAP Assertion has its own cache size and age settings, which should be checked against expected transaction rates and concurrency.