Issues in HTTP Load Balancing

This topic describes some of the issues that you may encounter when configuring HTTP load balancing for your CA API Gateway.
How HTTP Keep-Alive Works
All web browsers and nearly all client software produced in the last 20 years support HTTP 1.1 persistent connections (keep-alive). Keep-alive reduces latency and OS overhead by reusing one TCP connection for subsequent requests, avoiding a new three-way TCP handshake for each request.
The Gateway disables inbound keep-alive requests on new connections if more than 75% of a connection pool is in use processing messages. This may impact some performance tests, though it does not affect most connections. Setting the cluster properties io.httpCoreConcurrency and io.httpMaxConcurrency to at least double the expected test concurrency should help avoid performance issues.
Note:
You should be aware of the system capacity, latency, and message sizes being processed before setting these cluster properties.
By default, current Gateway versions use a relatively high core concurrency of 750, which accommodates a greater number of outstanding requests. Because of the 75% rule around keep-alive, performance tests above 600 concurrent requests may require setting a larger core concurrency.
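The 75% rule and the sizing guidance above can be sketched as follows. This is a minimal illustration; the function name and threshold parameter are assumptions, not the Gateway's actual implementation:

```python
def inbound_keepalive_allowed(in_use, pool_size, threshold=0.75):
    """Sketch of the documented rule: the Gateway denies keep-alive on
    new inbound connections once more than 75% of the pool is busy."""
    return in_use <= threshold * pool_size

# With the default core concurrency of 750, keep-alive starts being
# denied above 562 busy connections (75% of 750).
print(inbound_keepalive_allowed(500, 750))  # keep-alive still granted
print(inbound_keepalive_allowed(600, 750))  # keep-alive denied

# Sizing guidance: set io.httpCoreConcurrency to at least double the
# expected test concurrency (here, an assumed test at 600 concurrent).
expected_concurrency = 600
print(max(750, 2 * expected_concurrency))   # suggested core concurrency
```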
These two indicators help you determine the keep-alive state:
  • The HTTP
    Connection
    header indicates the current state of a connection. An idle connection is closed by sending a TCP FIN packet.
    In the standard connection flow, the server side is initially responsible for allowing or disallowing keep-alive. It can also disable keep-alive unilaterally.
  • The
    Content-Length
    header must contain a value for keep-alive to work. If a message is of unknown length, then the server disables keep-alive.
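The behavior above can be demonstrated with Python's standard library: a local HTTP/1.1 server that sends Content-Length, and a client that issues two requests over a single connection. This is an illustrative sketch, not Gateway-specific:

```python
import http.client
import http.server
import threading

accepted = []  # one entry per TCP connection the server accepts

class Handler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 enables keep-alive by default, but only if the response
    # length is known: hence the Content-Length header below.
    protocol_version = "HTTP/1.1"

    def setup(self):
        accepted.append(self.client_address)
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One HTTPConnection object maps to one TCP connection; two requests on
# it reuse that connection instead of performing a second handshake.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
for _ in range(2):
    conn.request("GET", "/")
    print(conn.getresponse().read())
conn.close()
server.shutdown()

print("TCP connections accepted:", len(accepted))  # 1, thanks to keep-alive
```

If the handler omitted Content-Length, the server would have to close the connection to mark the end of the body, and the second request would require a new TCP handshake.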
Load balancers sometimes change keep-alive behavior by applying idle timeouts that differ from those of the client or server.
HTTP 1.1 keep-alive may conflict with older load balancer designs because the FIN packet used to close idle connections can be dropped inadvertently. When using older load balancers, pay close attention to the keep-alive session lifetime when setting the load balancer session affinity lifetime.
Most load balancers can use keep-alive on the back-end, regardless of what happens on the client-facing front end. This offers efficiency gains because of the ability to pool connections on the back-end rather than constantly establishing new ones. This is especially useful in cases where the load balancer is terminating SSL.
Example: the OneConnect feature of F5 load balancers. F5, A10, and other load balancers can handle different keep-alive timeouts between the front end and the back end by forcing connection resets on the front end when the back-end session expires or when a back-end pool member is detected as down (for example, the reset-fwd/reset-rev options on A10 load balancers).
Failure Case Around Keep-Alive and Timeouts
One consequence of having an incorrectly set timeout is a CLOSE_WAIT state with one byte remaining that never clears.
When the load balancer drops the session because of lifetime expiry, then the packets from client to server and server to client are no longer forwarded. As a result, the final TCP connection shutdown does not complete.
In this state, part of the TCP state management never completes because the final packet associated with the TCP connection shutdown is never received. This causes the TCP stack to wait forever.
In the Gateway routing assertions, this can prevent the connection close from completing, which prevents the connection from being returned to the connection pool. This eventually fills the connection pool with dead back-end connections.
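The stuck CLOSE_WAIT state is easy to reproduce with plain sockets. In the sketch below (Linux-specific, because it reads /proc/net/tcp), the client sends its FIN but the server never closes its side, so the server-side socket sits in CLOSE_WAIT indefinitely:

```python
import socket
import time

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

cli = socket.socket()
cli.connect(("127.0.0.1", port))
conn, _ = srv.accept()

# The client closes (sends FIN); the server never calls conn.close().
# This mimics a shutdown that cannot complete because a load balancer
# stopped forwarding the session's packets.
cli.close()
time.sleep(0.2)

def tcp_states(local_port):
    """Return the kernel TCP states for sockets bound to local_port.
    In /proc/net/tcp, state 0A is LISTEN and 08 is CLOSE_WAIT."""
    states = []
    with open("/proc/net/tcp") as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            if int(fields[1].split(":")[1], 16) == local_port:
                states.append(fields[3])
    return states

print(tcp_states(port))  # the accepted socket shows state 08 (CLOSE_WAIT)
```

The CLOSE_WAIT entry persists until the server process finally closes the socket or exits, which is why stuck connections accumulate.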
SSL Sessions
Web Servers
In web server load balancing, SSL pass-through is rarely used. Many configurations terminate SSL at the load balancer and use non-SSL on the back end, because mutually authenticated SSL is uncommon.
API Gateways
SSL is often terminated at the Gateway for many reasons; for example, some mutually authenticated SSL features depend on it. Consider the Gateway as behaving like a standard web server, with some specific differences.
SSL Session Renegotiation Performance Issues
The following simplified diagram shows three scenarios:
  • A bad case: layer 4 (SSL pass-through) without session affinity, which causes intense CPU and entropy consumption
  • An SSL session being reused through load balancer session affinity
  • A subsequent request from a client to a server that does not have the local SSL session data. This is common with round-robin or least-connections balancing algorithms, and it causes excessive Gateway CPU consumption
[Figure: ssl-session-reuse.png]
Mutually Authenticated SSL
Many organizations use mutually authenticated SSL, and terminating SSL at the load balancer presents some challenges. Client certificates are a crucial part of the Mobile API Gateway MSSO SDK.
It is useful for a load balancer to act as an SSL-to-SSL reverse proxy. In this configuration, the SSL connection terminates at the load balancer (which carries the production SSL certificate) rather than simply passing it to the back-end server. This allows a persistent SSL connection to remain on the back-end, regardless of what happens on the front end. It also allows for more intelligent inspection, parsing, and routing of traffic by the load balancer. Finally, it allows for control of all aspects of SSL security by presenting a unified set of encryption methods, cipher ordering, and overall client compatibility. This way, the Gateway avoids needing to perform these configurations and audits on every back-end server.
SSL termination at the Gateway adds little CPU usage if the load balancer is SSL-session aware. Based on benchmark testing of Gateway virtual appliances, 1-Kbyte messages achieve 25K TPS (transactions per second) without SSL and 18K TPS with SSL when SSL session reuse is in effect.
Mobile API Gateway MSSO SDK Uses Mutually Authenticated SSL
The CA API Gateway depends on automated mutually authenticated SSL provisioning. This is a challenge for the Mobile API Gateway (MAG): SSL certificates are provisioned and signed for every phone.
This means that the load balancer needs constant updates of allowed certificates for mutual authentication. For the MAG + Mobile SSO SDK use cases, SSL termination at the load balancer is unlikely to work.
CPU Usage
SSL sessions are expensive to create. Modern CPUs using full-strength algorithms and server private keys may require 50 ms or more of CPU time to create a session. Adding the back-and-forth packets and network latency, session setup can take 100 ms or more.
To achieve hundreds of requests per second without huge CPU usage, the SSL specification allows the SSL hello to include a session ID from a previously created session. If both ends have that session, they can both skip the expensive session creation.
By design, SSL sessions are renegotiated every 30 minutes, so this is not a decrease in security.
Round-robin with no affinity can often disable SSL session reuse. Some load balancers have explicit "SSL Session Affinity" mode. Use this mode if available.
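The effect of affinity on session reuse can be illustrated with a toy model. This is not real TLS: each simulated server simply keeps its own session cache, so a resume only succeeds on the server that created the session:

```python
import itertools

def full_handshakes(n_servers, sticky, n_requests):
    """Count full (non-resumed) SSL handshakes for one client making
    n_requests through a load balancer, with or without affinity."""
    rr = itertools.cycle(range(n_servers))
    owner = None  # server holding this client's cached SSL session
    count = 0
    for _ in range(n_requests):
        server = owner if sticky and owner is not None else next(rr)
        if server != owner:
            count += 1      # cache miss: expensive full handshake
            owner = server  # the session is now cached on this server
    return count

print(full_handshakes(4, sticky=True, n_requests=1000))   # 1
print(full_handshakes(4, sticky=False, n_requests=1000))  # 1000
```

With affinity, only the first request pays the handshake cost; with strict round-robin across four servers, every request lands on a server without the session and pays it again.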
Failure Detection
Many environments require failure detection and high availability, so detecting failed nodes is crucial. Most commercial load balancers use an active failure detection mechanism, such as polling individual nodes. A less common mode is passive failure detection: reading HTTP response codes, specifically looking for HTTP 500 errors, and reacting accordingly.
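An active health check of the kind described can be sketched with the standard library: poll each node's status endpoint and mark it down on connection errors or non-2xx responses. The /health path and the use of a closed port to represent a failed node are assumptions for illustration:

```python
import http.client
import http.server
import threading

def check_node(host, port, path="/health", timeout=2.0):
    """Active probe: a node is 'up' only if it answers with a 2xx.
    Connection errors and error status codes both mark it down."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", path)
        status = conn.getresponse().status
        conn.close()
        return 200 <= status < 300
    except OSError:
        return False

# Demo: one healthy local node; a port with no listener plays the failed node.
class Health(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.HTTPServer(("127.0.0.1", 0), Health)
threading.Thread(target=server.serve_forever, daemon=True).start()

up = check_node("127.0.0.1", server.server_port)
down = check_node("127.0.0.1", 1)  # nothing listening on port 1
print(up, down)  # True False
server.shutdown()
```

A passive variant would instead watch the status codes of real proxied responses and eject a node after a run of HTTP 500s, trading probe traffic for slower detection.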