Apply Rate Limit Assertion

The Apply Rate Limit assertion allows you to limit the rate of transactions passing through the CA API Gateway for a given user, client IP address, or other identifier. When this limit is reached, the Gateway can either begin throttling requests or it can attempt to delay the requests until the rate falls below the limit. You can also set a maximum concurrency level to prevent a user from monopolizing resources.
Use this assertion only if you need to limit the flow of transactions entering the Gateway. 
Understanding the Apply Rate Limit Assertion
The following topics are provided to clarify how the rate limit is applied:
The Token Bucket Algorithm
The Apply Rate Limit assertion uses a token bucket algorithm to shape traffic. To allow a request through, a counter must spend a token from the bucket. Tokens are generated in the bucket and accumulate when there are no requests until a maximum number is reached. The rate at which tokens are generated in the bucket for a given number of seconds is set by the configured rate limit.
Without spreading the limit over time, the token bucket can hold a maximum of 1.5 tokens. If a request is sent through an idle counter, the half token that remains is not enough to allow a second request until at least half of the 1/limit interval has elapsed.
When you spread the limit over time, you enable bursts of traffic because the bucket is allowed to hold more tokens. The number of tokens the bucket can hold depends on the rate limit and the "Spread limit over" setting. If you are spreading the limit over 5 seconds, then up to 5 seconds worth of tokens are allowed to accumulate in the bucket when a counter is idle.
Rate Limit: Without spreading the limit over time
Effect: The Gateway only accepts requests arriving no sooner than 1/limit of a second.
Notes: For a maximum limit of 10 requests per second, the second request can be sent after half of 1/10th of a second, or 1/20th of a second. Over time, this will only allow through messages at a rate equal to the limit.

Rate Limit: Spreading the limit over time
Effect: Allow requests to arrive in arbitrary bursts that exceed the Max requests per second rate over an X second window.
Notes: Recommended. For a maximum limit of 10 requests per second, over 5 seconds, the bucket can hold up to 50 tokens. In this mode, the counter will have enough tokens to spend to allow a burst of 50 requests arriving all at once. After this, the bucket is empty and if traffic continues to arrive, the bucket continues to behave as if no Spread Limit is enabled.
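The two modes can be sketched in a few lines of Python. This is only an illustration of the behaviour described here, not the Gateway's implementation: the class name TokenBucket, the try_acquire method, and the explicit "now" argument (seconds, supplied by the caller) are all assumptions made for the example. The bucket capacities (1.5 tokens without spreading, window seconds' worth of tokens with spreading) are taken from the text.

```python
class TokenBucket:
    """Illustrative token bucket; names and structure are hypothetical."""

    def __init__(self, rate_per_sec, spread_window_sec=None):
        self.rate = float(rate_per_sec)
        if spread_window_sec is None:
            # Without spreading, the bucket holds at most 1.5 tokens.
            self.capacity = 1.5
        else:
            # With spreading, it holds up to `window` seconds of tokens.
            self.capacity = self.rate * spread_window_sec
        self.tokens = self.capacity  # an idle counter starts with a full bucket
        self.last = 0.0

    def try_acquire(self, now):
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# 10 requests per second spread over 5 seconds: a burst of 60 at once.
burst = TokenBucket(10, spread_window_sec=5)
allowed = sum(burst.try_acquire(0.0) for _ in range(60))
print(allowed)  # 50

# 10 requests per second without spreading: requests at 0 s, 0.03 s, 0.06 s.
strict = TokenBucket(10)
print([strict.try_acquire(t) for t in (0.0, 0.03, 0.06)])  # [True, False, True]
```

In the second case the request at 0.03 seconds is refused because less than 1/20th of a second has passed since the first one, while the request at 0.06 seconds is accepted.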
The following graph illustrates the difference between a rate limit with and without a Spread Limit. Spreading over time allows for more traffic and throttles fewer requests.
[Graph: rate_limit_arc2]
Concurrency
You can limit concurrency per-counter using the concurrency limit.
The intent of the global maxQueuedThreads setting is to prevent all Gateway transport pool threads from being delayed inside rate limit counters at the same time. If this isn't a concern, you can disable this limit by setting it to a very high value. However, this may cause the Gateway to run out of available threads and stop responding to new requests. If you just want to increase it, consider setting a limit of two-thirds of the Gateway's httpCoreConcurrency.
Applying Rate Limit to Gateway Clusters
If you have a cluster of gateways, the limits entered in this assertion are divided among the number of "up" nodes in the cluster. A node is considered "up" if it has posted its status within the past 8 seconds (configurable via the ratelimit.clusterStatusInterval cluster property). The Apply Rate Limit Assertion checks the status of cluster nodes every 43 seconds (configurable via the ratelimit.clusterPollInterval cluster property).
The Gateway automatically adjusts the rates internally when nodes are added or removed from a cluster. There is no need to modify the values in this assertion. If no authenticated user is established in the policy, then the IP address of the requestor is used instead in the Apply Rate Limit Assertion.
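As a rough illustration of how a limit is divided among "up" nodes, the following Python sketch counts the nodes that posted a status within the last 8 seconds and splits a configured limit among them. The function names, the dictionary of last-status timestamps, and the choice to treat a status exactly 8 seconds old as still "up" are assumptions made for the example; they do not describe the Gateway's internal code.

```python
def count_up_nodes(last_status_times, now, status_interval_sec=8.0):
    """Count nodes whose most recent status post is within the interval."""
    return sum(
        1 for posted in last_status_times.values()
        if now - posted <= status_interval_sec
    )


def per_node_limit(cluster_limit, last_status_times, now, status_interval_sec=8.0):
    """Divide a cluster-wide limit among the nodes currently considered up."""
    # Assumption: the local node always counts, so never divide by zero.
    up = max(1, count_up_nodes(last_status_times, now, status_interval_sec))
    return cluster_limit / up


# Hypothetical status timestamps (seconds); node "c" last posted 10 s ago.
statuses = {"a": 100.0, "b": 97.0, "c": 90.0, "d": 99.5}
print(count_up_nodes(statuses, now=100.0))      # 3
print(per_node_limit(90, statuses, now=100.0))  # 30.0
```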
Configure the Apply Rate Limit Assertion Properties
The Apply Rate Limit assertion is available in the assertions panel.
Drag the Apply Rate Limit Assertion from the assertions panel into a policy, or right-click the Apply Rate Limit assertion in an existing policy and select Rate Limit Properties.
Configure the Apply Rate Limit Properties as follows:
Setting
Description
Maximum requests per second
Specify how many requests per second should be processed by the Gateway or cluster.
You can enter a context variable that resolves to the maximum requests value. The context variable must either be single-value or multivalued with a specific index reference.
Cluster wide
If the Gateway cluster comprises more than one node, this setting determines whether the value entered in the Maximum requests per second field is split among the nodes or applied to each node.
  • Select this check box to split the value across all the nodes in the cluster.
    For example, if the maximum is 100, each node in a 4-node cluster will be limited to 25 requests per second. If a node drops out of the cluster, the 100 limit is redistributed across the remaining three nodes.
  • Clear this check box to allow the maximum requests value on each node.
    For example, if the maximum is 100, each node in a 4-node cluster will be allowed 100 requests per second, resulting in an effective maximum of 400 requests per second. If one node drops out of the cluster, the effective maximum drops to 300 requests per second (3 x 100).
Spread limit over X sec window
Determines whether to allow a burst of requests to be spread across a window of time or whether to enforce a hard cap.
  • Select the check box to allow requests to arrive in arbitrary bursts that exceed the Max requests per second rate over an X second window.
    This can avoid throttling of traffic over prolonged traffic bursts. You may enter a context variable containing the window value in seconds. This variable can be either single-value or multivalued with a specific index reference.
  • Clear the check box to disallow bursts. In this scenario, the Gateway will only accept requests arriving no sooner than 1/limit of a second.
    For example, if the Max requests per second is 100, at least 1/100 second must have elapsed between requests. Requests that arrive sooner are either throttled or shaped (based on the "When limit exceeded" setting). Disallowing burst traffic is recommended only for advanced users.
    It is not recommended to disable burst traffic on a counter that will be servicing multiple concurrent requests, particularly at high rates. Doing so can lead to unintended throttling or delaying of multiple requests that arrive at exactly the same time.
Limit each
Use the drop-down list to indicate how limiting should occur:
  • by the User or client IP address
  • by the Authenticated user name
  • by the Client IP address
  • by the SOAP operation within the request
  • by the SOAP namespace within the request
  • by the Gateway node
  • by a Custom value (enables a limit per value of a context variable); enter the node identifier followed by a context variable that will resolve to the correct entity during run time.
This limit breakdown impacts both the maximum number of requests per second as well as the maximum concurrency.
For example, if you choose "by client IP address" and set the maximum concurrency to 10 and the maximum number of requests per second to 100, the assertion will fail if any single incoming IP address exceeds either the concurrency of 10 or the 100 requests per second. All IP addresses combined are, however, permitted to exceed these limits. You can combine multiple instances of this assertion to impose different limits by different breakdown factors, such as "maximum 10 per IP and maximum 100 for all combined".
To help you construct a custom format, the entry box will display the actual node identifier and context variable associated with each of the other limit options once you've selected the Custom option. For example, when you first open the Rate Limit Properties, User or client IP is selected by default. Now, choose Custom and then reselect User or client IP. You will see that the actual coding behind this is <node identifier>-${request.clientid}.
When limit exceeded
Specify what happens when the rate limit is exceeded:
  • Throttle: Excess requests cause the assertion to fail. The audit message 6950 is logged.
  • Shape: The assertion attempts to delay requests to avoid exceeding the limit. If the Gateway is unable to spare sufficient resources to hold a request any further, a 503 (Service Unavailable) error may still occur.
  • Log Only: The assertion logs that the rate limit has been exceeded, but the assertion does not fail. The audit message 6950 is logged.
  • Blackout for X sec: Select this check box to fail all requests for the next X seconds after the limit is exceeded, even if the rate of requests falls below the limits defined in this assertion. Value must be greater than 1 second.
    For a blackout period greater than 13 seconds, increase the ratelimit.cleanerPeriod cluster property to prevent the rate limit counters from being flushed before the blackout period ends. If the counters are flushed prematurely, the rate limits are not applied. For more information on this cluster property, see Rate Limit Cluster Properties.
The number of threads that can be queued within a node is defined by the ratelimit.maxQueuedThreads cluster property. For more information, see Rate Limit Cluster Properties.
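The blackout behaviour can be pictured with a small Python sketch. It assumes a throttling setup in which the caller already knows whether a request is over the limit; the class name Blackout, the check method, and the explicit "now" argument are hypothetical and chosen only for illustration. How the Gateway tracks the blackout internally is not described here.

```python
class Blackout:
    """Illustrative blackout timer; not the Gateway's implementation."""

    def __init__(self, blackout_sec):
        # The text requires a value greater than 1 second.
        if blackout_sec <= 1:
            raise ValueError("blackout period must be greater than 1 second")
        self.blackout_sec = blackout_sec
        self.blackout_until = None

    def check(self, now, over_limit):
        """Return True if the request may proceed at time `now` (seconds)."""
        if self.blackout_until is not None and now < self.blackout_until:
            return False  # still blacked out, even if the rate has dropped
        if over_limit:
            # Limit exceeded: fail this request and start the blackout.
            self.blackout_until = now + self.blackout_sec
            return False
        return True


blackout = Blackout(5)
print(blackout.check(0.0, over_limit=False))  # True
print(blackout.check(1.0, over_limit=True))   # False (blackout starts)
print(blackout.check(3.0, over_limit=False))  # False (within 5 s blackout)
print(blackout.check(6.0, over_limit=False))  # True  (blackout has ended)
```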
Maximum concurrent requests
Indicate whether to enforce concurrency limits for a given named rate limiter (as specified by the Limit each setting).
  • Unlimited: Concurrency is not enforced. A named rate limiter can have an unlimited number of active requests simultaneously in the Gateway or cluster. This may result in someone consuming a disproportionately high amount of system resources.
  • Limited to: Ensure that no named rate limiter can have more than the specified number of concurrent requests passing through this assertion. Requests that exceed the concurrency limit cause the assertion to fail. The audit message 6953 is logged.
    You can enter a context variable that contains the maximum concurrent requests value. This variable can be either single-value or multivalued with a specific index reference.
  • Cluster wide: If the Gateway cluster comprises more than one node, this setting determines whether the value entered in the Limited to field is split among the nodes or applied to each node. This setting is the default.
    Select this check box to split the value across all the nodes in the cluster. For example, if the maximum is 10, each node in a 5-node cluster will have a concurrency limit of 2 requests.
    Clear this check box to allow the maximum requests value on each node. For example, if the maximum is 10, every node in the cluster will be allowed 10 concurrent requests.
The concurrency counter is incremented when a request passes through the Apply Rate Limit Assertion (even if the assertion ends up failing). The counter is decremented once the request is completely finished.
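The counting rule above can be sketched in Python as a set of per-identifier counters. This is a minimal illustration under stated assumptions: the class name ConcurrencyCounters, its enter and finish methods, and the use of a client IP string as the key are hypothetical. The one behaviour taken from the text is that a request is counted when it passes through the assertion, even if the check then fails, and is only uncounted when the request finishes.

```python
import threading
from collections import defaultdict


class ConcurrencyCounters:
    """Illustrative per-key concurrency counters; names are hypothetical."""

    def __init__(self, limit):
        self.limit = limit
        self.active = defaultdict(int)
        self.lock = threading.Lock()

    def enter(self, key):
        """Count the request, then report whether it is within the limit."""
        with self.lock:
            self.active[key] += 1  # counted even if the check below fails
            return self.active[key] <= self.limit

    def finish(self, key):
        """Call once the request is completely finished."""
        with self.lock:
            self.active[key] -= 1


counters = ConcurrencyCounters(limit=2)
print(counters.enter("10.0.0.1"))   # True
print(counters.enter("10.0.0.1"))   # True
print(counters.enter("10.0.0.1"))   # False (third concurrent request)
print(counters.enter("10.0.0.2"))   # True  (separate counter per key)
print(counters.active["10.0.0.1"])  # 3     (the failed request still counts)
```

Because the failed request remains counted until it finishes, each of the three requests from 10.0.0.1 needs a matching finish call before the counter returns to zero.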
Audit Detail Codes
The Gateway audit log displays the codes and messages associated with the Apply Rate Limit assertion.
In the list of message codes, "{0}", "{1}", etc., are placeholders for messages that may vary depending on the context of the audit.
Messages may convert non-identifiable characters into a string literal of their Unicode value. For example, if "null" is being expressed in a message, it will be displayed as "\u0000" (the Unicode representation for null).
Code  Level    Message
6900  WARNING  Quota exceeded on counter {0}. Assertion limit is {1} current counter value is {2}
6901  INFO     Quota already exceeded on counter {0}
6902  WARNING  Invalid Quota Counter ID: {0}
6903  WARNING  Configured max quota value {0} is too large. The max value allowed is {1} (where: {0} is the value found at runtime and {1} is the maximum allowed value)
6950  INFO     Rate limit exceeded on rate limiter {0}
6951  INFO     Unable to further delay request for rate limiter {0}, because maximum delay has been reached
6952  INFO     Unable to delay request for rate limiter {0}, because queued thread limit has been reached
6953  INFO     Concurrency exceeded on rate limiter {0}.
6954  INFO     Rate limit of {0} exceeds maximum rate limit of {1}. Setting maximum limit to {2}
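To see how the numbered placeholders are filled in, the following Python sketch substitutes example values into two of the message templates. Python's str.format happens to accept the same "{0}", "{1}" notation, which is why it is used here; the render helper, the counter name, and the numeric values are made up for illustration and say nothing about how the Gateway formats its audits.

```python
AUDIT_MESSAGES = {
    6950: "Rate limit exceeded on rate limiter {0}",
    6954: "Rate limit of {0} exceeds maximum rate limit of {1}. "
          "Setting maximum limit to {2}",
}


def render(code, *values):
    """Fill the positional placeholders of an audit message template."""
    return AUDIT_MESSAGES[code].format(*values)


# The values below are hypothetical examples.
print(render(6950, "example-counter"))
print(render(6954, 5000, 1000, 1000))
```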