Jitter the first request, too
Exponential backoff with jitter is an effective retry strategy that helps avoid thundering herds, but most implementations only start applying jitter after the initial request fails. If you’re using a library that is implemented that way, your workload might still be vulnerable to thundering herd problems.
Good to know
Exponential backoff is a retry strategy where failed requests are retried after increasingly long delays (e.g. 100ms, 200ms, 400ms, …). Jitter adds randomness to those delays to prevent clients from retrying at the exact same time, which can overwhelm a system.
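As a rough illustration of the definition above, here is a minimal sketch of exponential backoff with jitter. The names `backoffDelayMs` and `withRetries`, the 100ms base delay, and the 50ms jitter ceiling are all assumptions made for this example, not any particular library's API. Note that the very first call to `fn` goes out immediately, with no jitter, which is the gap this post is about.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// The base delay doubles on each attempt (100, 200, 400, ...) and a random
// amount in [0, maxJitterMs) is added on top.
function backoffDelayMs(attempt, baseMs = 100, maxJitterMs = 50, random = Math.random) {
  const exponential = baseMs * 2 ** attempt;
  const jitter = Math.floor(random() * maxJitterMs);
  return exponential + jitter;
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Calls `fn`, retrying up to `maxRetries` times with a jittered, growing delay.
// The first call is NOT delayed: jitter only kicks in after a failure.
async function withRetries(fn, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

The `random` parameter exists only so the delay calculation can be checked deterministically; in normal use you would leave it at its default.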
Google Cloud’s last global outage was exacerbated by exactly this problem. A critical service needed to load a configuration value from a Spanner table, and the table became overwhelmed by the number of simultaneous read requests made by all of the service nodes starting up at the same time.
The fix is simple: jitter the initial request in addition to retries.
    // ❌
    config, err := fetchConfig()

    // ✅
    jitter := time.Duration(rand.Intn(1000)) * time.Millisecond
    time.Sleep(jitter)
    config, err := fetchConfig()
Let’s consider a few more workloads where the initial request to a downstream dependency can create problems.
Example workloads
Request bursts
If a consumer sends you a burst of requests in parallel, then those requests can end up overwhelming components further downstream.
Consider an ecommerce API with an endpoint for placing an order. We have the following two components:
- A thin API service which validates incoming requests, and forwards them to downstream services.
- An “order placement service” which owns the responsibility for coordinating the creation of customer orders.
If a big burst of order requests comes in to our API, then the order placement service will need to scale up to handle them. You can’t spin up a new instance immediately, so something needs to happen to in-flight requests while the system is busy starting up new capacity.
In this situation some load balancers, like AWS ELBs, will queue up requests that are pending routing to a healthy instance. When such an instance becomes available, the load balancer is liable to send it every queued-up request at once, and that sudden influx can result in a thundering herd.
Load balancers queuing requests in this manner is generally considered a bad idea, which is why ALBs no longer do it. Behavior varies across load balancers—make sure you know what yours actually does.
Queue cold start
At Crimson Global Academy we had a heavily event-driven system based on SNS and SQS. All of our queues had dead-lettering configured, and this was immensely helpful whenever a bug prevented events from being processed successfully.
Once a fix is deployed, the dead-lettered messages need to be redriven so they get processed. If you redrive messages at the maximum speed possible and the message processor makes network requests, then you can easily create a thundering herd! One of our event handlers was responsible for assigning tags to lesson recordings, and it looked something like this:
    export const handler = async (event) => {
      for (const record of event.Records) {
        const recordingEvent = JSON.parse(record.body);
        const recording = await db.getRecordingById(recordingEvent.id);
        const tags = await generateTags(recording.transcript);
        await db.updateRecording(recording.id, {
          tags,
        });
      }
    };
This looks suspiciously like the Spanner lookup that contributed to Google’s outage! If we fire many lesson recording events at the same time, then the event handlers will all issue a database read at the same time. A little bit of jitter at the start of the function can help mitigate this problem.
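Here is one way the handler might look with jitter on its first request. This is a sketch under a few assumptions: `makeHandler`, `initialJitterMs`, and the one-second jitter ceiling are made up for illustration, and `db` and `generateTags` are passed in as arguments only so the example is self-contained.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Random whole number of milliseconds in [0, maxMs).
function initialJitterMs(maxMs, random = Math.random) {
  return Math.floor(random() * maxMs);
}

// `db` and `generateTags` stand in for the dependencies used by the
// original handler; `maxJitterMs` is an assumed tuning knob.
function makeHandler({ db, generateTags, maxJitterMs = 1000 }) {
  return async (event) => {
    // Spread the first database read out across handlers that start together.
    await sleep(initialJitterMs(maxJitterMs));

    for (const record of event.Records) {
      const recordingEvent = JSON.parse(record.body);
      const recording = await db.getRecordingById(recordingEvent.id);
      const tags = await generateTags(recording.transcript);
      await db.updateRecording(recording.id, { tags });
    }
  };
}
```

The jitter is applied once per invocation rather than once per record, since the goal is only to desynchronize handlers that start at the same moment. The right ceiling depends on how much extra latency the workload can tolerate.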
You can see the difference some jitter on the initial request makes visualized in the histogram below. I’ve simulated 30 event handlers kicking off at the same time, all configured to use exponential backoff [1]. Note how much of a difference jittering makes on the initial request! It almost halves the instantaneous request rate upon startup:
[Figure: histogram of the simulated requests, with and without jitter on the initial request]
In practice it’s a bit unusual to see queue cold starts cause thundering herds, because it’s extremely common for the consumer to use some sort of serverless technology like AWS Lambda. Ironically, serverless cold starts—the thing everyone tries to avoid!—act as built-in jitter, adding resilience.
“Serverful” consumers (or serverless technologies that support processing multiple concurrent requests) do need to be mindful of this problem, though. A freestanding server will accept requests as quickly as they come in up until its resources are exhausted.
Fan-out operations
This is a very similar problem to queue cold starts, with the main difference being that it happens while the queue is “live” as opposed to during a redrive.
Say you have some infrastructure for delivering webhooks. A class called WebhookManager has a method called sendWebhook which is responsible for the following:
- Figuring out the list of user IDs to deliver the webhook to.
- Queuing up a task for delivering the webhook to each user.
Separately, you have a consumer at the other end of the queue responsible for delivering the webhook:
- Load the user’s webhook URL (based on their ID).
- Sign the webhook payload.
- POST it to the user’s webhook URL.
This setup makes a lot of sense as you can configure a retry policy on the queue to get webhook delivery retries for free. But if you are sending a webhook that a lot of users have subscribed to and queue up those tasks in parallel, you can easily create a thundering herd on your database caused by the webhook URL lookups.
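One possible mitigation is to jitter on the producer side, giving each queued task its own random delay. The sketch below assumes your queue supports a per-message delay; `queue.enqueue` and its `delayMs` option are hypothetical stand-ins, not a real library's API, and `planDeliveries` and the five-second spread are likewise invented for the example.

```javascript
// Assigns each user's delivery task a random delay in [0, spreadMs) so the
// consumers' webhook URL lookups do not all hit the database at once.
function planDeliveries(userIds, spreadMs, random = Math.random) {
  return userIds.map((userId) => ({
    userId,
    delayMs: Math.floor(random() * spreadMs),
  }));
}

// `queue.enqueue(task, { delayMs })` is a hypothetical API used for illustration.
async function sendWebhook(queue, userIds, payload, spreadMs = 5000) {
  const deliveries = planDeliveries(userIds, spreadMs);
  await Promise.all(
    deliveries.map(({ userId, delayMs }) =>
      queue.enqueue({ userId, payload }, { delayMs })
    )
  );
}
```

If the queue has no delay feature, the consumer can sleep for a random interval before loading the webhook URL instead, at the cost of holding a worker idle for that time.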
Real-world impact
What all of these scenarios have in common is that they involve coordinated behavior across multiple instances or processes. In these cases, lack of jitter creates a DDoS attack on your own infrastructure.
These failure modes are quite common in real-world distributed systems. At CGA we had more than one incident caused by redriving a dead-letter queue a bit too quickly.
What makes this problem particularly insidious is that it rarely shows up during testing. Most engineers’ instincts around load testing are to run scenarios that gradually ramp up traffic over time, not sudden synchronized bursts.
Know your system
Like any engineering decision, jittering the first request comes with tradeoffs and isn’t always the ideal solution. The astute reader will have already realized that thundering herds caused by queue cold starts and fan-outs can also be dealt with by simply limiting the maximum concurrency of the worker processing the queue.
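For a worker that processes items in-process, capping concurrency can be sketched as below. `mapWithConcurrency` is a name made up for this example, and `limit` is assumed to be at least 1. Managed queue consumers usually expose concurrency as a configuration setting instead, so treat this as an illustration of the idea rather than a recommended implementation.

```javascript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Assumes `limit` is at least 1.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}
```

With a limit of 2 and a queue of 30 messages, the downstream database sees at most two lookups at a time, however fast the messages arrive.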
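The big downside of jittering the first request is that it adds unavoidable latency to your system. With a uniform delay of up to one second, as in the Go example above, the average request waits about half a second before it even starts. For applications with tight performance budgets, this can be an unacceptable tradeoff.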
The key is to understand your system’s failure modes. If you’re operating a small application that mostly interacts with scaled third-party vendors, then coordinated request storms likely aren’t a concern. On the other hand, if you’re building a service which is much larger than your downstream services, then jittering the initial request to those services might help improve your overall reliability.
Next time you’re implementing retry logic, don’t just think about the retries. Think about how the very first request gets triggered, too. A little randomness can go a long way.
[1] Exact parameters: 4 retries, 100ms initial delay, with 0-150ms jitter on each attempt.