Rolling Update: How to Deploy Without Taking Everything Down at Once
Imagine this: your application is running on five servers, all serving users. You need to push a new version with a critical bug fix. The old-school way would be to stop all servers, deploy the new version, and start everything again. But that means downtime. Every user gets an error page or a loading spinner that never finishes.
That approach might work for internal tools used by a handful of people at 3 AM. But for a live application with real users, stopping everything at once is a non-starter. You need a way to update the software without making everyone feel the impact simultaneously.
The Problem: All-or-Nothing Deployments
When you replace every running instance of your application at the same moment, you create a window where nothing is serving traffic. Even if the deployment itself takes only seconds, those seconds can mean lost revenue, frustrated users, or failed transactions. For applications that need high availability, this window is unacceptable.
The core issue is that you're treating all your servers as one unit. You stop them together, update them together, and start them together. Any problem with the new version affects every user at the same time. If the deployment fails, you're scrambling to bring the old version back while users are already seeing errors.
The Solution: Replace One Instance at a Time
Instead of updating everything at once, you can update your servers one by one, with a load balancer steering traffic away from whichever instance is currently being replaced. Here's how it works:
- You have five servers running version 1.0 of your application.
- You take one server out of service.
- You deploy version 2.0 to that server.
- You verify the new version is working correctly.
- You put that server back into service.
- You move to the next server and repeat.
At any point during this process, at most one server is out of service, so at least four servers are still handling traffic. Users who hit an already-updated server get the new version; users who hit the others get the old one. No one gets an error, because traffic only ever goes to servers that are in service.
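To make those steps concrete, here's a minimal sketch of the loop in Python. The drain, deploy, is_healthy, and restore helpers are hypothetical stand-ins for whatever your load balancer and deployment tooling actually provide; the point is the shape of the loop, not the specific calls.

```python
import time

SERVERS = ["app-1", "app-2", "app-3", "app-4", "app-5"]

def drain(server):
    """Stop routing new traffic to this server (e.g. via the load balancer API)."""
    print(f"draining {server}")

def deploy(server, version):
    """Install the given version on this server."""
    print(f"deploying {version} to {server}")

def is_healthy(server):
    """Ask the server's health check endpoint whether it's ready for traffic."""
    return True  # placeholder: a real check would hit /health over HTTP

def restore(server):
    """Put the server back into the load balancer's rotation."""
    print(f"restoring {server}")

def rolling_update(servers, version):
    for server in servers:
        drain(server)               # take one server out of service
        deploy(server, version)     # put the new version on it
        if not is_healthy(server):  # verify before touching the next one
            raise RuntimeError(f"{server} failed its health check; rollout stopped")
        restore(server)             # back into service
        time.sleep(1)               # brief settle time, then move on

rolling_update(SERVERS, "2.0")
```

Notice that the loop refuses to move past an unhealthy instance - that single check is what the next section is about.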
This approach is called a rolling update. The name comes from the way the update "rolls" across your instances one after another, like a wave moving across a field. An instance is any unit where your application runs: a physical server, a virtual machine, or a container.
Why Health Checks Matter
A rolling update only works if you can tell whether the new version is actually running correctly before you update the next instance. This is where health checks come in.
A health check is a simple mechanism that tests whether an instance is ready to receive traffic. Typically, it's an endpoint like /health or /ready that returns a success response when the application is working normally. Your orchestration system - whether it's Kubernetes, a load balancer, or a custom deployment tool - checks this endpoint before sending traffic to the instance.
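As an illustration, here's a minimal /health endpoint built with nothing but Python's standard library. The port and response body are arbitrary choices for this sketch; a production health check would typically also verify critical dependencies, like the database connection, before reporting success.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)   # 200 tells the orchestrator: ready for traffic
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Serve health checks on port 8080 (an arbitrary choice for this sketch).
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```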
If the health check fails after you deploy the new version, the rolling update stops. The problematic instance can be rolled back to the old version, and the rest of your servers remain untouched. You've contained the damage to a single instance and a small subset of users.
Without health checks, you're flying blind. You might update all five servers before realizing the new version crashes on startup. By then, every user is affected.
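In practice, the "verify" step is a polling loop rather than a single request, because a freshly deployed instance needs a moment to start up. Here's one possible sketch; the URL format, port, and timings are assumptions to adapt to your own environment.

```python
import time
import urllib.request
import urllib.error

def wait_until_healthy(host, timeout=60, interval=2):
    """Poll an instance's health endpoint until it responds, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"http://{host}:8080/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet; try again shortly
        time.sleep(interval)
    return False  # caller should stop the rollout and roll this instance back

# Example: call wait_until_healthy("app-3.internal") after deploying to app-3,
# and abort the rollout if it returns False.
```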
When Rolling Updates Work Well
Rolling updates are ideal for changes that are backward compatible. These are changes where the old and new versions can run side by side without issues. Examples include:
- Adding new log statements
- Fixing a minor bug that doesn't change data formats
- Changing UI colors or text
- Adding a new API endpoint that no one uses yet
- Updating a dependency that doesn't change behavior
Because old and new instances run simultaneously during the update, backward compatibility is essential. If the new version expects data in a different format, or communicates with other services using a different protocol, you'll get errors when old and new instances try to work together.
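A small example makes the overlap concrete. Suppose version 2.0 adds an optional field to an event payload. As long as version 1.0 ignores fields it doesn't recognize, both versions can process each other's messages while the update rolls through. The field names here are purely illustrative:

```python
import json

def write_event_v2(user_id):
    # New version: adds an optional "priority" field, keeps every existing field.
    return json.dumps({"user_id": user_id, "action": "login", "priority": "normal"})

def read_event_v1(payload):
    # Old version: reads only the fields it knows about, so the extra
    # field from v2 is simply ignored.
    event = json.loads(payload)
    return event["user_id"], event["action"]

print(read_event_v1(write_event_v2(42)))  # (42, 'login') - no error on either side
```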
The Trade-Offs
Rolling updates are simple and don't require extra infrastructure. You use the same servers you already have. There's no need to spin up a parallel environment or provision additional capacity. Your infrastructure costs stay the same during the update.
The downside is speed. Rolling updates take time because you have to wait for each instance to be updated and verified. If you have fifty servers and each takes a minute to deploy and check, your update takes nearly an hour. For urgent security patches, that might be too slow.
Another limitation is visibility. When a rolling update introduces a problem, the impact spreads gradually. Some users see the issue, others don't. This makes it harder to isolate the root cause compared to strategies where you can clearly separate affected users from unaffected ones.
A Practical Checklist
Before you implement a rolling update, make sure you have these basics in place:
- Health check endpoint: Your application must expose a reliable way to verify it's working.
- Backward compatibility: The new version must work alongside the old version during the transition.
- Rollback plan: Know how to revert a single instance if the health check fails.
- Monitoring: Watch error rates and response times during the update to catch problems early (see the sketch after this list).
- Sufficient instances: Rolling updates work best with at least three instances. With only two, you lose half your capacity during the update.
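As a sketch of the monitoring item, the deployment loop can consult your metrics system after each instance and abort if the fleet's error rate spikes. The current_error_rate function here is a hypothetical hook; in practice it would query whatever metrics backend you run (Prometheus, CloudWatch, and so on).

```python
import random

ERROR_RATE_THRESHOLD = 0.05  # abort if more than 5% of requests are failing

def current_error_rate():
    """Placeholder: a real version would query your metrics backend."""
    return random.uniform(0.0, 0.02)

def check_after_updating(server):
    rate = current_error_rate()
    if rate > ERROR_RATE_THRESHOLD:
        raise RuntimeError(f"error rate {rate:.1%} after updating {server}; stopping rollout")
    print(f"{server}: error rate {rate:.1%}, continuing")

for server in ["app-1", "app-2", "app-3"]:
    check_after_updating(server)
```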
The Takeaway
Rolling updates are the default deployment strategy for most modern applications because they solve the fundamental problem of updating software without downtime. The idea is simple: don't replace everything at once. Replace one instance, verify it works, then move to the next. This approach keeps your application available throughout the update and limits the blast radius if something goes wrong.
For small, backward-compatible changes, rolling updates are often all you need. They're straightforward, cost-effective, and widely supported by container orchestration platforms like Kubernetes and cloud deployment tools. But for riskier changes - database migrations, protocol changes, or major feature overhauls - you might need more control. That's where strategies like blue-green deployments or canary releases come in. But that's a topic for another post.