When Data Decides: Using Observability to Drive Progressive Delivery
You've just pushed a new version of your application. The canary deployment starts routing 10 percent of traffic to the new instances. Everyone on the team stares at the dashboard. Is it working? Is it failing? Should you keep going or pull the plug?
Without real data, you're guessing. And guessing during a release is how small problems turn into production incidents.
Progressive delivery - whether you run it as a canary, a blue-green switch, or a phased rollout - only works if you have a reliable way to measure whether the new version is actually healthy. The decision to proceed, hold, or roll back needs to rest on something more solid than gut feeling or a quick glance at a single chart.
The Four Signals That Matter During a Release
When a new version starts receiving traffic, four metrics give you the full picture of what's happening. These aren't exotic measurements. They're the same signals you should already be monitoring in production, but during a progressive release, you need to compare them between the old and new versions in real time.
Error Rate
This is the percentage of requests that fail out of all requests hitting the new version. If your application normally runs with an error rate below 0.1 percent, and suddenly it jumps to 5 percent after the new version goes live, something is wrong.
Error rate spikes can come from many places: a bug in the new code, a dependency that changed behavior, or a configuration mismatch between environments. The key is being able to see the difference between the old version's error rate and the new version's error rate side by side. A 0.5 percent error rate on the new version might look fine in isolation, but if the old version is running at 0.05 percent, that's a tenfold increase worth investigating.
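To make the side-by-side comparison concrete, here is a minimal sketch in Python. The request and failure counts are hypothetical stand-ins for whatever your metrics backend reports.

```python
def error_rate(failed: int, total: int) -> float:
    """Fraction of failed requests, guarding against empty samples."""
    return failed / total if total else 0.0

# Hypothetical counts over the same window, per version.
baseline = error_rate(failed=12, total=24_000)  # old version: 0.05%
canary = error_rate(failed=10, total=2_000)     # new version: 0.5%

# A rate that looks fine in isolation can still be a large
# relative regression against the baseline.
if baseline > 0 and canary / baseline >= 10:
    print(f"canary error rate {canary:.2%} is "
          f"{canary / baseline:.0f}x the baseline {baseline:.2%}")
```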
Latency
Response time changes are often the first sign that something is off, even before errors appear. A new version might introduce a slower database query, add an unnecessary processing step, or change how caching works. Users might not notice a few extra milliseconds, but when latency jumps from 200 milliseconds to 2 seconds, the experience degrades noticeably.
Monitor latency from the server side, but also from the user side if you can. Server-side metrics tell you how fast your application responds, but client-side metrics tell you what users actually experience, including network delays and browser rendering time.
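Averages hide exactly this kind of tail regression, so compare percentiles rather than means. A small sketch, with the sample lists standing in for real histogram data:

```python
import math

def p95(samples: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical server-side response times in milliseconds.
old_ms = [180, 190, 200, 205, 210, 220, 230, 240, 250, 260]
new_ms = [200, 210, 215, 230, 260, 300, 350, 500, 900, 2000]

# A 20 percent tail regression is worth a hold, even if the mean barely moved.
if p95(new_ms) > 1.2 * p95(old_ms):
    print(f"p95 regressed: {p95(old_ms):.0f} ms -> {p95(new_ms):.0f} ms")
```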
Traffic
This metric confirms that your routing configuration is actually working as intended. If you set the canary to receive 10 percent of traffic, you need to verify that 10 percent of requests are indeed hitting the new version. Misconfigured load balancers, sticky sessions, or caching layers can cause traffic to split unevenly.
Traffic volume also tells you whether the new version can handle the same load as the old version. If the new version starts dropping connections or refusing requests at the same traffic level, that's a clear sign of a capacity problem.
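Verifying the split is a small check worth automating. A sketch, with hypothetical request counters and a tolerance chosen as an assumption:

```python
# Hypothetical request counters over the same window.
canary_requests = 1_400
total_requests = 20_000
expected_share = 0.10  # the split configured in the load balancer

observed_share = canary_requests / total_requests

# Allow some slack for sampling noise; a larger gap usually means
# sticky sessions, a caching layer, or a misconfigured routing rule.
TOLERANCE = 0.02
if abs(observed_share - expected_share) > TOLERANCE:
    print(f"routing drift: expected {expected_share:.0%}, "
          f"observed {observed_share:.1%}")
```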
Saturation
Saturation measures how full your server resources are. CPU, memory, disk I/O, and database connections all need monitoring. If the new version suddenly uses twice the memory of the old version, your servers could run out of resources and crash.
Saturation is often a leading indicator. It shows up before error rates spike or latency increases. If you catch saturation early, you can pause the release and investigate before users are affected.
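A simple per-resource check is enough to catch this early. The utilization readings and limits below are assumptions:

```python
# Hypothetical utilization for the new version, as fractions of capacity.
saturation = {"cpu": 0.62, "memory": 0.91, "disk_io": 0.40, "db_connections": 0.55}
limits = {"cpu": 0.80, "memory": 0.90, "disk_io": 0.80, "db_connections": 0.80}

# Flag any resource near its ceiling before errors or latency react.
for resource, used in saturation.items():
    if used > limits[resource]:
        print(f"saturation warning: {resource} at {used:.0%} "
              f"(limit {limits[resource]:.0%})")
```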
Setting Thresholds Before You Release
These four metrics don't mean much without thresholds to compare against. You need to define what "healthy" looks like before the release starts, not during the middle of it when everyone is stressed and opinions start flying.
Set specific numbers for each metric, for example (see the config sketch after this list):
- Error rate must stay below 0.5 percent
- Average latency cannot increase by more than 20 percent compared to the old version
- CPU utilization must remain under 80 percent
- Memory usage must not exceed 90 percent of available capacity
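Thresholds like these belong in version-controlled configuration, not in someone's head. A minimal sketch; the warning levels are assumptions, set below the critical levels to create a "hold" band:

```python
# Release gate thresholds, defined and reviewed before the rollout starts.
THRESHOLDS = {
    "error_rate":     {"warning": 0.003, "critical": 0.005},  # 0.3% / 0.5%
    "latency_growth": {"warning": 0.10,  "critical": 0.20},   # vs old version
    "cpu":            {"warning": 0.70,  "critical": 0.80},
    "memory":         {"warning": 0.80,  "critical": 0.90},
}
```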
These thresholds should come from your existing service level objectives (SLOs) or from historical data about how the application normally behaves. If you don't have historical data, start with conservative numbers and adjust as you learn.
The thresholds also need to account for the traffic percentage. A canary running at 5 percent traffic might not show problems that only appear under full load. Consider setting different thresholds for different stages of the rollout, or use statistical methods that detect anomalies even with small sample sizes.
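One such statistical method is a two-proportion z-test on error counts, which asks whether the canary's rate plausibly differs from the baseline's given the sample sizes. A sketch, assuming independent requests; the counts are hypothetical:

```python
import math

def error_rate_p_value(canary_fail: int, canary_total: int,
                       base_fail: int, base_total: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (canary_fail + base_fail) / (canary_total + base_total)
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / canary_total + 1 / base_total))
    if se == 0:
        return 1.0
    z = (canary_fail / canary_total - base_fail / base_total) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: a canary at 5 percent traffic, small absolute numbers.
p = error_rate_p_value(canary_fail=8, canary_total=1_000,
                       base_fail=19, base_total=19_000)
if p < 0.05:
    print(f"error rates differ beyond sampling noise (p = {p:.2g})")
```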
Making Decisions Based on Data
Once the metrics are flowing and thresholds are set, the decision process becomes straightforward. You don't need to argue about whether the release looks okay. The data tells you.
If all metrics stay within safe boundaries for a defined observation period - say five minutes of stable data - you proceed to the next stage. Increase the traffic percentage from 10 percent to 25 percent, or shift more users to the new version. Then observe again.
The remaining outcomes follow the same logic, driven by the four signals and their thresholds:
If any metric crosses a warning threshold but stays below the critical threshold, you hold the release. Don't increase traffic. Don't rollback yet. Give the team time to investigate whether this is a real problem or a temporary spike.
If a metric crosses the critical threshold - error rate spikes dramatically, latency triples, or servers start running out of memory - you roll back immediately. Don't wait for a meeting. Don't ask for approval. The data has already decided.
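That three-way outcome is simple enough to express as a function. A sketch reusing the warning/critical structure from the thresholds section; the metric names are assumptions:

```python
from enum import Enum

class Decision(Enum):
    PROCEED = "proceed"
    HOLD = "hold"
    ROLLBACK = "rollback"

def decide(metrics: dict[str, float],
           thresholds: dict[str, dict[str, float]]) -> Decision:
    """Worst signal wins: any critical breach means rollback,
    any warning breach means hold, otherwise proceed."""
    outcome = Decision.PROCEED
    for name, value in metrics.items():
        levels = thresholds[name]
        if value >= levels["critical"]:
            return Decision.ROLLBACK  # no meeting, no approval
        if value >= levels["warning"]:
            outcome = Decision.HOLD   # investigate before moving on
    return outcome

# Hypothetical readings against hypothetical thresholds.
print(decide({"error_rate": 0.004, "cpu": 0.65},
             {"error_rate": {"warning": 0.003, "critical": 0.005},
              "cpu":        {"warning": 0.70,  "critical": 0.80}}))
# Decision.HOLD
```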
Automating the Decision Loop
Manual decision making works for small teams and low-risk releases, but it doesn't scale. Humans are slow, inconsistent, and prone to bias. The same person who rolls back immediately on Monday might hesitate on Friday because the release is urgent.
The better approach is to automate the entire decision loop. Your deployment pipeline should read observability data, compare it against thresholds, and decide whether to proceed, hold, or roll back without human intervention.
This doesn't mean removing humans from the process entirely. It means moving human involvement to where it adds the most value: defining thresholds, reviewing patterns over time, and handling edge cases that the automation can't anticipate. The routine decisions - "is this canary healthy enough to increase traffic?" - are exactly the kind of decisions that computers handle better than people.
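Put together, the loop looks something like the sketch below. Everything it calls is a hypothetical hook: decide is a function like the one above, and query_metrics, set_canary_weight, and rollback wrap your own metrics backend and deployment tooling.

```python
import time

TRAFFIC_STAGES = [0.10, 0.25, 0.50, 1.00]  # assumed rollout plan
WINDOW_SECONDS = 300                       # five minutes of stable data
POLL_SECONDS = 15

def run_rollout(decide, query_metrics, set_canary_weight, rollback) -> bool:
    """Advance the rollout stage by stage; decide() maps a metrics
    snapshot to "proceed", "hold", or "rollback"."""
    for weight in TRAFFIC_STAGES:
        set_canary_weight(weight)
        deadline = time.time() + WINDOW_SECONDS
        while time.time() < deadline:
            outcome = decide(query_metrics())
            if outcome == "rollback":
                rollback()  # the data has already decided
                return False
            if outcome == "hold":
                # A warning resets the clock: require a full clean window.
                deadline = time.time() + WINDOW_SECONDS
            time.sleep(POLL_SECONDS)
    return True  # every stage survived its observation window
```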
A Practical Checklist for Your Next Release
Before you run your next progressive delivery, make sure these pieces are in place:
- Error rate, latency, traffic, and saturation are all being collected for both old and new versions
- Thresholds are defined and documented before the release starts
- The observation window is set (how long to wait before making a decision)
- Automated rollback is configured and tested, not just planned
- Someone on the team knows what to do if the automation fails or produces a false positive
The Takeaway
Observability turns progressive delivery from a guessing game into a data-driven process. When you have real-time metrics, clear thresholds, and automated decision making, you stop asking "is this release safe?" and start asking "what does the data say?" The answer is always waiting in your dashboards and logs. The hard part is setting up the system to listen.