Deployment strategies: Blue-Green vs. Canary vs. A/B Testing

In the last post of the Cloud Computing for Beginners series, we covered the best practices that DevOps teams adopt in their software development projects. In this post, we look at what IT calls deployment, and at the techniques for releasing applications so that specific features and functionality can be analyzed and tested. To “deploy” simply means to put a system or app into operation so that it runs completely and stably.

Before diving into the deployment techniques in use in the cloud market today, it is worth defining what a deployment strategy is: a technique for making changes or updates to an application with little or no downtime, ideally in such a way that users do not even notice that anything changed.

One of the most common deployment strategies in use today is Blue-Green Deployment, an application release management technique that reduces downtime and the risk of failure by running two identical production environments, identified as “Blue” and “Green”. Usually, one of them runs the current version of the application while the other runs the next (or previous) version.

The goal is to provide reliable testing, continuous upgrades without interruptions, and instant rollback (not to be confused with “revisions”). At any time, only one of the environments is active, and it serves all production traffic. A release consists of switching traffic from the active environment to the idle one; rolling back means switching it back.
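The switch between the two environments can be pictured as a single pointer that a router flips. The following is a minimal sketch in Python, not a production router; the class name `BlueGreenRouter`, its methods, and the version strings are hypothetical names chosen for illustration.

```python
class BlueGreenRouter:
    """Toy router: exactly one of two environments serves production traffic."""

    def __init__(self, blue_version: str, green_version: str, active: str = "blue"):
        self.environments = {"blue": blue_version, "green": green_version}
        self.active = active

    @property
    def idle(self) -> str:
        # The environment that currently receives no production traffic.
        return "green" if self.active == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        # New releases always go to the idle environment first.
        self.environments[self.idle] = version

    def switch(self) -> None:
        # Cut all traffic over at once; calling it again is the rollback.
        self.active = self.idle

    def handle_request(self) -> str:
        return f"served by {self.active} ({self.environments[self.active]})"


router = BlueGreenRouter(blue_version="v1", green_version="v1")
router.deploy_to_idle("v2")        # green now holds v2, blue still serves users
router.switch()                    # green becomes the active environment
after_release = router.handle_request()
router.switch()                    # instant rollback to blue
after_rollback = router.handle_request()
```

Because the previous version keeps running in the idle environment, rolling back is the same operation as releasing: one more switch.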

Blue-Green is often contrasted with the older Rolling Deployment. In a Rolling Deployment there is only ONE complete environment to work with. When an update starts, the new code is deployed to a subset of the instances in that environment and, once that subset is done, moves on to the next subset until every instance is updated. And that’s it. Because old and new versions run side by side during the rollout and there is no idle copy of the environment to switch back to, it’s easy to understand why Blue-Green deployment is often seen as a “next level” approach in software development.
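To make the batch-by-batch behavior concrete, here is a small Python sketch of a rolling update over a list of instances. The function `rolling_update` and the `batch_size` parameter are assumptions made for this example; real orchestrators also run health checks between batches, which are left out here.

```python
from typing import List


def rolling_update(instances: List[str], new_version: str, batch_size: int) -> List[List[str]]:
    """Update instances in place, one batch at a time.

    Returns a snapshot of the fleet after each batch so the
    mixed-version intermediate states are visible.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fleet = list(instances)
    snapshots = []
    for start in range(0, len(fleet), batch_size):
        end = min(start + batch_size, len(fleet))
        for i in range(start, end):
            fleet[i] = new_version      # replace this subset of instances
        snapshots.append(list(fleet))   # state once the batch has completed
    return snapshots


history = rolling_update(["v1", "v1", "v1", "v1"], "v2", batch_size=2)
```

After the first batch the fleet is half `v1` and half `v2`, which is exactly the mixed state that Blue-Green avoids.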

But it doesn’t end here: there is another category of deployment strategy, called Canary Deployment (or Canary Release). It is a code deployment technique that aims to reduce the risk of introducing a new version of the software into production. To do so, a Canary Release lets developers deploy new code or features to only a subset of users as an initial test, before deploying it to the entire infrastructure and making it available to everyone.

In case you are wondering where the word “canary” comes from: it really does refer to the bird. It alludes to the canaries once used to alert miners when toxic gases reached dangerous levels inside coal mines. Just as in the mines, as long as the canary raises no alarm, everything is going well and you can continue without major problems.

Applied to Software Engineering, a Canary Release works as follows: a new version of the software, known as Canary, is deployed for a small subset of users along with the stable running version. The traffic is split between these two versions so that a portion of incoming requests is diverted to the Canary version. The quality of the Canary version is assessed by comparing the main metrics that describe the behavior of the old and new versions. If there is a significant degradation in these metrics, the Canary version is aborted and all traffic is routed to the stable version in an effort to minimize the impact of unexpected behavior.
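The flow described above (split the traffic, compare metrics, abort on degradation) can be sketched in a few lines of Python. This is an illustrative simplification: the names `Metrics`, `choose_version`, `canary_is_healthy`, and `traffic_after_analysis`, the 10% weight, and the fixed error-rate threshold are all assumptions for the example. Real canary analysis tools typically compare several metrics statistically rather than using a single threshold.

```python
import random
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int = 0
    errors: int = 0

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def choose_version(canary_weight: float, rng: random.Random) -> str:
    """Divert roughly `canary_weight` of incoming requests to the canary."""
    return "canary" if rng.random() < canary_weight else "stable"


def canary_is_healthy(stable: Metrics, canary: Metrics, max_increase: float = 0.01) -> bool:
    """Healthy if the canary error rate is at most `max_increase` above stable."""
    return canary.error_rate <= stable.error_rate + max_increase


def traffic_after_analysis(stable: Metrics, canary: Metrics, canary_weight: float) -> float:
    """Keep the current split if healthy; otherwise abort (all traffic to stable)."""
    return canary_weight if canary_is_healthy(stable, canary) else 0.0


rng = random.Random(42)  # seeded so the example is repeatable
picks = [choose_version(0.1, rng) for _ in range(10_000)]
canary_share = picks.count("canary") / len(picks)

stable = Metrics(requests=1000, errors=10)      # 1% errors
good_canary = Metrics(requests=100, errors=1)   # 1% errors
bad_canary = Metrics(requests=100, errors=5)    # 5% errors
```

With these numbers, `traffic_after_analysis(stable, bad_canary, 0.1)` returns `0.0`, meaning the canary is aborted and the stable version receives everything again.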


And last but not least, we have the A/B testing deployment strategy. These tests consist of routing a subset of users, under specific conditions, to a new feature in a piece of software or an application. In fact, the technique is used primarily to assess the effectiveness of a change in the system by measuring how the market reacts to it. It is therefore more about making business decisions based on statistics than about deployment as such, since the new features under test are released only to a selected set of users.

It is a strategy very similar to the Canary Release, except that a Canary Release moves a new product or feature to a particular community as a step toward deploying it to ALL customers, whereas an A/B test compares variants in order to decide which one to keep. Blue-Green, in turn, is a deployment strategy for testing and releasing a new version of a service without taking it down or putting it at risk. In this model, both Blue and Green are up and running at the same time, and traffic is then switched cleanly from one to the other.

Canary Analysis at Netflix with Kayenta

Kayenta is a platform for automated Canary Release analysis (Automated Canary Analysis, or ACA). It is used by Spinnaker, a continuous delivery platform, to enable automated Canary deployments. Kayenta is responsible for assessing the risk of a Canary Release by checking for significant degradation between the current stable version and the Canary candidate. Although it is widely used by DevOps teams today, Kayenta’s success only came after a lot of work by Netflix engineers.

Canary analysis was initially a manual process for them: a developer or software engineer would examine server graphs and logs for the current stable version and compare them with those of the Canary candidate to see whether the metrics (HTTP status codes, response time, exception counts, load balancing, etc.) matched. If the data seemed reasonable, a manual judgment was made to move forward or roll back.

Needless to say, the manual approach is neither reliable nor scalable. Each Canary Release meant spending several hours examining graphs and combing through logs, which made it difficult to deploy new builds more than once or twice a week. Comparing the graphs visually also made it hard to spot subtle differences between the Canary and the reference version.

The first attempt to automate Canary analysis by Netflix engineers was through a very specific script for the application they were monitoring. The next attempt was to expand that process and introduce the first version of automated Canary Release analysis. This happened over 5 years ago for these teams. Now, Kayenta is an evolution of this system and is the result of the lessons that these engineers have learned over years of continuous delivery, with fast and reliable changes in production on the Netflix development platform.

Feature Toggles or Feature Flags vs. A/B Tests

A Feature Toggle, also known as a Feature Flag, is a technique used by DevOps teams to hide, enable, or disable a particular feature at runtime, and thus to “switch” it from one state to another. This flexibility allows teams to modify the system’s behavior without changing any code, since the feature can be “turned on” or “turned off” remotely without a new deployment.
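In code, a toggle is usually just a conditional around the new behavior, with the flag value read from somewhere outside the code path. The sketch below keeps the flags in memory for simplicity; the `FeatureFlags` class, the `new_discount` flag, and the `checkout_total` function are hypothetical names, and a real system would typically load flag values from a configuration service or a database.

```python
class FeatureFlags:
    """In-memory flag store (stand-in for a remote configuration source)."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to "off" so unfinished features stay hidden.
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def checkout_total(price: float, flags: FeatureFlags) -> float:
    # The new behavior ships with the code but only runs when the flag is on.
    if flags.is_enabled("new_discount"):
        return round(price * 0.9, 2)
    return price


flags = FeatureFlags({"new_discount": False})
total_before = checkout_total(100.0, flags)   # old behavior
flags.set("new_discount", True)               # flipped without redeploying
total_after = checkout_total(100.0, flags)    # new behavior
```

The same deployed code produces both results; only the flag value changed between the two calls.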

Feature Flags provide an alternative to maintaining multiple branches in the source code, a practice that is expensive for both the customer and the DevOps teams, so that a software feature can be tested before it is even ready for release.

Some of their most common benefits are:

  • Test in production

As it is generally impossible to fully simulate the production environment in its various stages, Feature Toggles allow DevOps teams to validate new release features in the real world, minimizing risks.

  • Canary Release

A Canary Release helps to limit the risk of releasing a feature to the entire user base. With a toggle, you can quickly roll back a feature simply by turning it off, instead of going through another deployment cycle.

  • Faster release cycles

Using Feature Toggles, a DevOps team can modify the behavior of a system without making risky changes to the live code. One of the main benefits of Feature Toggles is therefore simpler deployment within the development process.

  • Server-side A/B testing

Developers can run A/B tests by using Feature Flags to enable a feature for half of a user segment and disable it for the other half, then observing how each half performs on a given metric (such as “application usage” or “purchases”). Since the test is decided on the backend in code, it adds little or no perceptible latency for the user.

There are also different types of Feature Toggles, based on the context in which they are used:

Release Toggles

Suppose one stage of the development project has been completed and the person responsible for the project has accepted everything. The changes are sent to the user for acceptance, but only part of the developed feature is approved. You want to put the approved part into production without making the rest available. What can you do, then? Release Toggles are the answer! Each feature is guarded by a release toggle so that it can be revealed or hidden at runtime. All changes can therefore go into production with only some of the release toggles activated.

Ops Toggles 

Ops Toggles are Feature Toggles that operations (Ops) teams can use to disable certain features in times of vulnerability or risk in the system. These flags control operational aspects of the system’s behavior. They can be used, for example, when releasing a new feature with unclear performance implications, so that system operators can quickly disable the feature in production if it causes problems.

Experiment Toggles

Experiment Toggles are used to run A/B tests. Each user of the system is assigned to a group and, at runtime, a Toggle Router consistently sends a given user to one of the variations of the experiment based on that group. By tracking the aggregate behavior of the different groups, we can compare the effect of the variations. This technique is commonly used to make data-driven optimizations to things like the purchase flow of an e-commerce system or the call-to-action wording on a button.
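One common way to make the routing consistent is to hash the user identifier, so the same user always lands in the same variation without storing any state. The following Python sketch assumes that approach; `assign_variant`, `conversion_rates`, and the experiment name `checkout-button` are hypothetical, and the source does not say how a particular Toggle Router implements this.

```python
import hashlib
from typing import Dict, Iterable, Sequence, Tuple


def assign_variant(user_id: str, experiment: str, variants: Sequence[str] = ("A", "B")) -> str:
    """Deterministically map a user to a variant of an experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as a stable integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def conversion_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Aggregate (variant, converted) events into a conversion rate per variant."""
    totals: Dict[str, int] = {}
    conversions: Dict[str, int] = {}
    for variant, converted in events:
        totals[variant] = totals.get(variant, 0) + 1
        conversions[variant] = conversions.get(variant, 0) + int(converted)
    return {variant: conversions[variant] / totals[variant] for variant in totals}


first_visit = assign_variant("user-42", "checkout-button")
second_visit = assign_variant("user-42", "checkout-button")  # same user, same variant
group_sizes = [assign_variant(f"user-{i}", "checkout-button") for i in range(1000)].count("A")
rates = conversion_rates([("A", True), ("A", False), ("B", True), ("B", True)])
```

Including the experiment name in the hashed key means a user who falls into group A for one experiment is not automatically in group A for every other experiment.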

Permissioning Toggles

Permissioning Toggles are used to change the features or product experience that certain users receive. For example, we may have a set of “Premium” features that are only enabled for paying customers. Or perhaps a set of “Alpha” features that are only available to internal users, and another set of “Beta” features that are only available to internal users and beta users.
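The Premium, Alpha, and Beta examples above boil down to a rule per feature that is evaluated against the current user. Here is a minimal Python sketch under that assumption; the `User` fields, the feature names in `FEATURE_RULES`, and the `can_use` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    plan: str = "free"            # "free" or "premium"
    is_internal: bool = False
    is_beta_tester: bool = False


# One rule per feature: given a user, should the feature be visible?
FEATURE_RULES = {
    "premium_reports": lambda user: user.plan == "premium",
    "alpha_editor": lambda user: user.is_internal,
    "beta_search": lambda user: user.is_internal or user.is_beta_tester,
}


def can_use(feature: str, user: User) -> bool:
    rule = FEATURE_RULES.get(feature)
    # Features without a rule are hidden from everyone by default.
    return bool(rule and rule(user))


customer = User("ana", plan="premium")
employee = User("bruno", is_internal=True)
tester = User("carla", is_beta_tester=True)
```

Here `can_use("beta_search", tester)` is true while `can_use("alpha_editor", tester)` is false, matching the Alpha and Beta split described above.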

Stay tuned! In the next post of the Cloud Computing for Beginners series, we will dig deeper into the concepts of ChatOps, Gitflow, and GitOps, all part of the DevOps approach to software development.
