How Temporal Powers Reliable Cloud Operations at Netflix

Temporal is a Durable Execution platform that lets you write code “as if failures don’t exist”. It has become increasingly critical to Netflix since we first adopted it in 2021: teams ranging from the operators of our Open Connect global CDN to our Live reliability teams now depend on Temporal to operate their business-critical services. In this post, I’ll give a high-level overview of what Temporal offers users, describe the problems we were experiencing operating Spinnaker that motivated its initial adoption at Netflix, and explain how Temporal helped us reduce the rate of transient deployment failures at Netflix from 4% to 0.0001%.

A Crash Course on (some of) Spinnaker

Spinnaker is a multi-cloud continuous delivery platform that powers the vast majority of Netflix’s software deployments. It’s composed of several (mostly nautical-themed) microservices. Let’s look closely at two of them to understand the problems that led us to adopt Temporal.

If you’re completely new to Spinnaker, its fundamental tool for deployments is the Pipeline. A Pipeline is a sequence of steps called Stages, each of which can be decomposed into one or more Tasks or other Stages. An example deployment pipeline for a production service may consist of these stages: Find Image -> Run Smoke Tests -> Run Canary -> Deploy to us-east-2 -> Wait -> Deploy to us-east-1.
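To make the Pipeline, Stage, and Task relationship concrete, here is a minimal Python sketch of that nesting. It is not Spinnaker’s actual data model: the class names, the `flatten_tasks` helper, the `deploy-prod` pipeline name, and every Task and nested Stage name are hypothetical and exist only for illustration. Only the six top-level stage names come from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    """Smallest unit of work in this sketch (hypothetical model)."""
    name: str


@dataclass
class Stage:
    """A pipeline step that holds Tasks, nested Stages, or a mix of both."""
    name: str
    steps: List[Union["Task", "Stage"]] = field(default_factory=list)


@dataclass
class Pipeline:
    """An ordered sequence of Stages."""
    name: str
    stages: List[Stage] = field(default_factory=list)


def flatten_tasks(stage: Stage) -> List[str]:
    """Return the names of all Tasks under a Stage, depth-first, in order."""
    names: List[str] = []
    for step in stage.steps:
        if isinstance(step, Task):
            names.append(step.name)
        else:
            names.extend(flatten_tasks(step))
    return names


# Task and nested Stage names below are invented for illustration.
deploy_east_2 = Stage(
    "Deploy to us-east-2",
    steps=[
        Task("create server group"),
        Stage("Shift Traffic", steps=[Task("enable new"), Task("disable old")]),
    ],
)

pipeline = Pipeline(
    name="deploy-prod",
    stages=[
        Stage("Find Image"),
        Stage("Run Smoke Tests"),
        Stage("Run Canary"),
        deploy_east_2,
        Stage("Wait"),
        Stage("Deploy to us-east-1"),
    ],
)

print(" -> ".join(stage.name for stage in pipeline.stages))
print(flatten_tasks(deploy_east_2))
```

Because a Stage can contain other Stages, any code that walks a pipeline has to recurse, which is what `flatten_tasks` does here.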
