High Level Design Series · Case Studies · Post 66 of 70

Case Study: Netflix Architecture

Netflix at Scale

Netflix is one of the most demanding distributed systems ever built. As of 2024, the platform serves 230+ million subscribers across 190+ countries, consuming roughly 15% of global downstream internet bandwidth during peak hours. Every evening, hundreds of millions of concurrent streams fire up — each requiring real-time content discovery, personalization, adaptive video encoding, and sub-second playback start.

Metric | Scale
--- | ---
Global subscribers | 230M+ paid memberships
Daily viewing hours | ~500 million hours/day
Internet bandwidth | ~15% of global downstream traffic
Microservices | 1,000+ microservices in production
API requests | Billions of API calls per day
Content library | 17,000+ titles, each encoded in 1,200+ streams
CDN footprint | Open Connect Appliances in 1,000+ ISP locations
AWS regions | 3 active AWS regions (us-east-1, us-west-2, eu-west-1)

What makes Netflix architecturally fascinating is not just the scale, but how they got there. The story of Netflix's engineering evolution — from a monolithic DVD-rental application to the planet's most sophisticated streaming platform — is a masterclass in incremental migration, build-vs-buy decisions, and embracing failure as a feature.

The Monolith-to-Microservices Migration (2008–2016)

In August 2008, Netflix suffered a major database corruption incident that halted DVD shipments for three days. This event became the catalyst for one of the most famous architecture migrations in software history.

Migration Timeline

Phase 1: The Wake-Up Call (2008–2009)

Netflix's original architecture was a classic monolith: a single Java application backed by an Oracle relational database. The DVD rental service, the nascent streaming service, billing, user profiles, and content metadata all lived in one codebase deployed as a single WAR file. The 2008 database corruption was not a code bug — it was a systemic failure caused by tight coupling. A corruption in the billing tables cascaded into the entire application because everything shared one database.

The engineering team made two pivotal decisions:

  1. Move off their own data centers and the vertically scaled Oracle setup onto AWS, treating horizontally scalable, disposable cloud instances as the new foundation.
  2. Decompose the monolithic application into independently deployable microservices, each owning its own data and failure domain.

Phase 2: Strangler Fig Pattern (2009–2012)

Netflix didn't rewrite everything at once. They used what Martin Fowler later named the Strangler Fig Pattern: new features were built as microservices, and existing functionality was incrementally extracted from the monolith. An API gateway (the precursor to Zuul) routed traffic — some requests went to the old monolith, others to new services. Over time, the monolith shrank as services grew around it.

Key milestones in this phase:

  1. By late 2010, most customer-facing streaming functionality (home page, search, playback APIs) was being served from AWS rather than the data center.
  2. Around 2011, Cassandra replaced the central Oracle database as the primary operational store for member-facing data.
  3. By 2012, the supporting infrastructure (Chaos Monkey, Eureka, Hystrix) had been battle-tested internally and was released as open source.

Phase 3: Full Cloud Native (2013–2016)

The final phase involved migrating the remaining monolith components, including the most sensitive system: billing. Billing was the last to move because it directly handles money — any bug means revenue loss. Netflix built an elaborate dual-write system where both the monolith and the new billing microservice processed transactions in parallel, with automated reconciliation to verify consistency before cutting over.

Lesson learned: The migration took eight years. Netflix explicitly chose a slow, incremental approach over a risky "big bang" rewrite. Each step delivered value independently. When engineers proposed a 2-year rewrite plan, leadership responded: "What if we need to pivot in 6 months?" Incremental migration meant they could always stop and still have a working system.

Guiding Principles

Several principles emerged from the migration that shaped Netflix's engineering culture permanently:

  1. Freedom and Responsibility: Teams own their services end-to-end — design, development, deployment, and on-call. No separate ops team deploys your code.
  2. Highly Aligned, Loosely Coupled: Teams agree on goals and interfaces but choose their own tech stacks, languages, and release cadences.
  3. Context, not Control: Engineering leadership sets direction and provides tools, but doesn't mandate implementation details.
  4. Embrace the Cloud: Treat cloud instances as ephemeral. Any instance can die at any time. Design for it.

Netflix OSS — The Infrastructure Stack

Netflix didn't just build microservices — they built an entire open-source ecosystem to make microservices work at scale. The Netflix OSS stack became the de facto reference architecture for cloud-native systems, and many of its concepts were later absorbed into frameworks like Spring Cloud.

Zuul — API Gateway

Zuul is Netflix's edge service that handles all incoming API traffic. Every request from every Netflix client (TV, phone, browser, game console) passes through Zuul before reaching any backend microservice.

Zuul operates as a series of filter chains:

  1. Pre (inbound) filters: authentication, rate limiting, and request decoration before routing.
  2. Routing filters: select a backend service and forward the request, using Ribbon for instance selection.
  3. Post (outbound) filters: add response headers, record metrics, and stream the response back to the client.
  4. Error filters: run when any other filter fails, producing a controlled error response instead of a raw stack trace.

Zuul 2 (released 2018) moved from a blocking Servlet model to a non-blocking, event-driven architecture built on Netty, handling millions of concurrent connections with far fewer threads. At peak, Zuul handles 300,000+ requests per second per cluster.
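To make the filter model concrete, here is a minimal sketch in the style of the open-source Zuul 1 filter API; the header name, ordering value, and filter purpose are illustrative assumptions, not Netflix's actual filters:

import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;

// A "pre" filter runs before routing; this one tags requests that arrive without a
// device identifier so later filters or services can reject or degrade them.
public class DeviceCheckFilter extends ZuulFilter {

    @Override
    public String filterType() { return "pre"; }    // pre | route | post | error

    @Override
    public int filterOrder() { return 10; }         // lower values run earlier in the chain

    @Override
    public boolean shouldFilter() { return true; }  // apply to every request

    @Override
    public Object run() {
        RequestContext ctx = RequestContext.getCurrentContext();
        String deviceId = ctx.getRequest().getHeader("X-Device-Id"); // hypothetical header
        if (deviceId == null) {
            ctx.set("missingDeviceId", true); // downstream filters can consult this flag
        }
        return null; // Zuul 1 ignores the return value
    }
}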

Eureka — Service Discovery

With 1,000+ microservices and tens of thousands of running instances, you can't hardcode service locations. Eureka is Netflix's service registry — a REST-based service where every instance registers itself at startup and sends periodic heartbeats.

Key design decisions in Eureka:

  1. Availability over consistency: a slightly stale registry is far better than no registry, so Eureka keeps answering lookups with whatever it knows, even during partitions.
  2. Self-preservation mode: if heartbeats disappear en masse (usually a network problem rather than mass instance death), Eureka stops evicting instances instead of emptying the registry.
  3. Client-side caching: every client keeps a local copy of the registry, so lookups keep working even if every Eureka server is temporarily unreachable.
  4. Peer-to-peer replication: Eureka servers replicate registrations to each other without an elected leader, avoiding a coordination bottleneck.
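A lookup through the open-source Eureka client looks roughly like the sketch below; the service name and wiring are illustrative assumptions:

import com.netflix.appinfo.InstanceInfo;
import com.netflix.discovery.EurekaClient;

public class RecommendationLookup {
    private final EurekaClient eurekaClient;

    public RecommendationLookup(EurekaClient eurekaClient) {
        this.eurekaClient = eurekaClient;
    }

    public String resolveBaseUrl() {
        // Answered from the client's locally cached copy of the registry.
        InstanceInfo instance =
            eurekaClient.getNextServerFromEureka("RECOMMENDATION-SERVICE", false);
        return "http://" + instance.getHostName() + ":" + instance.getPort();
    }
}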

Ribbon — Client-Side Load Balancing

Unlike traditional load balancing where a central LB (like NGINX or HAProxy) sits between clients and servers, Ribbon pushes load-balancing logic into the client. Each service instance gets the list of available backend instances from Eureka and makes its own routing decisions.

Why client-side LB? It eliminates a network hop. Instead of Client → LB → Server, it's Client → Server directly. At Netflix's scale, removing one hop from billions of daily requests saves measurable latency and eliminates the load balancer as a single point of failure.

Ribbon supports multiple strategies: round-robin, weighted response time (prefer faster servers), availability filtering (skip servers with open circuit breakers), and zone-aware routing (prefer servers in the same AWS availability zone to reduce cross-zone data transfer costs).
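The zone-aware idea reduces to something like the following toy chooser; this is a conceptual sketch of client-side selection, not Ribbon's actual IRule API:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class ZoneAwareChooser {
    record Instance(String host, int port, String zone) {}

    private final AtomicInteger counter = new AtomicInteger();

    Instance choose(List<Instance> instances, String localZone) {
        // Prefer same-zone instances to avoid cross-zone latency and transfer costs.
        List<Instance> sameZone = instances.stream()
                .filter(i -> i.zone().equals(localZone))
                .collect(Collectors.toList());
        List<Instance> pool = sameZone.isEmpty() ? instances : sameZone;
        // Simple round-robin within the preferred pool.
        return pool.get(Math.floorMod(counter.getAndIncrement(), pool.size()));
    }
}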

Hystrix — Circuit Breaker

Hystrix is Netflix's implementation of the circuit breaker pattern — arguably their most important resilience tool. In a microservice architecture, a single slow or failing service can cascade failures across the entire system, especially when its callers retry aggressively (the "retry storm" problem).

Hystrix wraps every inter-service call in a command that monitors failure rates in real time:

// Simplified Hystrix command flow
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import java.util.List;

class GetRecommendationsCommand extends HystrixCommand<List<Video>> {

    // recommendationService, cachedRecommendations, and topTenPopular are assumed collaborators.
    private final long userId;

    GetRecommendationsCommand(long userId) {
        // The group key scopes the circuit breaker, thread pool, and metrics for this dependency.
        super(HystrixCommandGroupKey.Factory.asKey("RecommendationService"));
        this.userId = userId;
    }

    // 1. Is the circuit open? (failure rate > 50% in a rolling 10s window)
    //    → YES: Skip the call entirely, return fallback immediately
    //    → NO:  Proceed to step 2

    // 2. Is the thread pool / semaphore full?
    //    → YES: Reject immediately, return fallback (bulkhead pattern)
    //    → NO:  Proceed to step 3

    // 3. Execute the actual network call with a timeout (e.g., 1000 ms)
    //    → Success: Record success (closing a half-open circuit), return response
    //    → Failure/Timeout: Record failure; if the error threshold is exceeded,
    //      open the circuit for a sleep window (e.g., 5s) and return the fallback

    @Override
    protected List<Video> run() {
        return recommendationService.getPersonalized(userId);
    }

    @Override
    protected List<Video> getFallback() {
        // Return cached recommendations or top-10 popular titles
        return cachedRecommendations.getOrDefault(userId, topTenPopular);
    }
}
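Callers then use the standard HystrixCommand entry points: execute() runs the command synchronously, while queue() returns a java.util.concurrent.Future for asynchronous use.

List<Video> videos = new GetRecommendationsCommand(userId).execute();        // synchronous
Future<List<Video>> pending = new GetRecommendationsCommand(userId).queue(); // asynchronous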

Critical Hystrix concepts:

  1. Circuit states: closed (calls flow normally), open (calls are short-circuited straight to the fallback), and half-open (a trial request probes whether the dependency has recovered).
  2. Fallbacks: every command defines a degraded response so callers never see a raw exception when a dependency fails.
  3. Bulkheads: each dependency gets its own thread pool or semaphore, so one slow dependency cannot exhaust the caller's threads.
  4. Timeouts and rolling metrics: every call has an explicit timeout, and rolling success/failure/latency counters drive both the circuit-breaker decision and real-time dashboards.

Post-Hystrix era: Netflix has since moved much of the Hystrix functionality into their Adaptive Concurrency Limits library, which dynamically adjusts concurrency limits based on measured latencies rather than static configuration. Hystrix is in maintenance mode, but the circuit-breaker concepts it popularized remain foundational.

EVCache — Distributed Caching

EVCache (Ephemeral Volatile Cache) is Netflix's distributed caching solution built on top of Memcached. It's the primary caching layer for nearly every Netflix service.
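A common usage pattern is a cache-aside read: check the distributed cache first, fall back to the system of record on a miss, then repopulate. A minimal sketch with toy interfaces (not EVCache's actual API):

import java.util.Optional;

interface KeyValueCache {
    Optional<String> get(String key);
    void set(String key, String value, int ttlSeconds);
}

class ProfileLookup {
    private final KeyValueCache cache;
    private final ProfileStore store; // system of record, e.g. backed by Cassandra

    ProfileLookup(KeyValueCache cache, ProfileStore store) {
        this.cache = cache;
        this.store = store;
    }

    String loadProfileJson(String userId) {
        // 1. Try the distributed cache first.
        return cache.get("profile:" + userId).orElseGet(() -> {
            // 2. On a miss, read the source of truth and repopulate the cache.
            String json = store.fetchProfileJson(userId);
            cache.set("profile:" + userId, json, 3600); // TTL keeps entries ephemeral
            return json;
        });
    }

    interface ProfileStore {
        String fetchProfileJson(String userId);
    }
}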

Netflix Request Flow

End to end: the user opens the app → Zuul API gateway → Eureka resolves the backend services → Hystrix-wrapped calls fan out to the microservices → EVCache serves hot data → the Open Connect CDN delivers the video.

Open Connect — Netflix's Own CDN

While the control plane (APIs, recommendations, user profiles) runs on AWS, the data plane — the actual video bytes — is served by Netflix's proprietary CDN: Open Connect. This is perhaps the single most important piece of Netflix's architecture. It's the reason Netflix can serve 15% of global internet traffic without bankrupting itself on bandwidth costs.

Why Build a CDN?

In the early days, Netflix used third-party CDNs (Akamai, Limelight). But video has a unique characteristic that makes it different from web content: the files are enormous (a 4K movie is ~15 GB), the catalog is known in advance, and popularity follows a power-law distribution (a small fraction of titles account for most viewing hours). Netflix realized they could build a specialized CDN that exploits these properties far more efficiently than a general-purpose CDN.

Open Connect Appliances (OCAs)

An OCA is a custom-built server — essentially a high-density storage box running FreeBSD with a custom TCP stack and NGINX-based content server. Netflix designs the hardware themselves and ships these appliances to ISPs worldwide at no cost to the ISP.

OCA Generation | Storage | Throughput | Role
--- | --- | --- | ---
Flash OCA | 280 TB NVMe SSD | 100+ Gbps | High-demand ISP sites, popular content
Storage OCA | 500+ TB HDD | 40 Gbps | Full catalog storage, large ISPs
Combo OCA | HDD + SSD tiered | 100 Gbps | Mixed workload, hot/cold data tiering

Content Positioning — The Fill Algorithm

Every night during off-peak hours, Netflix's fill algorithm runs a global optimization problem: given the popularity predictions for tomorrow, the storage capacity of each OCA, and the geographic distribution of subscribers, which content should be pre-positioned where?
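As a mental model only, the plan for a single OCA reduces to a greedy, knapsack-style pass over the popularity forecast; the real fill algorithm optimizes globally across appliances, forecasts, and network topology. The class and fields below are illustrative:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NaiveFillPlanner {
    record Title(String id, double predictedHoursTomorrow, long sizeBytes) {}

    // Pick the most-watched titles that fit within one OCA's storage capacity.
    static List<Title> planForOca(List<Title> catalog, long ocaCapacityBytes) {
        List<Title> byPopularity = new ArrayList<>(catalog);
        byPopularity.sort(Comparator.comparingDouble(Title::predictedHoursTomorrow).reversed());

        List<Title> plan = new ArrayList<>();
        long used = 0;
        for (Title title : byPopularity) {
            if (used + title.sizeBytes() <= ocaCapacityBytes) {
                plan.add(title);
                used += title.sizeBytes();
            }
        }
        return plan;
    }
}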

Client Steering

When a subscriber hits play, the Netflix client calls the playback API (running on AWS), which returns an ordered list of OCA URLs to try. The steering algorithm considers:

  1. Network proximity: Prefer OCAs within the subscriber's own ISP (zero peering/transit costs).
  2. Server load: OCAs report their current throughput and connection counts. Overloaded boxes are ranked lower.
  3. Historical quality: If a particular OCA consistently delivers poor throughput to a client's network, it's deprioritized.
  4. Content availability: Obviously, the OCA must have the requested title and encoding variant.

The result: 95%+ of Netflix video bytes are served from OCAs inside the subscriber's own ISP — literally one network hop away. This is why Netflix streams start fast and rarely buffer, even on mediocre internet connections.
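A toy version of that ranking, with the steering criteria as fields, looks like this; it is illustrative only, and the production steering service weighs many more signals:

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class OcaSteering {
    record Candidate(String url, boolean insideSubscriberIsp, double loadFraction,
                     double historicalThroughputMbps, boolean hasTitle) {}

    static List<String> rank(List<Candidate> candidates) {
        return candidates.stream()
                .filter(Candidate::hasTitle)                                  // content availability
                .sorted(Comparator
                        .comparing((Candidate c) -> !c.insideSubscriberIsp()) // ISP-local boxes first
                        .thenComparingDouble(Candidate::loadFraction)         // least-loaded next
                        .thenComparing(Comparator.comparingDouble(
                                Candidate::historicalThroughputMbps).reversed())) // best history as tiebreak
                .map(Candidate::url)
                .collect(Collectors.toList());
    }
}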

Open Connect Fill and Playback

Off-peak, content fills from the origin out to OCA boxes inside ISPs; at playback time, the client is steered to the nearest OCA rather than back to Netflix's AWS servers.

Content Encoding Pipeline

Before a single frame reaches a subscriber, Netflix's encoding pipeline transforms raw studio masters into an astonishing number of optimized streams. A single movie or episode is encoded into roughly 1,200 different streams — combinations of resolutions, bitrates, codecs, and audio formats.

The Encoding Ladder

Traditionally, video services used a fixed encoding ladder — a static set of bitrate/resolution pairs applied to all content. Netflix revolutionized this with per-title encoding (later evolved to per-shot encoding).

The insight: a slow dialogue scene can look pristine at 500 Kbps, while a fast-action sequence needs 8 Mbps for the same perceived quality. Applying one bitrate ladder to both wastes either bandwidth or quality.

Pipeline Architecture

Studio Master (ProRes/IMF, ~300 GB per hour)
  │
  ├─→ Quality Analysis: Scene detection, complexity scoring, VMAF targets
  │
  ├─→ Encoding Farm (100,000+ cores)
  │     ├── H.264/AVC (legacy devices)
  │     ├── H.265/HEVC (4K, HDR)
  │     ├── VP9 (Chrome, Android)
  │     ├── AV1 (next-gen, 20% better compression)
  │     ├── Dolby Vision (premium HDR)
  │     └── HDR10 / HDR10+
  │
  ├─→ Audio Encoding
  │     ├── AAC Stereo (mobile)
  │     ├── Dolby Digital 5.1 (TV)
  │     ├── Dolby Atmos (premium)
  │     └── 30+ language tracks per title
  │
  ├─→ Subtitle/Caption Generation (30+ languages)
  │
  ├─→ Packaging: DASH / HLS manifests, DRM encryption (Widevine, FairPlay, PlayReady)
  │
  └─→ Distribution: Push to Open Connect origin, trigger OCA fill

The encoding farm is built on Titus — Netflix's container management platform running on AWS EC2. Encoding jobs are massively parallelized: a single movie is split into hundreds of chunks encoded simultaneously, then reassembled. A new title can go from studio master to globally available in under 4 hours.

VMAF (Video Multi-Method Assessment Fusion): Netflix open-sourced VMAF in 2016. Unlike traditional metrics like PSNR or SSIM, VMAF is trained on human perception data — it predicts how a human viewer would rate quality on a 0-100 scale. Netflix targets VMAF scores of 93+ for 1080p and 95+ for 4K content. VMAF has become an industry standard used by YouTube, Facebook, and most major streaming services.

Chaos Engineering

Netflix didn't just accept that failures happen — they deliberately inject failures into production systems to ensure resilience. This practice, which Netflix pioneered, is now known as chaos engineering.

The philosophy: if your system can survive random destruction during business hours, it can survive anything that happens at 3 AM on a Saturday.

The Simian Army

Tool | What It Destroys | Why
--- | --- | ---
Chaos Monkey | Randomly terminates individual EC2 instances during business hours | Ensures every service handles instance failure gracefully
Chaos Gorilla | Simulates an entire Availability Zone failure | Validates AZ-level redundancy and failover
Chaos Kong | Simulates an entire AWS region going offline | Tests cross-region failover (the nuclear option)
Latency Monkey | Injects artificial latency into RESTful calls | Exposes timeout and retry bugs
Conformity Monkey | Finds and shuts down instances that don't conform to best practices | Enforces architectural standards
Janitor Monkey | Cleans up unused resources (orphaned instances, unattached volumes) | Reduces cloud waste and cost

Chaos Kong — Regional Failover

Chaos Kong is the most impressive (and terrifying) chaos exercise. Netflix runs production traffic in three AWS regions. During a Chaos Kong exercise, they evacuate an entire region — redirecting all traffic to the remaining two. This validates:

  1. That the surviving regions have enough capacity headroom to absorb the evacuated traffic.
  2. That traffic steering (DNS and edge routing) can shift subscribers quickly and without visible errors.
  3. That the member data the surviving regions need (profiles, bookmarks, caches) is replicated well enough to serve users who suddenly arrive from elsewhere.

Netflix runs Chaos Kong exercises monthly. The fact that they can evacuate an entire region during peak hours, with subscribers never noticing, is a testament to their architecture's resilience.

ChAP — Chaos Automation Platform

As chaos engineering matured, Netflix built ChAP (Chaos Automation Platform) to make experiments more scientific. ChAP runs controlled experiments: it takes a portion of production traffic, divides it into control and experiment groups, injects a failure into the experiment group, and measures the difference in SPS (Stream Starts Per Second) — Netflix's primary business metric.

If the experiment group shows statistically significant degradation in SPS, the failure mode is flagged for engineering follow-up. This transformed chaos engineering from "break things and see what happens" into a rigorous, data-driven discipline.

Data Architecture

Netflix's data architecture follows a polyglot persistence strategy — each data store is chosen for the specific access patterns and consistency requirements of its use case.

Apache Cassandra — The Primary Workhorse

Cassandra is Netflix's most widely used database, running thousands of clusters with tens of petabytes of data. Netflix chose Cassandra for its:

  1. Masterless, peer-to-peer architecture with no single point of failure.
  2. Linear horizontal scalability for write-heavy workloads such as viewing history.
  3. Multi-region, active-active replication, which fits the three-region deployment model.
  4. Tunable consistency, letting each use case trade consistency for availability and latency.

Netflix uses Cassandra for: viewing history, playback bookmarks, user profiles, content metadata, A/B test results, and device registrations.
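A sketch of the viewing-history access pattern with the DataStax Java driver; the keyspace, table layout, and values are illustrative, not Netflix's actual schema:

import com.datastax.oss.driver.api.core.CqlSession;

public class ViewingHistoryExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().withKeyspace("viewing").build()) {
            // Partition by user, cluster by time descending: "recent activity" reads hit one
            // partition, and write throughput scales linearly as nodes are added.
            session.execute(
                "CREATE TABLE IF NOT EXISTS history (" +
                "  user_id text, watched_at timestamp, title_id text, position_sec int," +
                "  PRIMARY KEY ((user_id), watched_at)" +
                ") WITH CLUSTERING ORDER BY (watched_at DESC)");

            session.execute(
                "INSERT INTO history (user_id, watched_at, title_id, position_sec) " +
                "VALUES ('user-42', toTimestamp(now()), 'tt123', 900)");
        }
    }
}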

MySQL — Billing & Transactions

Billing requires ACID transactions — exactly-once payment processing, idempotent subscription changes, and audit-grade consistency. MySQL (via AWS RDS) handles Netflix's billing database with: read replicas for reporting, automated backups, and point-in-time recovery. Netflix has also explored CockroachDB for globally distributed transactional workloads.

Elasticsearch — Search & Discovery

Netflix's search functionality is powered by Elasticsearch clusters running hundreds of nodes. The search index covers title metadata (names, cast, genres, synopses) along with localized and alternate titles, so partial, misspelled, or non-English queries still resolve.

Apache Kafka — Event Streaming Backbone

Kafka is the central nervous system of Netflix's data architecture. Every significant event — a play button press, a search query, a recommendation impression, a signup, a billing event — is published to Kafka.
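Producing such an event with the standard Apache Kafka Java client looks like the sketch below; the topic name, key, and payload are illustrative, not Netflix's actual schema:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PlaybackEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by user keeps one member's events in order within a single partition.
            producer.send(new ProducerRecord<>(
                    "playback-events",
                    "user-42",
                    "{\"event\":\"play\",\"titleId\":\"tt123\",\"positionSec\":0}"));
        }
    }
}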

Netflix Data Flow (simplified):
┌────────────┐
│  User App  │──play, search, browse events──→ ┌─────────┐
│  (Device)  │                                  │  Kafka  │
└────────────┘                                  │ Cluster │
                                                └────┬────┘
            ┌────────────────────────────────────────┼────────────────┐
            │                    │                   │                │
            ▼                    ▼                   ▼                ▼
    ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  ┌──────────────┐
    │  S3 / Iceberg│   │Elasticsearch │   │  Cassandra   │  │ Spark/Flink  │
    │  (Data Lake) │   │  (Search)    │   │  (Views)     │  │ (Streaming)  │
    └──────────────┘   └──────────────┘   └──────────────┘  └──────────────┘
            │
            ▼
    ┌──────────────┐
    │  Presto /    │  ← Ad-hoc analytics, dashboards, ML training
    │  Spark SQL   │
    └──────────────┘

Apache Iceberg — Data Lake Table Format

Netflix created Apache Iceberg (later donated to the Apache Software Foundation) to solve the limitations of the Hive table format for their massive data lake. Iceberg provides ACID transactions on petabyte-scale tables, time-travel queries, schema evolution without rewriting data, and efficient partition pruning. Netflix's data lake on S3 stores hundreds of petabytes of analytics, ML training data, and operational logs.

Recommendation Engine

Netflix estimates that their recommendation system saves them $1 billion per year in reduced churn. Over 80% of content watched on Netflix is discovered through recommendations, not search.

Architecture Overview

The recommendation system is a multi-layered ML pipeline (a toy sketch of the first two stages follows the list):

  1. Candidate generation: Broad models (collaborative filtering, content-based filtering) generate thousands of candidate titles from the catalog. This runs offline in batch on Spark.
  2. Ranking: A deep learning model ranks candidates by predicted engagement. Features include: viewing history, time of day, device type, genre preferences, "taste communities" (clusters of users with similar preferences), title freshness, and social signals.
  3. Row generation: The Netflix homepage is organized into rows ("Because you watched X", "Trending Now", "New Releases"). A separate model decides which rows to show, in what order, with which titles — personalized for each subscriber.
  4. Artwork personalization: Even the thumbnail image shown for each title is personalized. A user who watches lots of comedies sees a funny scene from a drama; a romance fan sees the romantic subplot. Netflix found this increased click-through rates by 20-30%.
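A toy retrieve-then-rank sketch of stages 1 and 2, with made-up scoring; the production system uses large offline candidate generators and learned ranking models:

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TwoStageRecommender {
    record Title(String id, String genre, double popularity) {}

    // Stage 1: cheap candidate generation, e.g. titles from genres the member has affinity for.
    static List<Title> candidates(List<Title> catalog, Map<String, Double> genreAffinity) {
        return catalog.stream()
                .filter(t -> genreAffinity.getOrDefault(t.genre(), 0.0) > 0.1)
                .collect(Collectors.toList());
    }

    // Stage 2: rank the much smaller candidate set with a richer (here: toy) score.
    static List<Title> rank(List<Title> candidates, Map<String, Double> genreAffinity) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble((Title t) ->
                        0.7 * genreAffinity.getOrDefault(t.genre(), 0.0)
                        + 0.3 * t.popularity()).reversed())
                .collect(Collectors.toList());
    }
}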

ML Infrastructure

Training and batch scoring run on Netflix's internal ML platform tooling, most visibly Metaflow (their open-sourced workflow framework), with Spark for large-scale feature computation and Titus-managed containers for training jobs, all fed from the S3/Iceberg data lake.

Adaptive Bitrate Streaming (ABR)

Once the video bytes leave an OCA, the Netflix client takes over. The Adaptive Bitrate (ABR) algorithm dynamically adjusts video quality based on real-time network conditions — the reason you rarely see buffering on Netflix.

How ABR Works

  1. Buffer-based control: The client maintains a playback buffer (typically 30-120 seconds of video). The ABR algorithm monitors the buffer level and available bandwidth.
  2. Quality selection: If the buffer is healthy (>60s) and bandwidth is stable, the client requests higher-quality chunks. If the buffer is draining, it drops to a lower bitrate immediately — prioritizing continuity over quality (see the sketch after this list).
  3. Per-title bitrate ladders: Because each title has a custom encoding ladder (from per-title optimization), the ABR algorithm switches between quality levels that are already optimized for that specific content. A nature documentary's "medium" quality might look better than a generic "high" quality.
  4. Device-aware profiles: A mobile client on a 5-inch screen caps at 1080p (sending 4K would waste bandwidth and battery). A 65-inch TV targets 4K/HDR. The ABR algorithm operates within device-appropriate bounds.
  5. Predictive buffering: Netflix's client analyzes network patterns (e.g., your WiFi is consistently slower at 8 PM when neighbors are online) and pre-buffers aggressively during good conditions to survive future quality drops.
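A toy buffer-based selector in the spirit of steps 1 and 2; real ABR controllers blend buffer occupancy, throughput prediction, and device constraints, and the thresholds below are invented for illustration:

import java.util.List;

class SimpleAbrSelector {
    record Rendition(int bitrateKbps, String label) {}

    // Pick the highest rendition that fits the throughput budget, backing off as the buffer drains.
    static Rendition select(List<Rendition> ladder,            // sorted ascending by bitrate
                            double bufferSeconds,
                            double measuredThroughputKbps) {
        // Leave headroom: spend only part of the measured throughput, less when the buffer is low.
        double safety = bufferSeconds > 60 ? 0.8 : bufferSeconds > 20 ? 0.6 : 0.4;
        double budget = measuredThroughputKbps * safety;

        Rendition choice = ladder.get(0); // the lowest rung is the floor (continuity over quality)
        for (Rendition rendition : ladder) {
            if (rendition.bitrateKbps() <= budget) {
                choice = rendition;
            }
        }
        return choice;
    }
}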

Real-World Impact

Netflix's combination of per-title encoding, Open Connect, and ABR achieves remarkable end-to-end efficiency: per-title ladders cut bitrates without sacrificing perceived quality, OCAs serve the vast majority of bytes from inside the subscriber's own ISP, and ABR keeps playback smooth even on variable networks.

Lessons Learned

Netflix's architecture offers several broadly applicable lessons for distributed systems engineering:

1. Optimize for the Failure Case

Netflix's entire architecture assumes things will fail. Rather than trying to prevent all failures (impossible at scale), they invest in graceful degradation. Every service has a fallback. Every region can absorb another's traffic. Chaos engineering is not optional — it's part of the engineering process.

2. Own Your Critical Path

Netflix uses AWS for compute but built their own CDN. Why? Because video delivery is their core business. Relying on a third-party CDN meant they couldn't optimize for video-specific patterns, couldn't control costs at their scale, and couldn't customize the hardware. The lesson: buy commodity infrastructure, build differentiating infrastructure.

3. Polyglot Persistence is Worth the Complexity

Using Cassandra, MySQL, Elasticsearch, and Kafka (each for its strengths) is operationally complex but delivers far better performance than forcing everything into one database. The key is having strong platform teams that abstract away the operational burden.

4. Migrate Incrementally

The 8-year monolith-to-microservices migration succeeded because each step was independently valuable. They never had an "all or nothing" moment. The strangler fig pattern allowed them to ship features continuously while migrating.

5. Invest in Developer Productivity

Netflix invested heavily in internal developer tools: Spinnaker (continuous delivery), Atlas (metrics), Mantis (stream processing), and centralized logging. When developers can deploy, monitor, and debug independently, velocity compounds. Netflix engineers deploy thousands of times per day across all services.

6. Data-Driven Everything

Every decision — from which thumbnail to show, to which encoding bitrate to use, to which OCA to route to — is driven by data and A/B testing. Netflix's culture of experimentation means that even strongly-held engineering opinions are validated with data before being adopted.

Summary