Design: YouTube / Video Streaming
Requirements
YouTube is the world's second-largest search engine and the dominant video-sharing platform, serving over 2 billion logged-in users per month and streaming more than 1 billion hours of video per day. Designing a system at this scale forces you to confront nearly every distributed-systems challenge simultaneously: massive storage, extreme bandwidth, real-time transcoding, global distribution, and ML-powered recommendations.
Functional Requirements
- Upload videos — creators upload video files (up to 256 GB, 12 hours) in any common format (MP4, MOV, AVI, MKV, WebM)
- Stream videos — viewers watch on-demand with adaptive quality, seeking, and subtitles
- Search videos — full-text search across titles, descriptions, tags, and auto-generated captions
- Recommendations — personalized homepage feed, "up next" sidebar, trending content
- Comments & replies — threaded comments on every video
- Likes, dislikes, subscriptions — engagement signals and creator subscriptions
- Live streaming — real-time broadcast with chat (stretch goal)
Non-Functional Requirements
- Scale — 2B monthly active users, 500+ hours of video uploaded per minute
- Availability — 99.99% uptime (≤ 52 min downtime/year)
- Latency — video playback start < 2 seconds, search results < 200 ms
- Durability — zero data loss on uploaded content (11 nines durability)
- Global reach — low-latency streaming on every continent
- Eventual consistency — view counts and like counts need not be real-time
Back-of-the-Envelope Estimation
| Metric | Estimate |
|---|---|
| Videos uploaded per day | ~720,000 (500 hrs/min × 60 min/hr × 24 hr/day = 720,000 hrs/day, avg ~1 hr each) |
| Average original video size | ~500 MB (1080p, 5 min, H.264) |
| Daily raw upload storage | ~360 TB/day (720K × 500 MB) |
| Transcoded storage multiplier | ~5× (multiple resolutions + codecs) |
| Daily total storage | ~1.8 PB/day |
| Daily video watches | ~5 billion |
| Peak concurrent streams | ~100 million |
| Peak egress bandwidth | ~500 Tbps (100M × 5 Mbps avg) |
High-Level Architecture
The system decomposes into several key subsystems, each operating as an independent microservice cluster:
┌──────────────┐ ┌───────────────┐ ┌──────────────────┐
│ Clients │────▶│ API Gateway │────▶│ Auth Service │
│ (Web/Mobile/ │ │ (Rate Limit, │ │ (OAuth, JWT) │
│ Smart TV) │ │ Routing) │ └──────────────────┘
└──────┬───────┘ └───────┬───────┘
│ │
│ ┌────────────────┼────────────────────┐
│ │ │ │
│ ▼ ▼ ▼
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐
│ │ Upload │ │ Video │ │ Search │
│ │ Service │ │ Service │ │ Service │
│ └────┬─────┘ └────┬─────┘ └──────┬───────┘
│ │ │ │
│ ▼ ▼ ▼
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐
│ │Transcoder│ │ Metadata │ │Elasticsearch │
│ │ Workers │ │ DB │◀────▶│ Cluster │
│ └────┬─────┘ └──────────┘ └──────────────┘
│ │
│ ▼
│ ┌──────────┐ ┌──────────────────────┐
│ │ Blob │────▶│ CDN Edge Servers │
│ │ Storage │ │ (Global, 200+ PoPs) │
│ └──────────┘ └──────────────────────┘
│ │
└────────────────────────────┘
(stream from nearest edge)
Async Services:
┌────────────┐ ┌───────────────┐ ┌──────────────┐ ┌────────────┐
│ Comment │ │ Recommendation│ │ View Counter │ │Notification│
│ Service │ │ Engine │ │ Service │ │ Service │
└────────────┘ └───────────────┘ └──────────────┘ └────────────┘
Key design principles:
- Separation of upload and streaming paths — upload is write-heavy and CPU-intensive (transcoding); streaming is read-heavy and bandwidth-intensive (CDN)
- Asynchronous processing — video processing happens in background workers via message queues
- Pre-computation — transcode into all formats upfront rather than on-demand
- CDN-first delivery — origin servers should rarely serve video data directly to clients
Video Upload Pipeline
The upload pipeline is the most complex write path in the system. It transforms a raw video file from a creator into a fully processed, globally distributed, streamable asset. The pipeline must handle 500+ hours of video per minute reliably.
Step-by-Step Upload Flow
- Client-side pre-processing — the client computes a SHA-256 hash of the file for deduplication and integrity verification, then initiates a resumable upload via the Google Resumable Upload Protocol (or tus.io)
- Direct-to-storage upload — the file is streamed straight to blob storage via a pre-signed upload URL, bypassing the API server's memory entirely; the API server handles only metadata
- Store original in blob storage — the raw file lands in a "raw uploads" bucket in S3/GCS with 3× replication across availability zones
- Enqueue transcoding jobs — the upload service publishes a message to Kafka/SQS with the video ID, storage location, and requested output formats
- Parallel transcoding — a fleet of transcoding workers (running FFmpeg/libx264/libvpx) picks up the job and produces multiple renditions in parallel
- Generate thumbnails — extract key frames at regular intervals (every 2 seconds) for the video scrubber, plus 3 candidate thumbnails for the creator to choose from
- Extract metadata — run speech-to-text for auto-captions, detect language, extract audio features for Content ID matching
- Update metadata DB — mark the video as "processed" with pointers to all renditions and thumbnails
- Push to CDN — proactively push content expected to be popular (e.g., from large channels) to edge nodes; for long-tail content, the CDN pulls from origin on first request
- Notify creator — send push notification / email that processing is complete
Resumable Uploads
Uploading a 4 GB video over a flaky mobile connection is unreliable. YouTube uses resumable uploads — the client sends the file in chunks (typically 8–16 MB each), and the server tracks how many bytes have been received. If the connection drops, the client queries the server for the last confirmed byte offset and resumes from there.
POST /upload/resumable HTTP/1.1
Host: upload.youtube.com
Content-Type: application/json
X-Upload-Content-Type: video/mp4
X-Upload-Content-Length: 4294967296
{ "title": "My Video", "description": "...", "privacyStatus": "public" }
--- Response ---
HTTP/1.1 200 OK
Location: https://upload.youtube.com/upload?id=xa298sd_sdlkj2
--- Resume with ---
PUT https://upload.youtube.com/upload?id=xa298sd_sdlkj2
Content-Range: bytes 16777216-33554431/4294967296
Content-Length: 16777216
[binary chunk data]
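A minimal client-side sketch of the resume loop above, assuming the Google-style session URL from the Location header; the 308/Range offset query, chunk size, and error handling are illustrative rather than YouTube's actual client code:

import requests

CHUNK_SIZE = 16 * 1024 * 1024  # 16 MB chunks

def query_offset(session_url, total_size):
    # Ask the server how many bytes it already has (empty PUT with "bytes */total")
    resp = requests.put(session_url,
                        headers={"Content-Range": f"bytes */{total_size}"})
    if resp.status_code == 308:                        # 308 Resume Incomplete
        rng = resp.headers.get("Range", "")            # e.g. "bytes=0-33554431"
        return int(rng.rsplit("-", 1)[-1]) + 1 if rng else 0
    return total_size                                  # upload already finished

def resumable_upload(session_url, path, total_size):
    offset = query_offset(session_url, total_size)
    with open(path, "rb") as f:
        f.seek(offset)
        while offset < total_size:
            chunk = f.read(CHUNK_SIZE)
            end = offset + len(chunk) - 1
            resp = requests.put(
                session_url, data=chunk,
                headers={"Content-Range": f"bytes {offset}-{end}/{total_size}"})
            if resp.status_code not in (200, 201, 308):
                resp.raise_for_status()                # transient error: caller retries
            offset = end + 1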
Transcoding Architecture
Transcoding is the most CPU-intensive operation in the pipeline. Each uploaded video must be converted into multiple resolutions and multiple codecs:
| Resolution | Bitrate (H.264) | Bitrate (VP9) | Bitrate (AV1) | Use Case |
|---|---|---|---|---|
| 240p (426×240) | 300 kbps | 150 kbps | 100 kbps | 2G/slow connections |
| 360p (640×360) | 700 kbps | 350 kbps | 250 kbps | Mobile, low bandwidth |
| 480p (854×480) | 1.5 Mbps | 750 kbps | 500 kbps | Standard mobile |
| 720p (1280×720) | 3 Mbps | 1.5 Mbps | 1 Mbps | HD default |
| 1080p (1920×1080) | 6 Mbps | 3 Mbps | 2 Mbps | Desktop, tablets |
| 1440p (2560×1440) | 10 Mbps | 6 Mbps | 4 Mbps | QHD monitors |
| 4K (3840×2160) | 20 Mbps | 12 Mbps | 8 Mbps | 4K TVs, premium |
Parallel Transcoding with DAG Processing
YouTube doesn't process transcoding jobs sequentially. Instead, the pipeline is modeled as a Directed Acyclic Graph (DAG) of tasks:
Original Video (raw upload)
│
├──▶ [Video Split into Segments] ──────────────────┐
│ (GOP-aligned, ~4s segments) │
│ │
│ ┌──────────── Parallel per segment ───────────┤
│ │ │
│ ├──▶ Transcode → 240p H.264 │
│ ├──▶ Transcode → 360p H.264 │
│ ├──▶ Transcode → 480p H.264 │
│ ├──▶ Transcode → 720p H.264 │
│ ├──▶ Transcode → 1080p H.264 │
│ ├──▶ Transcode → 720p VP9 │
│ ├──▶ Transcode → 1080p VP9 │
│ ├──▶ Transcode → 4K VP9 │
│ └──▶ Transcode → 1080p AV1 (if popular) │
│ │
│ └──────────── Merge segments ─────────────────┘
│ │
├──▶ [Extract Audio] ──▶ AAC 128kbps, Opus 64kbps
│
├──▶ [Generate Thumbnails] ──▶ every 2s + 3 candidates
│
├──▶ [Speech-to-Text] ──▶ auto-captions (VTT/SRT)
│
└──▶ [Content ID Fingerprint] ──▶ copyright check
The key optimization is segment-level parallelism. A 10-minute video at 30 fps is split into ~150 four-second segments. Each segment is transcoded independently across the worker fleet. A 10-minute video that would take 45 minutes to transcode sequentially on a single machine can be completed in under 2 minutes with 150 parallel workers.
# FFmpeg command to split a video into segments at GOP boundaries
ffmpeg -i input.mp4 -c copy -f segment \
-segment_time 4 -reset_timestamps 1 \
-segment_list segments.csv \
segment_%04d.mp4
# Transcode a single segment to 720p H.264
ffmpeg -i segment_0042.mp4 \
-vf "scale=1280:720" \
-c:v libx264 -preset medium -crf 23 \
-c:a aac -b:a 128k \
-movflags +faststart \
segment_0042_720p.mp4
# Transcode to VP9 (better compression, slower encode)
ffmpeg -i segment_0042.mp4 \
-vf "scale=1280:720" \
-c:v libvpx-vp9 -b:v 1500k -minrate 750k -maxrate 2100k \
-c:a libopus -b:a 64k \
segment_0042_720p.webm
# Transcode to AV1 (best compression, slowest encode)
ffmpeg -i segment_0042.mp4 \
-vf "scale=1280:720" \
-c:v libaom-av1 -crf 30 -b:v 0 -strict experimental \
-c:a libopus -b:a 64k \
segment_0042_720p_av1.mp4
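The commands above transcode a single segment; a worker simply runs them in parallel across segments and renditions. A rough fan-out sketch, assuming ffmpeg is on PATH and the segments are already downloaded locally (a real fleet would pull jobs from Kafka/SQS instead):

import subprocess
from concurrent.futures import ProcessPoolExecutor

# (label, scale, video bitrate) pairs; trimmed to two renditions for brevity
RENDITIONS = [("240p", "426:240", "300k"), ("720p", "1280:720", "3000k")]

def transcode_segment(segment_path, label, scale, bitrate):
    out = segment_path.replace(".mp4", f"_{label}.mp4")
    subprocess.run(
        ["ffmpeg", "-y", "-i", segment_path,
         "-vf", f"scale={scale}",
         "-c:v", "libx264", "-b:v", bitrate, "-preset", "medium",
         "-c:a", "aac", "-b:a", "128k", out],
        check=True)
    return out

def transcode_all(segments):
    # One process per (segment, rendition) pair, bounded by available cores
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(transcode_segment, seg, *r)
                   for seg in segments for r in RENDITIONS]
        return [f.result() for f in futures]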
Adaptive Bitrate Streaming (ABR)
The defining innovation that makes video streaming work over variable-bandwidth internet connections is Adaptive Bitrate Streaming (ABR). Instead of downloading a single file, the client dynamically selects the best quality level for each segment based on current network conditions.
How ABR Works
- The client requests a manifest file (playlist) that describes all available quality levels and segment URLs
- The client starts playing at a conservative quality (e.g., 480p)
- After each segment download, the client measures actual throughput
- If bandwidth is sufficient, the next segment is requested at a higher quality
- If bandwidth drops, the client immediately switches to a lower quality — seamlessly, mid-stream
- The player maintains a buffer of 15–30 seconds ahead to absorb transient network fluctuations
HLS (HTTP Live Streaming)
Apple's HLS is the most widely supported ABR protocol. It uses a master playlist (m3u8) pointing to variant playlists for each quality level, each of which lists the individual segment files (.ts, or .m4s for fragmented MP4).
#EXTM3U
#EXT-X-VERSION:6
## Master Playlist — lists all available quality levels
## Client picks the highest quality its bandwidth can sustain
#EXT-X-STREAM-INF:BANDWIDTH=300000,RESOLUTION=426x240,CODECS="avc1.42e00a,mp4a.40.2"
https://cdn.example.com/video/abc123/240p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=700000,RESOLUTION=640x360,CODECS="avc1.42e01e,mp4a.40.2"
https://cdn.example.com/video/abc123/360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480,CODECS="avc1.4d401f,mp4a.40.2"
https://cdn.example.com/video/abc123/480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720,CODECS="avc1.4d4020,mp4a.40.2"
https://cdn.example.com/video/abc123/720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
https://cdn.example.com/video/abc123/1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=12000000,RESOLUTION=2560x1440,CODECS="avc1.640032,mp4a.40.2"
https://cdn.example.com/video/abc123/1440p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=20000000,RESOLUTION=3840x2160,CODECS="avc1.640033,mp4a.40.2"
https://cdn.example.com/video/abc123/4k/playlist.m3u8
## Variant Playlist for 720p — individual segments
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:4.000,
https://cdn.example.com/video/abc123/720p/seg_0000.ts
#EXTINF:4.000,
https://cdn.example.com/video/abc123/720p/seg_0001.ts
#EXTINF:4.000,
https://cdn.example.com/video/abc123/720p/seg_0002.ts
#EXTINF:4.000,
https://cdn.example.com/video/abc123/720p/seg_0003.ts
#EXTINF:3.840,
https://cdn.example.com/video/abc123/720p/seg_0004.ts
#EXT-X-ENDLIST
MPEG-DASH (Dynamic Adaptive Streaming over HTTP)
DASH is the open standard alternative to HLS. It uses an XML-based Media Presentation Description (MPD) instead of m3u8 playlists. YouTube uses DASH internally for most playback.
<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
type="static"
mediaPresentationDuration="PT10M24S"
minBufferTime="PT4S">
<Period duration="PT10M24S">
<!-- Video Adaptation Set -->
<AdaptationSet mimeType="video/mp4" segmentAlignment="true">
<Representation id="240p" width="426" height="240"
bandwidth="300000" codecs="avc1.42e00a">
<SegmentTemplate media="240p/seg_$Number%04d$.m4s"
initialization="240p/init.mp4"
duration="4000" timescale="1000"/>
</Representation>
<Representation id="720p" width="1280" height="720"
bandwidth="3000000" codecs="avc1.4d4020">
<SegmentTemplate media="720p/seg_$Number%04d$.m4s"
initialization="720p/init.mp4"
duration="4000" timescale="1000"/>
</Representation>
<Representation id="1080p" width="1920" height="1080"
bandwidth="6000000" codecs="avc1.640028">
<SegmentTemplate media="1080p/seg_$Number%04d$.m4s"
initialization="1080p/init.mp4"
duration="4000" timescale="1000"/>
</Representation>
<Representation id="4k" width="3840" height="2160"
bandwidth="20000000" codecs="avc1.640033">
<SegmentTemplate media="4k/seg_$Number%04d$.m4s"
initialization="4k/init.mp4"
duration="4000" timescale="1000"/>
</Representation>
</AdaptationSet>
<!-- Audio Adaptation Set -->
<AdaptationSet mimeType="audio/mp4" lang="en">
<Representation id="audio_128k" bandwidth="128000"
codecs="mp4a.40.2">
<SegmentTemplate media="audio/seg_$Number%04d$.m4s"
initialization="audio/init.mp4"
duration="4000" timescale="1000"/>
</Representation>
</AdaptationSet>
</Period>
</MPD>
ABR Algorithm: Buffer-Based vs. Rate-Based
The client-side ABR algorithm decides which quality to request next. Two dominant approaches exist:
| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Rate-based | Estimate bandwidth from recent segment download times, pick highest quality below estimated bandwidth | Fast adaptation | Oscillation when bandwidth fluctuates |
| Buffer-based (BBA) | Map buffer level to quality: low buffer → low quality, high buffer → high quality | Stable, avoids oscillation | Slower to ramp up |
| Hybrid (MPC) | Model Predictive Control — uses both throughput prediction and buffer state to optimize a lookahead window | Best quality of experience | Complex, higher CPU on client |
# Simplified buffer-based ABR logic (pseudo-Python)
class BufferBasedABR:
    RESERVOIR = 5    # seconds — below this, use lowest quality
    CUSHION = 15     # seconds — width of the linear ramp above the reservoir
    MAX_BUFFER = 30  # seconds — above this, use highest quality

    def __init__(self, quality_levels):
        self.levels = sorted(quality_levels, key=lambda q: q.bitrate)

    def select_quality(self, buffer_level_sec, available_bandwidth_bps):
        if buffer_level_sec < self.RESERVOIR:
            return self.levels[0]   # emergency: lowest quality
        if buffer_level_sec > self.MAX_BUFFER:
            return self.levels[-1]  # buffer is full: max quality
        # Linear interpolation between lowest and highest quality
        fraction = (buffer_level_sec - self.RESERVOIR) / self.CUSHION
        index = min(int(fraction * (len(self.levels) - 1)),
                    len(self.levels) - 1)
        # Safety check: don't exceed ~80% of measured bandwidth
        for level in reversed(self.levels[:index + 1]):
            if level.bitrate < available_bandwidth_bps * 0.8:
                return level
        return self.levels[0]
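For contrast, a rate-based estimator can be as simple as a harmonic mean over the last few segment downloads. The sketch below illustrates the table above and is not the production player's algorithm:

from collections import deque

class ThroughputEstimator:
    def __init__(self, window=5):
        self.samples = deque(maxlen=window)   # bits/sec per downloaded segment

    def record(self, segment_bits, download_sec):
        self.samples.append(segment_bits / download_sec)

    def estimate_bps(self):
        if not self.samples:
            return 0.0
        # Harmonic mean is pessimistic, which avoids over-selecting quality
        return len(self.samples) / sum(1.0 / s for s in self.samples)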
Video Storage
YouTube's storage system must manage petabytes of new data per day across two tiers: the original uploads (for potential re-transcoding) and the transcoded renditions (for actual serving).
Blob Storage Architecture
Storage Layout:
gs://youtube-raw/
└── {video_id}/
└── original.mp4 (raw upload, immutable)
gs://youtube-transcoded/
└── {video_id}/
├── h264/
│ ├── 240p/
│ │ ├── init.mp4
│ │ ├── seg_0000.m4s
│ │ ├── seg_0001.m4s
│ │ └── ...
│ ├── 480p/
│ ├── 720p/
│ ├── 1080p/
│ └── 4k/
├── vp9/
│ ├── 720p/
│ ├── 1080p/
│ └── 4k/
├── av1/
│ └── 1080p/
├── audio/
│ ├── aac_128k/
│ └── opus_64k/
├── thumbnails/
│ ├── sprite_sheet.jpg (scrubber thumbnails)
│ ├── thumb_001.jpg
│ ├── thumb_002.jpg
│ └── thumb_003.jpg
└── captions/
├── en_auto.vtt
└── en_manual.vtt
Storage Tiering
Not all videos are accessed equally. YouTube employs tiered storage to balance cost and performance:
| Tier | Access Pattern | Storage Class | Cost ($/GB/month) |
|---|---|---|---|
| Hot | Viral/trending, >10K views/day | Standard (multi-region) | $0.026 |
| Warm | Active, 100–10K views/day | Nearline | $0.010 |
| Cold | Long-tail, <100 views/day | Coldline | $0.004 |
| Archive | Rarely accessed, raw originals | Archive | $0.0012 |
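The tiering decision can be expressed as a simple rule over recent view rates. The thresholds below are the table's example values, and the storage-class names follow GCS naming; this is an illustration, not a production policy:

def storage_tier(views_per_day: int, is_raw_original: bool = False) -> str:
    # Map access pattern to a storage class (illustrative thresholds from the table)
    if is_raw_original:
        return "ARCHIVE"           # raw uploads kept only for re-transcoding
    if views_per_day > 10_000:
        return "STANDARD"          # hot: viral/trending, multi-region
    if views_per_day >= 100:
        return "NEARLINE"          # warm: active
    return "COLDLINE"              # cold: long tail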
Metadata & Database Design
While video bytes live in blob storage, all metadata about videos, users, channels, and interactions lives in a relational database: sharded MySQL managed by Vitess, the clustering layer YouTube built for this purpose and later open-sourced.
Core Schema
-- Videos table (sharded by video_id)
CREATE TABLE videos (
video_id BIGINT PRIMARY KEY, -- snowflake ID
channel_id BIGINT NOT NULL,
title VARCHAR(100) NOT NULL,
description TEXT,
upload_time TIMESTAMP NOT NULL,
duration_sec INT NOT NULL,
status ENUM('processing','active','removed','flagged'),
privacy ENUM('public','unlisted','private'),
category_id SMALLINT,
language CHAR(5),
storage_bucket VARCHAR(255), -- GCS path prefix
thumbnail_url VARCHAR(512),
view_count BIGINT DEFAULT 0, -- eventually consistent
like_count INT DEFAULT 0,
dislike_count INT DEFAULT 0,
comment_count INT DEFAULT 0,
INDEX idx_channel (channel_id),
INDEX idx_upload_time (upload_time),
INDEX idx_status (status)
) ENGINE=InnoDB;
-- Channels table (sharded by channel_id)
CREATE TABLE channels (
channel_id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
name VARCHAR(100) NOT NULL,
description TEXT,
subscriber_count BIGINT DEFAULT 0,
created_at TIMESTAMP NOT NULL,
avatar_url VARCHAR(512),
banner_url VARCHAR(512),
country CHAR(2),
INDEX idx_user (user_id)
) ENGINE=InnoDB;
-- Video renditions (which transcoded versions exist)
CREATE TABLE video_renditions (
video_id BIGINT NOT NULL,
codec ENUM('h264','vp9','av1'),
resolution ENUM('240p','360p','480p','720p','1080p','1440p','4k'),
bitrate_kbps INT NOT NULL,
segment_count INT NOT NULL,
storage_path VARCHAR(512) NOT NULL,
file_size_bytes BIGINT NOT NULL,
PRIMARY KEY (video_id, codec, resolution),
FOREIGN KEY (video_id) REFERENCES videos(video_id)
) ENGINE=InnoDB;
-- Subscriptions (sharded by subscriber_id for "my subscriptions" queries)
CREATE TABLE subscriptions (
subscriber_id BIGINT NOT NULL,
channel_id BIGINT NOT NULL,
subscribed_at TIMESTAMP NOT NULL,
notifications BOOLEAN DEFAULT TRUE,
PRIMARY KEY (subscriber_id, channel_id),
INDEX idx_channel_subs (channel_id)
) ENGINE=InnoDB;
-- Watch history (sharded by user_id, for recommendations)
CREATE TABLE watch_history (
user_id BIGINT NOT NULL,
video_id BIGINT NOT NULL,
watched_at TIMESTAMP NOT NULL,
watch_duration INT NOT NULL, -- seconds actually watched
watch_percent DECIMAL(5,2), -- % of video watched
PRIMARY KEY (user_id, video_id, watched_at),
INDEX idx_recent (user_id, watched_at DESC)
) ENGINE=InnoDB;
Sharding Strategy
With billions of rows, a single database instance cannot handle the load. YouTube shards using Vitess:
- Videos — sharded by video_id (hash-based, 256+ shards)
- Users/Channels — sharded by user_id or channel_id
- Watch history — sharded by user_id (all history for one user on one shard)
- Comments — sharded by video_id (all comments for one video on one shard)
Cross-shard queries (e.g., "all videos by a channel") are handled by maintaining secondary indexes that map channel_id → [video_ids] on the channel's shard, pointing to the actual video data on video shards.
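A minimal sketch of the hash-based routing this implies; the 256-shard count comes from the list above, and the hashing choice is illustrative rather than Vitess's actual keyspace-id function:

import hashlib

N_SHARDS = 256

def shard_for(video_id: int) -> int:
    # Hash the id so shards stay balanced even when ids are roughly sequential
    digest = hashlib.md5(str(video_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_SHARDS

# All rows for one video (metadata, renditions, comments) live on shard_for(video_id);
# the channel's shard keeps a channel_id -> [video_id] index for cross-shard lookups.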
Video Search
YouTube search must handle billions of queries per day across a corpus of 800+ million videos. The search system combines traditional information retrieval with machine learning ranking.
Search Architecture
User Query: "how to make pasta"
│
▼
┌────────────────┐
│ Query Parser │ → tokenize, spell-correct, expand synonyms
└───────┬────────┘
▼
┌────────────────┐
│ Elasticsearch │ → retrieve top ~1000 candidate videos
│ (Inverted Index│ using BM25 on title + description + tags
│ + Sharded) │ + auto-captions
└───────┬────────┘
▼
┌────────────────┐
│ ML Re-Ranker │ → neural network re-ranks candidates using:
│ (Deep Model) │ - text relevance score
│ │ - video quality signals (watch time, CTR)
│ │ - freshness / recency
│ │ - user personalization (watch history)
│ │ - engagement metrics
└───────┬────────┘
▼
┌────────────────┐
│ Result Blender│ → mix in ads, Knowledge Panels, playlists
└───────┬────────┘
▼
Search Results (typically top 20)
Elasticsearch Index Structure
PUT /youtube_videos
{
"settings": {
"number_of_shards": 64,
"number_of_replicas": 2,
"analysis": {
"analyzer": {
"youtube_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "stop", "snowball", "synonym_filter"]
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms": [
"tutorial,howto,guide",
"vlog,daily vlog,day in my life"
]
}
}
}
},
"mappings": {
"properties": {
"video_id": { "type": "keyword" },
"title": { "type": "text", "analyzer": "youtube_analyzer",
"boost": 3.0 },
"description": { "type": "text", "analyzer": "youtube_analyzer" },
"tags": { "type": "text", "analyzer": "youtube_analyzer",
"boost": 2.0 },
"captions": { "type": "text", "analyzer": "youtube_analyzer" },
"channel_name": { "type": "text", "boost": 1.5 },
"category": { "type": "keyword" },
"language": { "type": "keyword" },
"upload_date": { "type": "date" },
"duration_sec": { "type": "integer" },
"view_count": { "type": "long" },
"like_ratio": { "type": "float" },
"avg_watch_pct": { "type": "float" },
"subscriber_count": { "type": "long" }
}
}
}
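A hedged example of the kind of retrieval query this index serves, written as the Elasticsearch query DSL in a Python dict. Field names match the mapping above, while the boosts, filter, and candidate count are illustrative:

candidate_query = {
    "size": 1000,                      # candidates handed to the ML re-ranker
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": "how to make pasta",
                    "type": "best_fields",
                    "fields": ["title^3", "tags^2", "channel_name^1.5",
                               "description", "captions"]
                }
            },
            "filter": [{"term": {"language": "en"}}]
        }
    }
}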
Recommendation Engine
Recommendations drive over 70% of total watch time on YouTube. The recommendation system is arguably the most valuable component of the entire platform.
Two-Stage Architecture
800+ Million Videos
│
▼
┌───────────────────────┐
│ Candidate Generation │ → narrow to ~1000 candidates
│ (fast, recall-focused)│
│ │
│ Techniques: │
│ • Collaborative │
│ filtering │
│ • Content-based │
│ similarity │
│ • User embedding │
│ nearest neighbors │
└───────────┬───────────┘
▼
~1000 candidates
│
▼
┌───────────────────────┐
│ Ranking Model │ → score & rank to top ~50
│ (slower, precision- │
│ focused) │
│ │
│ Features: │
│ • Watch history │
│ • Time since upload │
│ • Video length │
│ • Channel affinity │
│ • Device type │
│ • Time of day │
│ • Language match │
│ • Past CTR for │
│ similar videos │
└───────────┬───────────┘
▼
Ranked Recommendations
(homepage, sidebar, end screen)
Collaborative Filtering
At its core: "Users who watched X also watched Y." YouTube builds a massive user-video interaction matrix and factors it into user embeddings and video embeddings using matrix factorization:
# Simplified collaborative filtering with implicit feedback
# Using Alternating Least Squares (ALS)
import numpy as np
class ALSRecommender:
def __init__(self, n_users, n_videos, n_factors=128, reg=0.01):
self.user_factors = np.random.normal(0, 0.01, (n_users, n_factors))
self.video_factors = np.random.normal(0, 0.01, (n_videos, n_factors))
self.reg = reg
def train(self, interactions, n_iterations=15):
"""
interactions: sparse matrix (user × video)
where value = confidence = 1 + α × watch_time
"""
for iteration in range(n_iterations):
# Fix video factors, solve for user factors
self._als_step(interactions, self.user_factors,
self.video_factors)
# Fix user factors, solve for video factors
self._als_step(interactions.T, self.video_factors,
self.user_factors)
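    def _als_step(self, interactions, solve_factors, fixed_factors):
        # Sketch of one ALS half-step, added for completeness; this is an
        # assumption, not YouTube's code. It treats the interaction matrix as
        # dense and skips per-entry confidence weighting for brevity; production
        # implicit-feedback ALS (Hu et al.) weights each entry by confidence.
        A = fixed_factors.T @ fixed_factors \
            + self.reg * np.eye(fixed_factors.shape[1])
        B = np.asarray(interactions @ fixed_factors)
        solve_factors[:] = np.linalg.solve(A, B.T).T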
def recommend(self, user_id, n=20):
scores = self.user_factors[user_id] @ self.video_factors.T
top_indices = np.argsort(scores)[::-1][:n]
return top_indices, scores[top_indices]
Deep Learning Ranking
The ranking stage uses a deep neural network (YouTube's 2016 paper describes this architecture, and it has evolved significantly since). The model predicts expected watch time rather than click probability — this avoids promoting clickbait:
# Simplified YouTube deep ranking model
# Predicts expected watch time for (user, video) pair
import torch
import torch.nn as nn
class YouTubeRankingModel(nn.Module):
def __init__(self, n_video_features, n_user_features, embed_dim=64):
super().__init__()
# Feature processing
self.video_mlp = nn.Sequential(
nn.Linear(n_video_features, 256),
nn.ReLU(),
nn.BatchNorm1d(256),
nn.Linear(256, embed_dim)
)
self.user_mlp = nn.Sequential(
nn.Linear(n_user_features, 256),
nn.ReLU(),
nn.BatchNorm1d(256),
nn.Linear(256, embed_dim)
)
# Ranking head — predicts watch time
self.ranking_head = nn.Sequential(
nn.Linear(embed_dim * 2 + 32, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, 1) # predicted watch time in seconds
)
# Context features (time of day, device, etc.)
self.context_embed = nn.Linear(8, 32)
def forward(self, video_features, user_features, context):
v = self.video_mlp(video_features)
u = self.user_mlp(user_features)
c = self.context_embed(context)
combined = torch.cat([v, u, c], dim=1)
return self.ranking_head(combined).squeeze()
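A quick smoke test of the sketch above; the feature dimensions and batch size are arbitrary:

model = YouTubeRankingModel(n_video_features=50, n_user_features=30)
model.eval()                                   # BatchNorm in inference mode
video_f = torch.randn(16, 50)                  # 16 candidate videos
user_f = torch.randn(16, 30)                   # the viewing user's features, repeated
context = torch.randn(16, 8)                   # time of day, device, etc.
predicted_watch_time = model(video_f, user_f, context)   # tensor of shape (16,)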
CDN & Content Delivery
YouTube operates one of the largest CDN networks in the world — Google's Global Cache (GGC) — with cache nodes embedded in thousands of ISP networks across 200+ countries and territories. The CDN is not optional; it is the most critical component for user experience.
CDN Architecture
┌─────────────┐ Cache Miss ┌──────────────┐
│ Origin │◀────────────────────│ Regional │
│ Storage │ │ Cache (PoP) │
│ (GCS/S3) │────────────────────▶│ (L2 Cache) │
│ │ Fill Response │ │
└─────────────┘ └──────┬───────┘
│
Cache Miss │ Cache Hit
│
┌──────▼───────┐
│ Edge Node │
│ (L1 Cache) │
│ ISP-level │
└──────┬───────┘
│
┌──────▼───────┐
│ Client │
│ │
└──────────────┘
Three-tier cache hierarchy:
- L1: Edge (in-ISP) — single-digit-millisecond latency, smaller cache, highest hit rate for popular content
- L2: Regional PoP — 5–20 ms latency, larger cache, catches most L1 misses
- Origin — 50–200 ms latency, authoritative, only for cold content
CDN Routing & Edge Selection
When a client requests a video segment, the CDN must route to the optimal edge server. YouTube uses a combination of:
- DNS-based routing — resolve rr1.sn-something.googlevideo.com to the nearest edge IP
- Anycast — the same IP is announced from multiple PoPs; BGP routing sends the client to the nearest one
- 302 redirects — the API server can redirect to a specific edge based on real-time load and cache status
- Client hints — the player reports its location, ISP, and measured latency to help with server selection
CDN Cost Analysis
Bandwidth egress is YouTube's single largest operational cost. Let's estimate:
| Cost Component | Calculation | Monthly Cost |
|---|---|---|
| Egress bandwidth | 1 billion hrs/day × avg 3 Mbps = ~40 EB/month at ~$0.02/GB | ~$800M |
| Storage (total ~1 EB) | ~1 EB at blended ~$0.005/GB/month | ~$5M |
| Transcoding compute | 720K videos/day × ~$0.50/video avg (multiple renditions) | ~$10.8M |
| CDN infrastructure | 1000s of edge servers, peering agreements | ~$200M |
| Total (estimated) | | ~$1B+/month |
View Counting at Scale
Counting views sounds trivial — just increment a counter, right? At YouTube's scale (5+ billion views/day, with viral videos spiking to hundreds of thousands of views per second), naive counting breaks down in multiple ways.
Challenges
- Thundering herd — a viral video could get millions of concurrent increments, creating a single-row hotspot
- Duplicate counting — the same user refreshing the page shouldn't count as multiple views
- Bot detection — bots and view-farming services try to inflate counts
- Consistency — showing slightly stale counts is fine, but the count should never go backwards
Architecture: Eventual Consistency + Batch Aggregation
View Event Flow:
Client → "view" event → Kafka (partitioned by video_id)
│
▼
Stream Processor (Flink/Dataflow)
│
├──▶ Deduplication (HyperLogLog per video per hour)
│ └── If user_id already counted in this window → discard
│
├──▶ Bot Detection (ML model)
│ └── If bot score > 0.9 → discard
│
└──▶ Batch Aggregation (every 5 minutes)
└── Accumulated delta → UPDATE videos SET view_count =
view_count + delta WHERE video_id = ?
Near-real-time counter (for display):
Redis (sharded) — INCRBY video:{id}:views {delta}
└── Periodically flush to MySQL (every 5 min)
└── HyperLogLog for unique viewer estimation:
PFADD video:{id}:viewers:{hour} {user_id}
PFCOUNT video:{id}:viewers:{hour} → ~0.81% error
Redis HyperLogLog for View Deduplication
HyperLogLog is a probabilistic data structure that estimates cardinality (unique count) using only 12 KB of memory regardless of how many elements are added. YouTube uses it to deduplicate views:
# Redis HyperLogLog for view dedup
# Each video has an hourly HLL
# User watches video — add to HLL
PFADD video:abc123:viewers:2026041215 user_98765
# Returns 1 if this is a new unique viewer, 0 if already seen
# Get approximate unique viewer count
PFCOUNT video:abc123:viewers:2026041215
# Returns ~count with ±0.81% error
# Merge hourly HLLs into daily count
PFMERGE video:abc123:viewers:20260412 \
video:abc123:viewers:2026041200 \
video:abc123:viewers:2026041201 \
... \
video:abc123:viewers:2026041223
# Memory: 12 KB per HLL × 24 hours × 800M videos
# = ~230 TB (only for "active" videos — most are inactive)
# In practice: only maintain HLLs for videos with recent views
# Expiry: TTL 48 hours, then merge into permanent count
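The same dedup flow through redis-py; key names follow the convention above, and the 48-hour TTL matches the expiry note:

import redis

r = redis.Redis()

def record_view(video_id: str, user_id: str, hour_bucket: str) -> bool:
    key = f"video:{video_id}:viewers:{hour_bucket}"
    is_new = r.pfadd(key, user_id) == 1       # approximate "first view this hour"
    r.expire(key, 48 * 3600)                  # merged into the daily count before TTL
    if is_new:
        r.incrby(f"video:{video_id}:views", 1)
    return is_new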
Comment Service
YouTube comments are a separate microservice with its own database, sharded by video_id. This isolation ensures that comment-related issues (spam waves, heavy moderation) don't affect video streaming.
Comment Schema & Storage
-- Comments table (sharded by video_id)
CREATE TABLE comments (
comment_id BIGINT PRIMARY KEY, -- snowflake ID
video_id BIGINT NOT NULL,
parent_id BIGINT DEFAULT NULL, -- NULL for top-level,
-- comment_id for replies
user_id BIGINT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP,
like_count INT DEFAULT 0,
is_pinned BOOLEAN DEFAULT FALSE,
is_hearted BOOLEAN DEFAULT FALSE, -- creator "heart"
status ENUM('active','removed','spam','held'),
INDEX idx_video_time (video_id, created_at DESC),
INDEX idx_video_top (video_id, like_count DESC),
INDEX idx_parent (parent_id)
) ENGINE=InnoDB;
-- Sorting strategies per video:
-- "Top comments" → ORDER BY like_count DESC, created_at DESC
-- "Newest first" → ORDER BY created_at DESC
-- Pagination via cursor: WHERE video_id = ? AND created_at < ?
-- LIMIT 20
Comment Moderation Pipeline
New Comment
│
├──▶ Spam Detection (ML model, < 10ms)
│ ├── Score > 0.95 → auto-remove (mark as spam)
│ ├── Score 0.5–0.95 → hold for review
│ └── Score < 0.5 → publish immediately
│
├──▶ Toxicity Detection (Perspective API)
│ └── Score > 0.8 → hold for review
│
├──▶ Regex Filters (channel-specific blocked words)
│
└──▶ Rate Limiting (max 10 comments/minute per user)
Published comments are indexed in Elasticsearch for
search within the video's comment section.
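The thresholds above translate directly into a routing function; the scores are assumed to come from the upstream spam and toxicity models:

def route_comment(spam_score: float, toxicity_score: float) -> str:
    # Mirrors the pipeline thresholds above
    if spam_score > 0.95:
        return "removed_spam"        # auto-remove
    if spam_score >= 0.5 or toxicity_score > 0.8:
        return "held_for_review"
    return "published"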
Live Streaming Architecture
Live streaming adds real-time constraints on top of the VOD architecture. The key difference: transcoding must happen in real time, and latency from broadcaster to viewer should be under 5 seconds.
Broadcaster (OBS/phone)
│
│ RTMP/SRT ingest (1-2 Mbps uplink)
▼
┌────────────────┐
│ Ingest Server │ → receive raw stream, buffer 1-2 GOP
│ (Region-local)│
└───────┬────────┘
│
▼
┌────────────────┐
│ Live Transcoder│ → real-time: 240p, 480p, 720p, 1080p
│ (GPU-accelerated)│ using NVENC / hardware encoders
│ │ latency budget: < 500ms per segment
└───────┬────────┘
│
├──▶ HLS/DASH segments (2s segments for low latency)
│ pushed to CDN edge every 2 seconds
│
├──▶ LL-HLS (Low-Latency HLS) for < 3s glass-to-glass
│
└──▶ DVR buffer (last 4 hours) for rewind/seek
Viewers request the "live edge" of the playlist:
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0
The server holds the response until a new segment is ready,
eliminating polling delay (HTTP long-polling for live edge).
Infrastructure Cost Deep Dive
Building a YouTube-scale system forces hard economic trade-offs. Let's analyze the three dominant cost centers:
1. Storage Costs (Petabyte Scale)
Daily upload volume:
500 hrs/min × 60 min/hr × 24 hrs = 720,000 hrs of video/day
Per video (average 5-min, 1080p original):
Original: ~500 MB
Transcoded (7 resolutions × 2 codecs avg): ~2.5 GB total
Thumbnails + captions: ~5 MB
Total per video: ~3 GB
Daily new storage: 720K videos × 3 GB = ~2.16 PB/day
Annual new storage: ~790 PB/year
Cumulative (15+ years): ~5-10 EB (Exabytes)
Annual storage cost (blended tiers):
Hot tier (5%): 50 PB × $0.023/GB × 12 = $13.8M/year
Warm tier (15%): 150 PB × $0.010/GB × 12 = $18.0M/year
Cold tier (60%): 600 PB × $0.004/GB × 12 = $28.8M/year
Archive (20%): 200 PB × $0.001/GB × 12 = $2.4M/year
─────────────────────────────────────────
Total storage: ~$63M/year
2. Bandwidth Costs (Egress Is King)
Daily video views: ~5 billion
Average video duration watched: ~7 minutes
Average bitrate: ~3 Mbps (blended across quality levels)
Daily egress:
5B views × 7 min × 60 sec × 3 Mbps / 8 = ~787 PB/day
Monthly egress: ~23.6 EB/month
Cost at cloud provider rates ($0.05-0.08/GB):
23.6 EB × $0.05/GB = ~$1.18B/month ← ASTRONOMICAL
Google's advantage: own fiber network + ISP-level caches
Effective cost: ~$0.005-0.01/GB (peering + owned infra)
23.6 EB × $0.008/GB = ~$189M/month ← still huge but 6x cheaper
3. Compute Costs (Transcoding)
Videos transcoded per day: ~720,000
Average transcoding time per rendition: ~2× real-time
(5-min video takes ~10 min to transcode per rendition)
Renditions per video: ~12 (7 resolutions × ~2 codecs)
Total transcode-minutes per day:
720K × 12 renditions × 10 min = 86.4M CPU-minutes/day
At AWS c5.2xlarge ($0.34/hr, 8 vCPUs):
86.4M min ÷ 8 cores ÷ 60 = ~180K instance-hours/day
180K × $0.34 = ~$61K/day = ~$1.83M/month
With spot/preemptible instances (~70% discount):
~$550K/month for transcoding compute
Note: AV1 encoding is 10-50× slower than H.264,
so it's only used for the most popular videos where
the bandwidth savings justify the compute cost.
Fault Tolerance & Reliability
At YouTube's scale, failures are not exceptions — they are the norm. The system must gracefully handle:
- Transcoding worker crashes — each segment job is idempotent; the queue redelivers failed segments to another worker. A video is only marked "complete" when ALL segments are done (tracked in a completion counter in Redis)
- CDN edge failure — clients automatically fall back to the next-nearest edge. The player detects segment download failures and retries with a different CDN hostname
- Database shard failure — Vitess maintains synchronous replicas per shard. Automated failover promotes a replica to primary within seconds
- Blob storage corruption — GCS/S3 use erasure coding (e.g., Reed-Solomon) and cross-region replication. 11 nines of durability means an expected loss of roughly 1 object per 100 billion per year
- Regional outage — traffic is rerouted to the next-nearest region via global load balancing. CDN edges in unaffected regions continue serving cached content
Circuit Breaker Pattern
# Circuit breaker for transcoding service calls
import time

class ServiceUnavailableError(Exception):
    pass

class CircuitBreaker:
    CLOSED = 'closed'        # normal operation
    OPEN = 'open'            # failing, reject requests
    HALF_OPEN = 'half_open'  # testing if service recovered

    def __init__(self, failure_threshold=5, timeout_sec=30):
        self.state = self.CLOSED
        self.failure_count = 0
        self.threshold = failure_threshold
        self.timeout = timeout_sec
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        if self.state == self.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = self.HALF_OPEN      # allow one probe request through
            else:
                raise ServiceUnavailableError("Circuit breaker OPEN")
        try:
            result = func(*args, **kwargs)
            if self.state == self.HALF_OPEN:     # probe succeeded, close the circuit
                self.state = self.CLOSED
            self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.threshold:
                self.state = self.OPEN
            raise
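Usage is a thin wrapper around any remote call; transcoding_client.submit_job and enqueue_for_retry here are hypothetical names for illustration:

breaker = CircuitBreaker(failure_threshold=5, timeout_sec=30)
try:
    breaker.call(transcoding_client.submit_job, video_id, rendition="720p_h264")
except ServiceUnavailableError:
    enqueue_for_retry(video_id)    # fall back to the queue instead of hammering the service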
Summary & Interview Tips
| Component | Technology | Key Design Decision |
|---|---|---|
| Upload | Resumable upload → Kafka → worker fleet | Async processing, segment-level parallelism |
| Transcoding | FFmpeg, H.264/VP9/AV1, DAG scheduler | Pre-transcode all formats; AV1 only for popular |
| Streaming | HLS/DASH, adaptive bitrate | Buffer-based ABR, 4-sec segments |
| Storage | GCS/S3, tiered (hot/warm/cold/archive) | Long-tail → cold; popular → multi-region hot |
| Metadata DB | MySQL + Vitess (sharded) | Shard by video_id; secondary indexes for lookups |
| Search | Elasticsearch + ML re-ranker | BM25 retrieval → neural ranking |
| Recommendations | ALS + deep ranking model | Optimize for watch time, not clicks |
| CDN | Google Global Cache (ISP-level) | 3-tier cache hierarchy, own infra to save costs |
| View counting | Kafka → Flink → Redis HLL → MySQL | Eventual consistency, batch aggregation |
| Comments | Separate microservice, sharded by video_id | Isolated blast radius, ML moderation |