High Level Design Series · Real-World Designs · Post 58

Design: Twitter

Requirements & Scale

Twitter (now X) is one of the most demanding real-time platforms on the internet. It combines the write-heavy nature of a messaging system with the read-heavy nature of a news feed, all under strict latency constraints. Before we design anything, let’s crystallize the requirements.

Functional Requirements

  • Post tweets: text up to 280 characters, with optional media, replies, retweets, and quote tweets.
  • Follow and unfollow other users.
  • Home timeline: recent tweets from followed accounts, reverse-chronological (optionally ranked).
  • User timeline: all tweets by a single user.
  • Search tweets, with new tweets searchable within seconds; surface trending topics.
  • Notifications for mentions, replies, retweets, likes, and new followers.

Non-Functional Requirements

  • Low latency: home timeline loads in under 200ms at p99.
  • High availability: degrade gracefully under load rather than fail outright.
  • Eventual consistency is acceptable for timelines, counters, and notifications; a tweet arriving in a follower's feed a few seconds late is fine.
  • Massive, read-skewed scale (quantified below).

Scale Estimation

The numbers that drive every design decision:

| Metric | Value | Implications |
|---|---|---|
| Daily Active Users (DAU) | 400M | Massive read load on timelines |
| New tweets per day | 500M | ~5,800 tweets/sec average, 15K+ peak |
| Average follows per user | 200 | Each tweet fans out to ~200 timelines |
| Timeline reads per day | 100B | ~1.2M reads/sec — read:write ratio of 200:1 |
| Average tweet size | ~300 bytes (text + metadata) | ~150 GB/day of tweet text alone |
| Media attachments | ~30% of tweets have media | ~15 TB/day of images and video |
| Search queries per day | 1B | ~12K queries/sec, real-time indexing needed |
| Celebrity accounts (>1M followers) | ~50K | Fan-out on write is infeasible for these |
Key insight: The 200:1 read-to-write ratio means we should optimize aggressively for reads. Pre-computing timelines (fan-out on write) trades write amplification for fast reads — but only for normal users. Celebrity accounts break this model entirely.

High-Level Architecture

The system decomposes into several core services, each scaling independently:

                           ┌──────────────────────┐
  Mobile / Web Clients ──▶ │     API Gateway /    │
                           │     Load Balancer    │
                           └────────┬─────────────┘
                                    │
           ┌────────────────────────┼────────────────────────┐
           ▼                        ▼                        ▼
   ┌────────────────┐      ┌──────────────────┐      ┌────────────────┐
   │ Tweet Service  │      │ Timeline Service │      │ User Service   │
   │ (write path)   │      │ (read path)      │      │ (profiles,     │
   │                │      │                  │      │  auth, follow) │
   └───────┬────────┘      └────────┬─────────┘      └───────┬────────┘
           │                        │                        │
           ▼                        ▼                        ▼
   ┌────────────────┐      ┌──────────────────┐      ┌────────────────┐
   │ Fan-out        │      │ Timeline Cache   │      │ Social Graph   │
   │ Service        │      │ (Redis Cluster)  │      │ (Graph DB /    │
   └───────┬────────┘      └──────────────────┘      │  Redis)        │
           │                                         └────────────────┘
           ▼
   ┌────────────────┐      ┌──────────────────┐      ┌────────────────┐
   │ Tweet Store    │      │ Search Service   │      │ Media Service  │
   │ (MySQL/        │      │ (Elasticsearch)  │      │ (S3 + CDN)     │
   │  Cassandra)    │      └──────────────────┘      └────────────────┘
   └───────┬────────┘
           │               ┌──────────────────┐      ┌────────────────┐
           └─────────────▶ │ Trending Service │      │ Notification   │
                           │ (Storm/Flink)    │      │ Service        │
                           └──────────────────┘      └────────────────┘

Each service is independently deployable, horizontally scalable, and communicates via a combination of synchronous RPCs (for user-facing paths) and asynchronous message queues (for fan-out, search indexing, and analytics).

Tweet Service

The tweet service handles the critical write path — accepting a new tweet, persisting it, and kicking off the downstream pipeline (fan-out, search indexing, trending, notifications).

Tweet Creation Flow

  1. Client sends POST /api/tweets with text (max 280 chars), optional media IDs (pre-uploaded), and metadata (reply_to, quote_tweet_id).
  2. API Gateway authenticates the request (OAuth 2.0 bearer token), checks rate limits (300 tweets/3 hours per user), and routes to the Tweet Service.
  3. Tweet Service validates the payload:
    • Text length ≤ 280 Unicode characters (not bytes — CJK characters count as one).
    • Media IDs reference valid, uploaded media objects.
    • Spam/abuse check via ML classifier (synchronous, <50ms).
  4. Generate Snowflake ID (more on this below) — this becomes the tweet’s globally unique, time-sortable identifier.
  5. Write to Tweet Store — synchronous write to the primary datastore.
  6. Publish to message queue (Kafka) — a single event triggers multiple downstream consumers:
    • Fan-out Service (timeline delivery)
    • Search Indexing Pipeline
    • Trending Topics Pipeline
    • Notification Service
    • Analytics Pipeline
  7. Return 201 Created with the tweet object to the client.
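
A condensed sketch of steps 3–7 in Python, using kafka-python for the publish. The store and id_gen clients, the broker address, and the error handling are stand-ins, not part of the original design:

import json
from kafka import KafkaProducer  # kafka-python

MAX_CHARS = 280

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumption: broker address
    key_serializer=lambda k: str(k).encode(),
    value_serializer=lambda v: json.dumps(v).encode(),
)

def create_tweet(id_gen, store, user_id, text, media_ids=()):
    # Step 3: validate — 280 Unicode code points, not bytes
    if len(text) > MAX_CHARS:
        raise ValueError("tweet exceeds 280 characters")
    tweet = {
        "tweet_id": id_gen.next_id(),   # Step 4: Snowflake ID
        "user_id": user_id,
        "text": text,
        "media_ids": list(media_ids),
    }
    store.insert(tweet)                 # Step 5: synchronous write to Tweet Store
    # Step 6: one event, many consumers (fan-out, search, trending, notifications);
    # key=user_id keeps a user's tweets in the same Kafka partition
    producer.send("new-tweets", key=user_id, value=tweet)
    return tweet                        # Step 7: becomes the 201 Created body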

Tweet Data Model

The tweets table is the heart of the system. It needs to support two access patterns: lookup by tweet_id (O(1)) and range scans by user_id + time (user timeline).

| Column | Type | Notes |
|---|---|---|
| tweet_id | BIGINT (Snowflake ID) | Primary key, time-sortable |
| user_id | BIGINT | Author, indexed for user timeline |
| text | VARCHAR(1120) | 280 chars × 4 bytes (UTF-8 max) |
| media_ids | JSON / ARRAY | References to media objects |
| reply_to_id | BIGINT, nullable | Parent tweet for threading |
| retweet_of_id | BIGINT, nullable | Original tweet for retweets |
| quote_tweet_id | BIGINT, nullable | Quoted tweet reference |
| like_count | INT | Denormalized counter |
| retweet_count | INT | Denormalized counter |
| reply_count | INT | Denormalized counter |
| hashtags | JSON / ARRAY | Extracted hashtags for trending |
| language | CHAR(2) | Detected language code |
| geo | POINT, nullable | Optional geolocation |
| created_at | TIMESTAMP | Redundant with Snowflake ID, but useful |

Partitioning Strategy

At 500M tweets/day, no single database can keep up. The tweet store is sharded by tweet_id (the Snowflake ID, hashed to spread load), and a secondary index table partitioned by user_id serves the user-timeline access pattern:

-- Secondary index table (partitioned by user_id)
CREATE TABLE user_tweets (
    user_id    BIGINT,
    tweet_id   BIGINT,     -- Snowflake ID (also encodes timestamp)
    PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);

-- Query user timeline: single partition read
SELECT tweet_id FROM user_tweets
WHERE user_id = 12345
LIMIT 20;

Snowflake ID Generation

Twitter invented the Snowflake ID scheme to generate globally unique, roughly time-ordered 64-bit IDs without coordination between servers. This is critical because:

  • There is no central ID service to become a bottleneck or single point of failure; every worker mints IDs independently.
  • IDs are time-sortable, so sorting by tweet_id is sorting by creation time. Timelines, caches, and pagination cursors all rely on this.
  • 64 bits fit in a native BIGINT column and index compactly.

Bit Layout

┌─────────────────────────────────────────────────────────────────────┐
│                         64-bit Snowflake ID                          │
├──────┬────────────────────────┬────────────┬──────────┬─────────────┤
│ Sign │  Timestamp (ms)        │ Datacenter │  Worker  │  Sequence   │
│ 1 bit│  41 bits               │  5 bits    │  5 bits  │  12 bits    │
├──────┼────────────────────────┼────────────┼──────────┼─────────────┤
│  0   │ ms since custom epoch  │   0–31     │   0–31   │   0–4095    │
│      │ (2010-11-04 01:42:54)  │            │          │             │
└──────┴────────────────────────┴────────────┴──────────┴─────────────┘

Key properties of this layout:

| Component | Bits | Range | Implication |
|---|---|---|---|
| Sign | 1 | Always 0 | Keeps IDs positive in signed 64-bit integers |
| Timestamp | 41 | ~69.7 years | IDs are time-sortable; epoch runs until ~2079 |
| Datacenter ID | 5 | 0–31 | Supports 32 datacenters |
| Worker ID | 5 | 0–31 | 32 workers per datacenter = 1,024 total workers |
| Sequence | 12 | 0–4095 | 4,096 IDs per millisecond per worker |

Total throughput: 1,024 workers × 4,096 IDs/ms = 4,194,304 IDs per millisecond — far exceeding Twitter’s peak of ~15K tweets/sec. The sequence counter resets every millisecond; if a worker exhausts 4,096 sequences within 1ms, it busy-waits until the next millisecond.

Implementation Sketch

import time

def current_time_ms():
    return time.time_ns() // 1_000_000

class SnowflakeGenerator:
    EPOCH = 1288834974657  # Custom epoch: 2010-11-04T01:42:54.657Z

    def __init__(self, datacenter_id, worker_id):
        assert 0 <= datacenter_id < 32
        assert 0 <= worker_id < 32
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_timestamp = -1

    def next_id(self):
        timestamp = current_time_ms()

        if timestamp < self.last_timestamp:
            # Clock moved backward (e.g., NTP step): refuse rather than risk
            # duplicate IDs. The caller retries on another worker.
            raise RuntimeError("clock moved backwards; refusing to generate ID")

        if timestamp == self.last_timestamp:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit mask
            if self.sequence == 0:
                # Exhausted 4,096 IDs this millisecond: busy-wait for the next
                timestamp = self._wait_next_ms(timestamp)
        else:
            self.sequence = 0

        self.last_timestamp = timestamp

        return ((timestamp - self.EPOCH) << 22 |   # 41-bit timestamp
                (self.datacenter_id << 17) |       # 5-bit datacenter
                (self.worker_id << 12) |           # 5-bit worker
                self.sequence)                     # 12-bit sequence

    def _wait_next_ms(self, last_ts):
        ts = current_time_ms()
        while ts <= last_ts:
            ts = current_time_ms()
        return ts
Clock skew problem: If a worker’s clock drifts backward (e.g., NTP adjustment), it could generate duplicate IDs or break ordering. Snowflake handles this by refusing to generate IDs when the clock goes backward — the worker throws an exception and the request is retried on another worker. In practice, operators ensure NTP is configured with tinker panic 0 to prevent large jumps.
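
A quick sanity check of the sketch above: consecutive IDs from one worker are strictly increasing, and the embedded timestamp can be recovered by shifting off the low 22 bits.

gen = SnowflakeGenerator(datacenter_id=1, worker_id=7)
a, b = gen.next_id(), gen.next_id()
assert a < b                                    # time-sortable within a worker

# Recover the creation time (Unix ms) from the ID itself
created_ms = (a >> 22) + SnowflakeGenerator.EPOCH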

Timeline Service — Hybrid Fan-Out

The home timeline is the single most important and most complex feature. Every time a user opens Twitter, they expect to see a fresh, personalized feed of tweets from people they follow — in under 200ms. This is where the hybrid fan-out approach comes in.

Three Approaches to Timeline Generation

Approach 1: Fan-Out on Read (Pull Model)

GET /home_timeline?user_id=123

1. Look up user 123's following list: [user_A, user_B, user_C, ...]
2. For each followed user, query their recent tweets
3. Merge and sort all tweets by timestamp
4. Return top N tweets

-- SQL equivalent (conceptual):
SELECT * FROM tweets
WHERE user_id IN (SELECT followee_id FROM follows WHERE follower_id = 123)
ORDER BY tweet_id DESC
LIMIT 20;

Problem: If a user follows 500 people, this is 500 database queries per timeline load. At 1.2M timeline reads/sec, that’s 600M database queries/sec — completely infeasible. Latency would be seconds, not milliseconds.

Approach 2: Fan-Out on Write (Push Model)

User A posts a tweet:

1. Look up A's followers: [user_123, user_456, user_789, ...]
2. For EACH follower, append tweet_id to their timeline cache
3. When user_123 requests home timeline, just read from their pre-built cache

-- Redis operation:
LPUSH timeline:123 <tweet_id>
LPUSH timeline:456 <tweet_id>
LPUSH timeline:789 <tweet_id>
LTRIM timeline:123 0 799   -- Keep only last 800 tweets

Problem: When a celebrity with 50M followers tweets, the fan-out service must write to 50 million timeline caches. Even though each entry is just a tweet_id, that is 50 million Redis operations for a single tweet; with celebrities tweeting 5–20 times per day, the write amplification is catastrophic.

Approach 3: Hybrid (Twitter’s Actual Approach)

Twitter uses a hybrid model that combines the best of both approaches:

  • Normal users (below the celebrity threshold): fan-out on write. Tweets are pushed into every follower's timeline cache, exactly as in the push model.
  • Celebrities: fan-out on read. Tweets are written once to the author's own tweet list; followers pull and merge them when they load their timeline.
  • The read path merges the pre-built timeline cache with recent celebrity tweets, sorted by Snowflake ID (see the read path below).

Why 10K as the threshold? It’s a practical heuristic. A user with 10K followers means 10K cache writes per tweet — that completes in ~20ms on a Redis cluster. At 1M followers, the same operation takes 2+ seconds and consumes significant network/memory resources. The threshold is tunable and varies by system load.
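
Here is what that dispatch decision might look like with redis-py; follower_count and get_follower_ids are assumed helpers backed by the social graph service (covered below):

CELEBRITY_THRESHOLD = 10_000

def fan_out(r, tweet_id, author_id):
    if follower_count(author_id) < CELEBRITY_THRESHOLD:
        # Push model: write into every follower's pre-built timeline,
        # pipelined to amortize network round trips
        pipe = r.pipeline(transaction=False)
        for follower_id in get_follower_ids(author_id):
            pipe.zadd(f"timeline:{follower_id}", {str(tweet_id): tweet_id})
            pipe.zremrangebyrank(f"timeline:{follower_id}", 0, -801)  # cap at 800
        pipe.execute()
    else:
        # Pull model: store once under the author; followers merge at read time
        r.zadd(f"celebrity_tweets:{author_id}", {str(tweet_id): tweet_id})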

Timeline Cache Design in Redis

The timeline cache is a Redis Cluster storing pre-computed timelines. This is the most latency-sensitive component in the entire system.

Data Structure

Key:    timeline:{user_id}
Type:   Sorted Set (ZSET)
Score:  Snowflake ID (which is time-sortable)
Member: tweet_id (as string)

-- Fan-out write: celebrity tweets handled separately
ZADD timeline:123 1780012345678901234 "1780012345678901234"

-- Read home timeline (newest 20 tweets)
ZREVRANGEBYSCORE timeline:123 +inf -inf LIMIT 0 20

-- Trim to keep cache bounded (800 entries max)
ZREMRANGEBYRANK timeline:123 0 -801

Key design decisions for the Redis timeline cache:

| Decision | Choice | Rationale |
|---|---|---|
| Data structure | Sorted Set (ZSET) | O(log N) insert and range queries by score. Score = Snowflake ID gives time ordering for free. |
| What we store | Only tweet_ids, not full tweets | Keeps cache compact. Full tweet data fetched from Tweet Cache on read (batch MGET). A timeline of 800 tweet_ids is ~6.4KB vs ~240KB for full tweet objects. |
| Cache size per user | 800 tweet_ids | Covers ~2–3 days of content for an average user following 200 accounts. Beyond this, we fall back to database queries. |
| Eviction | Manual trim after each insert | ZREMRANGEBYRANK keeps exactly 800 entries. No reliance on Redis eviction policies. |
| Cluster topology | Redis Cluster, 1000+ nodes | 400M users × 6.4KB = ~2.5TB minimum. With replication factor 3 and headroom: ~10TB cluster. |
| Persistence | None (pure cache) | If a node dies, timelines are rebuilt from the Tweet Store. Cold-start penalty is acceptable for rare events. |

Timeline Read Path (Merged)

def get_home_timeline(user_id, count=20):
    # Step 1: Get pre-built timeline from cache (normal user tweets)
    cached_tweet_ids = redis.zrevrangebyscore(
        f"timeline:{user_id}", "+inf", "-inf",
        start=0, num=count * 2  # Over-fetch for merge headroom
    )

    # Step 2: Get followed celebrities
    celebrity_followees = get_celebrity_followees(user_id)

    # Step 3: Pull recent tweets from each celebrity (fan-out on read)
    celebrity_tweets = []
    for celeb_id in celebrity_followees:
        tweets = redis.zrevrangebyscore(
            f"user_tweets:{celeb_id}", "+inf", "-inf",
            start=0, num=count
        )
        celebrity_tweets.extend(tweets)

    # Step 4: Merge, sort by Snowflake ID (= chronological), take top N
    all_tweet_ids = cached_tweet_ids + celebrity_tweets
    all_tweet_ids.sort(reverse=True)  # Descending = newest first
    top_ids = all_tweet_ids[:count]

    # Step 5: Hydrate tweet objects (batch fetch from Tweet Cache)
    tweets = tweet_cache.mget(top_ids)  # Redis MGET, <1ms

    # Step 6: Hydrate user profiles (batch fetch)
    user_ids = {t.user_id for t in tweets}
    users = user_cache.mget(user_ids)

    return enrich(tweets, users)

The average user follows 3–5 celebrities. So step 3 adds only 3–5 Redis lookups to the timeline read — negligible latency overhead.

Social Graph Service

The social graph manages the follow/unfollow relationships between users. It answers three critical questions:

  1. Who does user X follow? (Following list — needed for fan-out on read of celebrity tweets)
  2. Who follows user X? (Follower list — needed for fan-out on write)
  3. Does user X follow user Y? (Follow check — needed for UI rendering and access control)

Data Model Options

Option A: Adjacency List in Redis

-- Who does user 123 follow?
Key:   following:123     Type: SET
Members: {456, 789, 101, ...}

-- Who follows user 456?
Key:   followers:456     Type: SET
Members: {123, 555, 888, ...}

-- Follow user
SADD following:123 456
SADD followers:456 123

-- Unfollow
SREM following:123 456
SREM followers:456 123

-- Check if following
SISMEMBER following:123 456  → 1 (yes) or 0 (no)

-- Follower count
SCARD followers:456 → 1,340,201

Pros: O(1) follow check, O(1) count, blazing fast. Cons: Memory-intensive. A user with 50M followers has a 50M-member set (~400MB). For celebrities, we store only the count in Redis and page through followers from the database for fan-out operations.

Option B: Graph Database (e.g., FlockDB, TAO)

Twitter originally built FlockDB, a distributed graph database optimized for adjacency-list queries. Facebook uses TAO for the same purpose. Key characteristics:

  • Edges are sharded by source vertex, so “who does X follow” and “who follows X” are single-shard lookups.
  • Built for high edge churn (constant follow/unfollow) and for set operations such as intersection and difference over adjacency lists.
  • Backed by a durable relational store (MySQL in both systems), with the graph layer handling sharding, replication, and caching.

Recommended: Hybrid

Use Redis for hot-path queries (follow check, counts, following list for timeline merge) and a persistent store (MySQL or FlockDB-style) as the source of truth. Redis is populated on follow/unfollow and warmed on user login.
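
A hot-path follow check under this hybrid might look like the following sketch; db_is_following stands in for the persistent-store query:

def is_following(r, db_is_following, follower_id, followee_id):
    key = f"following:{follower_id}"
    if r.exists(key):  # Redis set already warmed for this user?
        return bool(r.sismember(key, followee_id))
    # Cold cache: fall back to the source of truth (MySQL / FlockDB-style store)
    return db_is_following(follower_id, followee_id)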

Follow/Unfollow Flow

POST /api/users/123/follow  body: { target_user_id: 456 }

1. Validate: user 123 exists, user 456 exists, not already following
2. Write to persistent store:
   INSERT INTO follows (follower_id, followee_id, created_at) VALUES (123, 456, NOW())
3. Update Redis (async via message queue for consistency):
   SADD following:123 456
   SADD followers:456 123
   INCR follower_count:456
   INCR following_count:123
4. If user 456 is a normal user (<10K followers):
   → Backfill: push 456's recent tweets into 123's timeline cache
5. If user 456 is a celebrity:
   → Add 456 to 123's celebrity-followee set (for fan-out on read)
6. Send notification to user 456: "user 123 started following you"

Search Service

Twitter search must index 500M tweets per day and make them searchable within seconds of posting. This is one of the most demanding real-time search systems in the world.

Architecture: Earlybird + Elasticsearch

Twitter uses a search architecture called Earlybird, which is a custom real-time inverted index optimized for recency. Let’s build something similar:

Two-Tier Search

┌────────────────────────────────────────────────────────┐
│                   Search Query Flow                    │
│                                                        │
│  Query ──▶ Blender (merges results from both tiers)    │
│               │                        │               │
│               ▼                        ▼               │
│   ┌────────────────────┐  ┌─────────────────────────┐  │
│   │ Earlybird          │  │ Full Index              │  │
│   │ (Real-Time)        │  │ (Elasticsearch)         │  │
│   │                    │  │                         │  │
│   │ Last ~10 sec of    │  │ Historical tweets       │  │
│   │ tweets, in-memory  │  │ (days/weeks/months),    │  │
│   │ inverted index     │  │ sharded by time         │  │
│   └────────────────────┘  └─────────────────────────┘  │
└────────────────────────────────────────────────────────┘

Earlybird: Real-Time Inverted Index

Earlybird is an in-memory, append-only inverted index that processes the most recent tweets (last ~10 seconds). It’s designed for the common case: users searching for breaking news and trending topics want the absolute freshest results.

import re

def tokenize(text):
    # Minimal stand-in tokenizer: lowercase + split on non-word characters.
    # Production adds stopword removal, stemming, and #hashtag/@mention handling.
    return [t for t in re.split(r"\W+", text.lower()) if t]

class EarlybirdIndex:
    def __init__(self):
        self.inverted_index = {}   # token → append-only list of tweet_ids (ascending time)
        self.tweet_store = {}      # tweet_id → tweet metadata

    def index_tweet(self, tweet):
        for token in tokenize(tweet.text):
            self.inverted_index.setdefault(token, []).append(tweet.tweet_id)
        self.tweet_store[tweet.tweet_id] = tweet

    def search(self, query, limit=20):
        tokens = tokenize(query)
        if not tokens:
            return []
        if len(tokens) == 1:
            # Posting lists are appended in time order; reverse for newest-first
            posting_list = self.inverted_index.get(tokens[0], [])[::-1]
        else:
            # Intersect posting lists for multi-term queries
            sets = [set(self.inverted_index.get(t, [])) for t in tokens]
            posting_list = sorted(set.intersection(*sets), reverse=True)
        return posting_list[:limit]

    def gc(self, max_age_seconds=10):
        """Evict tweets older than max_age_seconds to keep memory bounded."""
        cutoff_id = snowflake_id_from_timestamp(now() - max_age_seconds)  # helpers assumed
        # Remove from inverted index and tweet store
        ...

Earlybird runs on multiple shards (partitioned by hash of tweet_id) with each shard handling ~500 tweets/sec of indexing throughput. A query fans out to all Earlybird shards in parallel, and results are merged by the Blender service.

Full Index: Elasticsearch Cluster

Tweets older than ~10 seconds flow into Elasticsearch for long-term searchability.

Indexing Pipeline

Tweet posted
    → Kafka topic: "new-tweets"
        → Earlybird Consumer: index in-memory (within 1 second)
        → Elasticsearch Consumer: bulk index (within 5-10 seconds)
            - Tokenize text (custom analyzer: @mentions, #hashtags, URLs)
            - Extract entities (people, places, events)
            - Detect language
            - Compute engagement signals (initial = 0, updated periodically)
            - Write to daily ES index with tweet_id as document ID
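
The Elasticsearch leg of the pipeline could use elasticsearch-py's bulk helper along these lines; the daily index naming and field selection mirror the pipeline above, while the cluster endpoint and the ISO-date created_at field are assumptions:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumption: cluster endpoint

def bulk_index(tweets):
    actions = (
        {
            "_index": f"tweets-{t['created_at'][:10]}",  # daily index, e.g. tweets-2024-04-18
            "_id": t["tweet_id"],                        # tweet_id as doc ID makes retries idempotent
            "_source": {
                "text": t["text"],
                "user_id": t["user_id"],
                "lang": t["language"],
                "hashtags": t["hashtags"],
            },
        }
        for t in tweets
    )
    helpers.bulk(es, actions)  # batched writes via the bulk API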

Trending Service

Trending topics surface what the world is talking about right now. The algorithm must be real-time (detect trends within minutes), noise-resistant (filter spam and evergreen topics), and geographically aware.

The core algorithm uses a sliding window approach over hashtag frequencies, detecting acceleration rather than absolute volume:

For each hashtag H in the last 5 minutes:
    current_count   = count(H, window=last_5_min)
    baseline_count  = count(H, window=last_24_hours) / 288  # Normalized to 5-min
    trend_score     = (current_count - baseline_count) / max(baseline_count, 1)

Trending if:
    trend_score > THRESHOLD (e.g., 3.0)  AND
    current_count > MIN_VOLUME (e.g., 500)  AND
    NOT in spam_blocklist  AND
    NOT evergreen (e.g., "love", "good morning")

This means a hashtag trending isn’t about being the most used — it’s about having the highest acceleration. #Love might appear in 100K tweets/5min, but if that’s its normal rate, it doesn’t trend. #SuperBowl going from 100 to 50K in 5 minutes is a trend.
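
The scoring rule translates almost directly into code. In this sketch, the two window counts are assumed to come from the stream processor's state:

def trend_score(count_5min, count_24h):
    baseline = count_24h / 288  # a day holds 288 five-minute windows
    return (count_5min - baseline) / max(baseline, 1)

def is_trending(tag, count_5min, count_24h, blocklist,
                threshold=3.0, min_volume=500):
    return (trend_score(count_5min, count_24h) > threshold
            and count_5min > min_volume
            and tag not in blocklist)

# #SuperBowl: baseline ~100 per 5 min (100 * 288 over 24h), now 50,000
assert is_trending("superbowl", 50_000, 100 * 288, blocklist=set())
# #Love: huge but steady volume never crosses the acceleration threshold
assert not is_trending("love", 100_000, 100_000 * 288, blocklist=set())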

┌──────────┐     ┌──────────────┐     ┌──────────────────┐     ┌────────────┐
│ Kafka    │ ──▶ │ Flink/Storm  │ ──▶ │ Trend Detector   │ ──▶ │ Trending   │
│ new-     │     │ Hashtag      │     │ (sliding window  │     │ Cache      │
│ tweets   │     │ Extractor    │     │  + scoring)      │     │ (Redis)    │
└──────────┘     └──────────────┘     └──────────────────┘     └────────────┘
                                             │
                                             ▼
                                      ┌──────────────┐
                                      │ Spam Filter   │
                                      │ (ML model +   │
                                      │  blocklist)   │
                                      └──────────────┘

Multi-Level Trending

Without filtering, trending topics would be dominated by spam bots and coordinated manipulation campaigns. Multiple layers of defense:

  • Blocklist: known spam hashtags and evergreen terms (e.g., “love”, “good morning”) are excluded before scoring.
  • ML spam model: scores the accounts and tweets behind a candidate trend; trends driven by low-reputation accounts are suppressed.
  • Geographic levels: trends are computed per city, country, and globally, so a regional spike can surface locally without worldwide volume.

Media Service

~30% of tweets contain images or video. The media pipeline must handle high-throughput uploads, transcoding, and low-latency delivery.

Upload Flow

1. Client requests upload URL:
   POST /api/media/upload/init → returns { media_id, upload_url (S3 presigned) }

2. Client uploads directly to S3 (bypasses our servers entirely):
   PUT <upload_url>  body: <binary image/video data>

3. S3 triggers Lambda/worker for post-processing:
   - Images: resize to standard dimensions (150x150 thumb, 680px medium, original)
   - Video: transcode to multiple bitrates (360p, 720p, 1080p) via FFmpeg
   - Generate blurhash placeholder for progressive loading
   - Store all variants back in S3

4. Media service marks media_id as "ready"

5. Client includes media_id in tweet creation:
   POST /api/tweets  body: { text: "...", media_ids: ["abc123"] }
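
Step 1 on the server side might be sketched with boto3 as follows; the bucket name, key prefix, and 15-minute expiry are illustrative assumptions:

import uuid
import boto3

s3 = boto3.client("s3")

def init_upload():
    media_id = uuid.uuid4().hex
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "tweet-media-uploads", "Key": f"raw/{media_id}"},
        ExpiresIn=900,  # client must complete the PUT within 15 minutes
    )
    return {"media_id": media_id, "upload_url": upload_url}

The client then PUTs the bytes straight to S3, so media bandwidth never touches the application tier.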

Delivery via CDN

Media variants are immutable once processed, which makes them ideal for CDN caching: each variant gets its own URL, edge nodes cache it with a long TTL, and only cache misses fall through to S3. Clients pick the appropriate variant (thumbnail, medium, original, or a video bitrate) based on context and network conditions.

Notification Service

Notifications keep users engaged by alerting them to interactions: mentions, retweets, likes, replies, and new followers.

Notification Types & Priorities

| Type | Trigger | Priority | Delivery |
|---|---|---|---|
| Mention (@user) | Tweet text parsing | High | Push + In-app |
| Reply | Tweet with reply_to_id | High | Push + In-app |
| Retweet | Retweet action | Medium | Push + In-app |
| Like | Like action | Low | In-app only (batched) |
| New follower | Follow action | Low | In-app (batched) |
| Trending topic | Personalized trend match | Low | Push (opt-in) |

Architecture

Event Sources (Kafka topics)
    │
    ▼
Notification Router
    ├── Priority queue (high/medium/low)
    ├── Deduplication (don't notify same event twice)
    ├── User preference check (muted? notification settings?)
    └── Rate limiting (max 50 push notifications/hour per user)
         │
         ├──▶ Push Notification Service (APNs / FCM)
         ├──▶ In-App Notification Store (Cassandra)
         └──▶ Email Digest Service (batched, daily/weekly)

Celebrity notification problem: When a tweet by @KatyPerry (100M+ followers) gets liked, we don’t send 100M notifications. Instead, likes are aggregated: “@user1, @user2, and 47,382 others liked your tweet.” The aggregation happens in the notification store, not in delivery.
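
A sketch of that aggregation with redis-py (assuming a client created with decode_responses=True; the key names are illustrative):

def record_like(r, tweet_id, liker_handle):
    key = f"notif:likes:{tweet_id}"
    r.lpush(f"{key}:recent", liker_handle)  # keep a couple of names for display
    r.ltrim(f"{key}:recent", 0, 1)
    r.incr(f"{key}:count")                  # count everything, deliver once

def render_like_notification(r, tweet_id):
    key = f"notif:likes:{tweet_id}"
    recent = r.lrange(f"{key}:recent", 0, 1)
    others = int(r.get(f"{key}:count") or 0) - len(recent)
    return f"{', '.join(recent)} and {others:,} others liked your tweet"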

Rate Limiting

Rate limiting protects the system from abuse, runaway clients, and denial-of-service attacks. Twitter applies limits at multiple layers:

Rate Limit Tiers

| Endpoint | Limit | Window | Rationale |
|---|---|---|---|
| POST /tweets | 300 tweets | 3 hours | Prevents spam bots from flooding |
| POST /likes | 1,000 likes | 24 hours | Prevents artificial engagement inflation |
| POST /follows | 400 follows | 24 hours | Prevents mass follow-unfollow schemes |
| GET /home_timeline | 150 requests | 15 minutes | Prevents scraping, protects Redis cache |
| GET /search | 180 requests | 15 minutes | Protects Elasticsearch cluster |
| App-level (OAuth) | 300 requests | 15 minutes | Per-app rate limit across all users |

Implementation

Rate limiting is implemented at the API Gateway using a sliding window log (a sorted set of request timestamps) backed by Redis:

-- Sliding window rate limit check (Lua script for atomicity)
local key = KEYS[1]                    -- e.g., "ratelimit:tweets:user:123"
local window = tonumber(ARGV[1])       -- e.g., 10800 (3 hours in seconds)
local max_requests = tonumber(ARGV[2]) -- e.g., 300
local now = tonumber(ARGV[3])          -- current timestamp

-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count requests in current window
local count = redis.call('ZCARD', key)

if count < max_requests then
    -- Allow: add this request
    redis.call('ZADD', key, now, now .. ':' .. math.random(1000000))
    redis.call('EXPIRE', key, window)
    return {1, max_requests - count - 1}  -- allowed, remaining
else
    return {0, 0}  -- denied, 0 remaining
end
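
Calling the script from Python with redis-py might look like this; RATE_LIMIT_LUA is assumed to hold the Lua source above, and the arguments match the tweet-posting tier:

import time
import redis

r = redis.Redis()
check_limit = r.register_script(RATE_LIMIT_LUA)  # SHA-cached, runs atomically

allowed, remaining = check_limit(
    keys=["ratelimit:tweets:user:123"],
    args=[10800, 300, int(time.time())],  # 3-hour window, 300 tweets, now
)
if not allowed:
    pass  # respond 429 with the rate-limit headers shown below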

Rate limit responses include standard headers:

HTTP/1.1 429 Too Many Requests
X-Rate-Limit-Limit: 300
X-Rate-Limit-Remaining: 0
X-Rate-Limit-Reset: 1713456789
Retry-After: 1823

Celebrity Tweet Handling

Celebrities are the hardest scaling problem on Twitter. An account with 50M followers breaks the fan-out model. Here’s how the system handles it end-to-end:

Dynamic Celebrity Classification

User Classification Logic:
    if follower_count < 10,000:
        user_tier = "normal"      → Fan-out on write
    elif follower_count < 1,000,000:
        user_tier = "popular"     → Fan-out on write with async batching
    else:
        user_tier = "celebrity"   → Fan-out on read only

-- Classification is updated hourly via a background job
-- Tier changes trigger timeline cache migration:
--   normal → celebrity: stop pushing, add to celebrity index
--   celebrity → normal: start pushing, backfill timelines

Celebrity Tweet Cache

Celebrity tweets are stored in a dedicated cache for fast retrieval during timeline merge:

Key:    celebrity_tweets:{user_id}
Type:   Sorted Set (ZSET)
Score:  Snowflake ID
Member: tweet_id

-- Elon Musk tweets → stored here, NOT pushed to 150M timeline caches
ZADD celebrity_tweets:44196397 1780099887766554433 "1780099887766554433"

-- When follower 123 loads timeline:
-- 1. Read their pre-built cache (normal tweets)
-- 2. For each celebrity they follow, ZREVRANGE celebrity_tweets:{celeb_id}
-- 3. Merge all results by Snowflake ID

This trades a small read-time cost (~3–5 extra Redis calls per timeline read) for eliminating millions of cache writes per celebrity tweet.

Engagement Counts at Scale

A viral celebrity tweet can accumulate millions of likes/retweets per minute. Updating the like_count column directly would create a hot row. Instead, the counter is sharded: increments are spread across many independent sub-counters, and the displayed total is the sum of all shards, computed periodically and cached.
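
A minimal sharded-counter sketch on Redis, assuming redis-py; the shard count of 64 is an illustrative choice, not a prescription:

import random

NUM_SHARDS = 64

def incr_likes(r, tweet_id):
    shard = random.randrange(NUM_SHARDS)    # spread the hot key across 64 keys
    r.incr(f"likes:{tweet_id}:{shard}")

def get_likes(r, tweet_id):
    keys = [f"likes:{tweet_id}:{s}" for s in range(NUM_SHARDS)]
    return sum(int(v) for v in r.mget(keys) if v)  # absent shards count as zero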

User Timeline

Unlike the home timeline, the user timeline is straightforward: it shows all tweets by a single user, in reverse chronological order. This is a simpler problem:

GET /api/users/456/tweets?cursor=<last_tweet_id>&count=20

Option A: Query the secondary index table
    SELECT tweet_id FROM user_tweets
    WHERE user_id = 456
    AND tweet_id < <cursor>
    ORDER BY tweet_id DESC
    LIMIT 20;

Option B: Redis sorted set (for hot profiles)
    ZREVRANGEBYSCORE user_tweets:456 (<cursor> -inf LIMIT 0 20

For frequently accessed profiles (celebrities, trending accounts), the user timeline is cached in a Redis sorted set identical to the celebrity tweet cache. For long-tail profiles, the database secondary index is sufficient.

Infrastructure & Data Flow Summary

Kafka Topic Architecture

Kafka is the nervous system that connects all services. Every significant event flows through Kafka for decoupled, asynchronous processing:

Topic: new-tweets         (partitioned by user_id, 256 partitions)
  Consumers: fan-out-service, search-indexer, trending-pipeline,
             notification-router, analytics-pipeline

Topic: engagements        (partitioned by tweet_id, 128 partitions)
  Consumers: counter-service, notification-router, ranking-pipeline

Topic: social-graph       (partitioned by user_id, 64 partitions)
  Consumers: timeline-backfill, notification-router, recommendation-engine

Topic: user-activity      (partitioned by user_id, 128 partitions)
  Consumers: analytics-pipeline, ad-targeting, personalization
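
Each service consumes with its own group ID, so offsets advance independently. A fan-out worker's consume loop might look like this sketch (kafka-python; the broker address and Redis client are assumptions):

import json
import redis
from kafka import KafkaConsumer

r = redis.Redis()  # assumption: timeline cache client

consumer = KafkaConsumer(
    "new-tweets",
    group_id="fan-out-service",          # each service tracks its own offsets
    bootstrap_servers="localhost:9092",  # assumption
    value_deserializer=json.loads,
)

for msg in consumer:
    tweet = msg.value
    fan_out(r, tweet["tweet_id"], tweet["user_id"])  # the fan-out sketch from earlier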

Complete Write Path: Posting a Tweet

User taps "Tweet" on mobile app
    │
    ▼
API Gateway (rate limit check, auth)
    │
    ▼
Tweet Service
    ├── Generate Snowflake ID
    ├── Validate text (280 char limit, spam check)
    ├── Write to Tweet Store (MySQL/Cassandra)
    ├── Write to user_tweets secondary index
    ├── Update user's tweet_count
    └── Publish to Kafka "new-tweets" topic
         │
         ├──▶ Fan-out Service
         │     ├── Check user tier (normal/popular/celebrity)
         │     ├── If normal: get follower list, ZADD to each timeline cache
         │     ├── If celebrity: ZADD to celebrity_tweets:{user_id} only
         │     └── Async, batched, with backpressure
         │
         ├──▶ Search Indexer
         │     ├── Earlybird: index in-memory (real-time, <1 sec)
         │     └── Elasticsearch: bulk index (near-real-time, <10 sec)
         │
         ├──▶ Trending Pipeline (Flink)
         │     ├── Extract hashtags
         │     ├── Update sliding window counters
         │     └── Recompute trend scores every 30 seconds
         │
         ├──▶ Notification Router
         │     ├── Parse @mentions → notify mentioned users
         │     ├── If reply → notify parent tweet author
         │     └── Push via APNs/FCM
         │
         └──▶ Analytics Pipeline
               ├── Tweet volume metrics
               ├── User engagement tracking
               └── Ad impression attribution

Complete Read Path: Loading Home Timeline

User opens Twitter app
    │
    ▼
API Gateway → Timeline Service
    │
    ├── Step 1: Read pre-built timeline from Redis
    │   ZREVRANGEBYSCORE timeline:123 +inf -inf LIMIT 0 40
    │   Result: [tweet_id_1, tweet_id_2, ..., tweet_id_40]
    │
    ├── Step 2: Get celebrity followees
    │   SMEMBERS celebrity_followees:123
    │   Result: [celeb_A, celeb_B, celeb_C]
    │
    ├── Step 3: Pull celebrity tweets
    │   For each celeb: ZREVRANGEBYSCORE celebrity_tweets:{celeb} +inf -inf LIMIT 0 20
    │   Result: [celeb_tweet_1, celeb_tweet_2, ...]
    │
    ├── Step 4: Merge & sort all tweet_ids by Snowflake ID (descending)
    │   Take top 20
    │
    ├── Step 5: Hydrate tweets (batch fetch)
    │   MGET tweet:id1 tweet:id2 ... tweet:id20
    │   Cache miss? → Query Tweet Store → populate cache
    │
    ├── Step 6: Hydrate user profiles (batch fetch)
    │   MGET user:uid1 user:uid2 ...
    │
    ├── Step 7: (Optional) Apply ranking model
    │   ML model scores tweets by predicted engagement
    │   Re-sort by score instead of chronological
    │
    └── Return JSON response with tweets, user objects, cursor
        Total latency target: <200ms (p99)

Failure Handling & Reliability

Key Failure Scenarios

| Failure | Impact | Mitigation |
|---|---|---|
| Redis timeline cache node dies | Affected users see empty/stale timeline | Replica takes over. Cold-start rebuild from Tweet Store + fan-out replay from Kafka. |
| Kafka broker goes down | Fan-out, search indexing delayed | Kafka replication factor 3. Consumers resume from last committed offset. No data loss. |
| Tweet Store shard unreachable | Writes fail for affected partition | Circuit breaker returns 503. Retries with exponential backoff. Cross-region failover. |
| Elasticsearch cluster degraded | Search latency increases or partial results | Reduce replica count, shed load to Earlybird-only results. Graceful degradation. |
| Fan-out service backlog | Timelines are stale (tweets appear late) | Backpressure via Kafka lag monitoring. Scale fan-out workers. Celebrity threshold lowered temporarily. |
| Snowflake worker clock skew | Potential ID collision or ordering issues | Worker refuses to generate IDs if clock goes backward. Alert + NTP resync. |

Graceful Degradation Strategy

When the system is under extreme load (e.g., Super Bowl, breaking news), degrade gracefully instead of failing entirely:

  1. Level 1 — Disable non-essential features: Turn off “Who to follow” suggestions, personalized trends, and ad injection.
  2. Level 2 — Reduce timeline freshness: Serve cached timelines even if slightly stale. Skip celebrity tweet merge.
  3. Level 3 — Switch to pull-only: Temporarily disable fan-out on write entirely. All timelines served via fan-out on read (higher latency but lower write load).
  4. Level 4 — Read-only mode: Stop accepting new tweets. Serve existing content only. Nuclear option for survival.

Summary & Key Takeaways

Designing Twitter at scale reveals several fundamental distributed systems patterns:

| Design Decision | Pattern | Why It Matters |
|---|---|---|
| Hybrid fan-out | Push for normal, pull for celebrities | Balances write amplification vs read latency |
| Snowflake IDs | Distributed ID generation without coordination | Time-sortable, globally unique, no bottleneck |
| Timeline cache in Redis | Pre-computed materialized views | Trades storage for sub-200ms reads |
| Two-tier search | Earlybird (real-time) + ES (historical) | Freshness for trending, depth for historical |
| Trending via acceleration | Sliding window with baseline comparison | Detects spikes, not just high volume |
| Kafka as event bus | Async, decoupled event processing | One write triggers fan-out, search, trending, notifications |
| Sharded counters | Distributed counting for hot keys | Avoids hot rows for viral content |

The core lesson: there is no single “best” approach. Fan-out on write is perfect for 99% of users but catastrophic for celebrities. Fan-out on read works universally but can’t meet latency targets at scale. The hybrid approach acknowledges that different users have fundamentally different characteristics and optimizes accordingly.