Design: Twitter
Requirements & Scale
Twitter (now X) is one of the most demanding real-time platforms on the internet. It combines the write-heavy nature of a messaging system with the read-heavy nature of a news feed, all under strict latency constraints. Before we design anything, let’s crystallize the requirements.
Functional Requirements
- Post tweets: Users create text posts up to 280 characters, optionally with images, videos, or links.
- Follow users: Users follow/unfollow other users. The follow relationship is asymmetric — following someone doesn’t mean they follow you back.
- Home timeline: Display a reverse-chronological (with ranking) feed of tweets from users you follow.
- User timeline: Display all tweets by a specific user.
- Search: Full-text real-time search across all tweets.
- Trending topics: Surface the most popular hashtags and topics in real time, globally and regionally.
- Engagement: Like, retweet, reply, quote-tweet.
- Notifications: Mentions, retweets, likes, and new follower alerts.
Non-Functional Requirements
- Availability: 99.99% uptime — Twitter is a global communications platform.
- Latency: Home timeline loads in <200ms. Tweet posting confirms in <500ms.
- Consistency: Eventual consistency is acceptable for timelines (a few seconds delay is fine). Strong consistency for tweet creation and follow/unfollow.
- Durability: Zero tweet loss — tweets are permanent records.
Scale Estimation
The numbers that drive every design decision:
| Metric | Value | Implications |
|---|---|---|
| Daily Active Users (DAU) | 400M | Massive read load on timelines |
| New tweets per day | 500M | ~5,800 tweets/sec average, 15K+ peak |
| Average follows per user | 200 | Each tweet fans out to ~200 timelines |
| Timeline reads per day | 100B | ~1.2M reads/sec — read:write ratio of 200:1 |
| Average tweet size | ~300 bytes (text + metadata) | ~150 GB/day of tweet text alone |
| Media attachments | ~30% of tweets have media | ~15 TB/day of images and video |
| Search queries per day | 1B | ~12K queries/sec, real-time indexing needed |
| Celebrity accounts (>1M followers) | ~50K | Fan-out on write is infeasible for these |
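A quick back-of-envelope pass (illustrative Python, using the table's own numbers) confirms the derived rates:

SECONDS_PER_DAY = 86_400

tweets_per_sec       = 500e6 / SECONDS_PER_DAY   # ~5,800 average; peaks roughly 3x higher
timeline_reads_sec   = 100e9 / SECONDS_PER_DAY   # ~1.16M reads/sec
read_write_ratio     = 100e9 / 500e6             # 200:1
tweet_text_per_day   = 500e6 * 300               # ~150 GB/day of text + metadata
media_tweets_per_day = 500e6 * 0.30              # ~150M tweets/day carry media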
High-Level Architecture
The system decomposes into several core services, each scaling independently:
┌─────────────────────┐
│ API Gateway / │
Mobile / Web Clients ──▶ │ Load Balancer │
└──────┬──────────────┘
│
┌──────────────────────┼──────────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Tweet Service │ │ Timeline Service │ │ User Service │
│ (write path) │ │ (read path) │ │ (profiles, │
│ │ │ │ │ auth, follow) │
└──────┬───────┘ └────────┬─────────┘ └──────┬───────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Fan-out │ │ Timeline Cache │ │ Social Graph │
│ Service │ │ (Redis Cluster) │ │ (Graph DB / │
└──────┬───────┘ └──────────────────┘ │ Redis) │
│ └──────────────┘
▼
┌──────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Tweet Store │ │ Search Service │ │ Media Service │
│ (MySQL/ │ │ (Elasticsearch) │ │ (S3 + CDN) │
│ Cassandra) │ └──────────────────┘ └──────────────┘
└──────────────┘
│ ┌──────────────────┐ ┌──────────────┐
└──────────▶ │ Trending Service │ │ Notification │
│ (Storm/Flink) │ │ Service │
└──────────────────┘ └──────────────┘
Each service is independently deployable, horizontally scalable, and communicates via a combination of synchronous RPCs (for user-facing paths) and asynchronous message queues (for fan-out, search indexing, and analytics).
Tweet Service
The tweet service handles the critical write path — accepting a new tweet, persisting it, and kicking off the downstream pipeline (fan-out, search indexing, trending, notifications).
Tweet Creation Flow
1. Client sends POST /api/tweets with text (max 280 chars), optional media IDs (pre-uploaded), and metadata (reply_to, quote_tweet_id).
2. API Gateway authenticates the request (OAuth 2.0 bearer token), checks rate limits (300 tweets/3 hours per user), and routes to the Tweet Service.
3. Tweet Service validates the payload:
   - Text length ≤ 280 Unicode characters (not bytes — CJK characters count as one).
   - Media IDs reference valid, uploaded media objects.
   - Spam/abuse check via ML classifier (synchronous, <50ms).
4. Generate Snowflake ID (more on this below) — this becomes the tweet’s globally unique, time-sortable identifier.
5. Write to Tweet Store — synchronous write to the primary datastore.
6. Publish to message queue (Kafka) — a single event triggers multiple downstream consumers:
   - Fan-out Service (timeline delivery)
   - Search Indexing Pipeline
   - Trending Topics Pipeline
   - Notification Service
   - Analytics Pipeline
7. Return 201 Created with the tweet object to the client.
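Putting those steps together, a simplified handler might look like the sketch below. The snowflake generator is defined in the next section; tweet_store, kafka_producer, media_service, and spam_classifier are hypothetical clients standing in for the real dependencies.

MAX_TWEET_CHARS = 280

def create_tweet(user_id, text, media_ids=None, reply_to_id=None):
    # 1. Validate: length in characters (not bytes), media readiness, spam check
    if not (0 < len(text) <= MAX_TWEET_CHARS):
        raise ValueError("tweet must be 1-280 characters")
    if media_ids and not media_service.all_ready(media_ids):    # hypothetical helper
        raise ValueError("media not uploaded")
    if spam_classifier.is_spam(user_id, text):                  # synchronous, <50ms budget
        raise PermissionError("rejected by spam filter")

    # 2. Assign a globally unique, time-sortable ID
    tweet = {
        "tweet_id": snowflake.next_id(),
        "user_id": user_id,
        "text": text,
        "media_ids": media_ids or [],
        "reply_to_id": reply_to_id,
    }

    # 3. Durable write first, then the async pipeline via Kafka
    tweet_store.insert(tweet)
    kafka_producer.send("new-tweets", key=str(user_id), value=tweet)
    return tweet   # the API layer serializes this as 201 Created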
Tweet Data Model
The tweets table is the heart of the system. It needs to support two access patterns: lookup by tweet_id (O(1)) and range scans by user_id + time (user timeline).
| Column | Type | Notes |
|---|---|---|
| tweet_id | BIGINT (Snowflake ID) | Primary key, time-sortable |
| user_id | BIGINT | Author, indexed for user timeline |
| text | VARCHAR(1120) | 280 chars × 4 bytes (UTF-8 max) |
| media_ids | JSON / ARRAY | References to media objects |
| reply_to_id | BIGINT, nullable | Parent tweet for threading |
| retweet_of_id | BIGINT, nullable | Original tweet for retweets |
| quote_tweet_id | BIGINT, nullable | Quoted tweet reference |
| like_count | INT | Denormalized counter |
| retweet_count | INT | Denormalized counter |
| reply_count | INT | Denormalized counter |
| hashtags | JSON / ARRAY | Extracted hashtags for trending |
| language | CHAR(2) | Detected language code |
| geo | POINT, nullable | Optional geolocation |
| created_at | TIMESTAMP | Redundant with Snowflake ID, but useful |
Partitioning Strategy
At 500M tweets/day, a single database is impossible. The tweet store is partitioned by tweet_id (specifically, by the Snowflake ID):
- Why not partition by user_id? Because celebrity users (like @barackobama with 130M followers) would create massive hot partitions. Partitioning by (a hash of) tweet_id distributes writes uniformly across shards, since every tweet gets a fresh ID regardless of who posted it.
- Shard count: ~4,096 logical shards, mapped to ~256 physical database nodes via consistent hashing. Each shard handles ~120K tweets/day — well within a single MySQL/Cassandra node’s write capacity.
- User timeline queries (SELECT * FROM tweets WHERE user_id = ? ORDER BY tweet_id DESC LIMIT 20) require a scatter-gather across shards. This is acceptable because:
  - User timeline is a secondary access pattern (home timeline is primary).
  - We maintain a secondary index table partitioned by user_id that maps to tweet_ids, avoiding full scatter-gather.
-- Secondary index table (partitioned by user_id)
CREATE TABLE user_tweets (
user_id BIGINT,
tweet_id BIGINT, -- Snowflake ID (also encodes timestamp)
PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);
-- Query user timeline: single partition read
SELECT tweet_id FROM user_tweets
WHERE user_id = 12345
LIMIT 20;
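Routing a write to its shard is then a cheap, local computation. The sketch below uses a hash of the tweet_id and a static shard-to-node map as a simplification of the consistent-hashing scheme described above (in practice the map would live in a config service):

import hashlib

LOGICAL_SHARDS = 4096
PHYSICAL_NODES = 256

# Simplified static assignment of logical shards to physical nodes.
SHARD_TO_NODE = {shard: shard % PHYSICAL_NODES for shard in range(LOGICAL_SHARDS)}

def logical_shard(tweet_id):
    # Hash the ID rather than taking it modulo directly: the low bits of a
    # Snowflake ID are the sequence counter and are heavily skewed toward 0.
    digest = hashlib.md5(str(tweet_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % LOGICAL_SHARDS

def route_tweet(tweet_id):
    shard = logical_shard(tweet_id)
    return shard, SHARD_TO_NODE[shard]

Resharding only changes the shard-to-node map; the tweet_id-to-shard mapping itself never moves.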
Snowflake ID Generation
Twitter invented the Snowflake ID scheme to generate globally unique, roughly time-ordered 64-bit IDs without coordination between servers. This is critical because:
- Time-ordering: Tweets in a timeline are ordered by ID, which is equivalent to ordering by creation time. No need for a separate timestamp sort.
- No coordination: Each worker generates IDs independently — no central authority, no distributed locks, no database sequences.
- Compact: 64-bit integers are cache-friendly and fit in a single CPU register.
Bit Layout
┌─────────────────────────────────────────────────────────────────┐
│ 64-bit Snowflake ID │
├──────┬────────────────────────┬──────────┬──────────┬───────────┤
│ Sign │ Timestamp (ms) │ Datacenter│ Worker │ Sequence │
│ 1 bit│ 41 bits │ 5 bits │ 5 bits │ 12 bits │
├──────┼────────────────────────┼──────────┼──────────┼───────────┤
│ 0 │ ms since custom epoch │ 0-31 │ 0-31 │ 0-4095 │
│ │ (2010-11-04 01:42:54) │ │ │ │
└──────┴────────────────────────┴──────────┴──────────┴───────────┘
Key properties of this layout:
| Component | Bits | Range | Implication |
|---|---|---|---|
| Sign | 1 | Always 0 | Keeps IDs positive in signed 64-bit integers |
| Timestamp | 41 | ~69.7 years | IDs are time-sortable; epoch runs until ~2079 |
| Datacenter ID | 5 | 0–31 | Supports 32 datacenters |
| Worker ID | 5 | 0–31 | 32 workers per datacenter = 1,024 total workers |
| Sequence | 12 | 0–4095 | 4,096 IDs per millisecond per worker |
Total throughput: 1,024 workers × 4,096 IDs/ms = 4,194,304 IDs per millisecond — far exceeding Twitter’s peak of ~15K tweets/sec. The sequence counter resets every millisecond; if a worker exhausts 4,096 sequences within 1ms, it busy-waits until the next millisecond.
Implementation Sketch
import time

def current_time_ms():
    return int(time.time() * 1000)

class SnowflakeGenerator:
    EPOCH = 1288834974657  # Custom epoch: 2010-11-04T01:42:54.657Z

    def __init__(self, datacenter_id, worker_id):
        assert 0 <= datacenter_id < 32
        assert 0 <= worker_id < 32
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_timestamp = -1

    def next_id(self):
        timestamp = current_time_ms()
        if timestamp < self.last_timestamp:
            # Clock moved backwards: refuse to generate IDs until it catches up
            raise RuntimeError("clock moved backwards; refusing to generate ID")
        if timestamp == self.last_timestamp:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit mask
            if self.sequence == 0:
                # Exhausted 4,096 IDs in this millisecond; busy-wait for the next one
                timestamp = self._wait_next_ms(timestamp)
        else:
            self.sequence = 0
        self.last_timestamp = timestamp
        return ((timestamp - self.EPOCH) << 22 |
                (self.datacenter_id << 17) |
                (self.worker_id << 12) |
                self.sequence)

    def _wait_next_ms(self, last_ts):
        ts = current_time_ms()
        while ts <= last_ts:
            ts = current_time_ms()
        return ts
Clock discipline matters here: if a worker's clock steps backward, it could emit duplicate or out-of-order IDs, so the generator refuses to issue IDs until the clock catches up (the guard above). On these hosts, ntpd is configured with tinker panic 0 to prevent large clock jumps from disrupting ID generation.
Timeline Service — Hybrid Fan-Out
The home timeline is the single most important and most complex feature. Every time a user opens Twitter, they expect to see a fresh, personalized feed of tweets from people they follow — in under 200ms. This is where the hybrid fan-out approach comes in.
Three Approaches to Timeline Generation
Approach 1: Fan-Out on Read (Pull Model)
GET /home_timeline?user_id=123
1. Look up user 123's following list: [user_A, user_B, user_C, ...]
2. For each followed user, query their recent tweets
3. Merge and sort all tweets by timestamp
4. Return top N tweets
-- SQL equivalent (conceptual):
SELECT * FROM tweets
WHERE user_id IN (SELECT followee_id FROM follows WHERE follower_id = 123)
ORDER BY tweet_id DESC
LIMIT 20;
Problem: If a user follows 500 people, this is 500 database queries per timeline load. At 1.2M timeline reads/sec, that’s 600M database queries/sec — completely infeasible. Latency would be seconds, not milliseconds.
Approach 2: Fan-Out on Write (Push Model)
User A posts a tweet:
1. Look up A's followers: [user_123, user_456, user_789, ...]
2. For EACH follower, append tweet_id to their timeline cache
3. When user_123 requests home timeline, just read from their pre-built cache
-- Redis operation:
LPUSH timeline:123 <tweet_id>
LPUSH timeline:456 <tweet_id>
LPUSH timeline:789 <tweet_id>
LTRIM timeline:123 0 799 -- Keep only last 800 tweets
Problem: When a celebrity with 50M followers tweets, the fan-out service must write to 50 million timeline caches. That is 50 million Redis writes (gigabytes of cache churn) for a single tweet. With celebrities tweeting 5–20 times per day, the write amplification is catastrophic.
Approach 3: Hybrid (Twitter’s Actual Approach)
Twitter uses a hybrid model that combines the best of both approaches:
- Normal users (<10K followers): Fan-out on write. When they tweet, push tweet_id into each follower’s timeline cache. This covers ~99% of all tweets.
- Celebrity users (≥10K followers; the Celebrity Tweet Handling section later refines this into tiers): Fan-out on read. Their tweets are not pushed to follower caches. Instead, when a user loads their timeline, the system merges:
- Pre-built cache (normal user tweets, pushed)
- Recent tweets from followed celebrities (pulled in real time)
Timeline Cache Design in Redis
The timeline cache is a Redis Cluster storing pre-computed timelines. This is the most latency-sensitive component in the entire system.
Data Structure
Key: timeline:{user_id}
Type: Sorted Set (ZSET)
Score: Snowflake ID (which is time-sortable)
Member: tweet_id (as string)
-- Fan-out write: celebrity tweets handled separately
ZADD timeline:123 1780012345678901234 "1780012345678901234"
-- Read home timeline (newest 20 tweets)
ZREVRANGEBYSCORE timeline:123 +inf -inf LIMIT 0 20
-- Trim to keep cache bounded (800 entries max)
ZREMRANGEBYRANK timeline:123 0 -801
Key design decisions for the Redis timeline cache:
| Decision | Choice | Rationale |
|---|---|---|
| Data structure | Sorted Set (ZSET) | O(log N) insert and range queries by score. Score = Snowflake ID gives time ordering for free. |
| What we store | Only tweet_ids, not full tweets | Keeps cache compact. Full tweet data fetched from Tweet Cache on read (batch MGET). A timeline of 800 tweet_ids is ~6.4KB vs ~240KB for full tweet objects. |
| Cache size per user | 800 tweet_ids | Covers ~2–3 days of content for an average user following 200 accounts. Beyond this, we fall back to database queries. |
| Eviction | Manual trim after each insert | ZREMRANGEBYRANK keeps exactly 800 entries. No reliance on Redis eviction policies. |
| Cluster topology | Redis Cluster, 1000+ nodes | 400M users × 6.4KB = ~2.5TB minimum. With replication factor 3 and headroom: ~10TB cluster. |
| Persistence | None (pure cache) | If a node dies, timelines are rebuilt from the Tweet Store. Cold-start penalty is acceptable for rare events. |
Timeline Read Path (Merged)
def get_home_timeline(user_id, count=20):
# Step 1: Get pre-built timeline from cache (normal user tweets)
cached_tweet_ids = redis.zrevrangebyscore(
f"timeline:{user_id}", "+inf", "-inf",
start=0, num=count * 2 # Over-fetch for merge headroom
)
# Step 2: Get followed celebrities
celebrity_followees = get_celebrity_followees(user_id)
# Step 3: Pull recent tweets from each celebrity (fan-out on read)
celebrity_tweets = []
for celeb_id in celebrity_followees:
tweets = redis.zrevrangebyscore(
f"user_tweets:{celeb_id}", "+inf", "-inf",
start=0, num=count
)
celebrity_tweets.extend(tweets)
    # Step 4: Merge, sort by Snowflake ID (= chronological), take top N
    # Cast to int so ordering is numeric, not lexicographic on strings from Redis
    all_tweet_ids = [int(t) for t in cached_tweet_ids + celebrity_tweets]
    all_tweet_ids.sort(reverse=True)  # Descending = newest first
    top_ids = all_tweet_ids[:count]
# Step 5: Hydrate tweet objects (batch fetch from Tweet Cache)
tweets = tweet_cache.mget(top_ids) # Redis MGET, <1ms
# Step 6: Hydrate user profiles (batch fetch)
user_ids = {t.user_id for t in tweets}
users = user_cache.mget(user_ids)
return enrich(tweets, users)
The average user follows 3–5 celebrities. So step 3 adds only 3–5 Redis lookups to the timeline read — negligible latency overhead.
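On the write side, the fan-out consumer is where the normal/celebrity split is enforced. A minimal sketch, assuming a redis-py client and hypothetical is_celebrity and get_follower_ids helpers (the latter paging through the social graph):

TIMELINE_MAX_ENTRIES = 800

def handle_new_tweet(event, redis):
    """Consume one 'new-tweets' event and deliver it to follower timelines."""
    author_id, tweet_id = event["user_id"], event["tweet_id"]

    if is_celebrity(author_id):
        # Celebrity path: a single write to the author's own ZSET; followers pull on read
        redis.zadd(f"celebrity_tweets:{author_id}", {str(tweet_id): tweet_id})
        return

    # Normal path: push the tweet_id into every follower's pre-built timeline
    for batch in get_follower_ids(author_id, batch_size=10_000):
        pipe = redis.pipeline(transaction=False)
        for follower_id in batch:
            key = f"timeline:{follower_id}"
            pipe.zadd(key, {str(tweet_id): tweet_id})
            pipe.zremrangebyrank(key, 0, -(TIMELINE_MAX_ENTRIES + 1))  # keep newest 800
        pipe.execute()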
Social Graph Service
The social graph manages the follow/unfollow relationships between users. It answers three critical questions:
- Who does user X follow? (Following list — needed for fan-out on read of celebrity tweets)
- Who follows user X? (Follower list — needed for fan-out on write)
- Does user X follow user Y? (Follow check — needed for UI rendering and access control)
Data Model Options
Option A: Adjacency List in Redis
-- Who does user 123 follow?
Key: following:123 Type: SET
Members: {456, 789, 101, ...}
-- Who follows user 456?
Key: followers:456 Type: SET
Members: {123, 555, 888, ...}
-- Follow user
SADD following:123 456
SADD followers:456 123
-- Unfollow
SREM following:123 456
SREM followers:456 123
-- Check if following
SISMEMBER following:123 456 → 1 (yes) or 0 (no)
-- Follower count
SCARD followers:456 → 1,340,201
Pros: O(1) follow check, O(1) count, blazing fast. Cons: Memory-intensive. A user with 50M followers has a 50M-member set (~400MB). For celebrities, we store only the count in Redis and page through followers from the database for fan-out operations.
Option B: Graph Database (e.g., FlockDB, TAO)
Twitter originally built FlockDB, a distributed graph database optimized for adjacency-list queries. Facebook uses TAO for the same purpose. Key characteristics:
- Stores edges as (source_id, edge_type, destination_id, timestamp)
- Sharded by source_id for fast “who do I follow?” queries
- Reverse index sharded by destination_id for “who follows me?”
- Supports intersection queries: “which of my friends also follow user X?”
Recommended: Hybrid
Use Redis for hot-path queries (follow check, counts, following list for timeline merge) and a persistent store (MySQL or FlockDB-style) as the source of truth. Redis is populated on follow/unfollow and warmed on user login.
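A sketch of the hot-path follow check under that hybrid: Redis first, with a read-through to the persistent follows table on a cache miss (db.query is a hypothetical helper over MySQL):

def is_following(follower_id, followee_id, redis, db):
    key = f"following:{follower_id}"
    if redis.exists(key):
        return bool(redis.sismember(key, followee_id))

    # Cache miss (cold node, evicted key): read the source of truth and warm Redis
    rows = db.query("SELECT followee_id FROM follows WHERE follower_id = %s", (follower_id,))
    followee_ids = [row[0] for row in rows]
    if followee_ids:
        redis.sadd(key, *followee_ids)
        redis.expire(key, 86_400)   # refresh daily; MySQL remains authoritative
    return followee_id in set(followee_ids)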
Follow/Unfollow Flow
POST /api/users/123/follow body: { target_user_id: 456 }
1. Validate: user 123 exists, user 456 exists, not already following
2. Write to persistent store:
INSERT INTO follows (follower_id, followee_id, created_at) VALUES (123, 456, NOW())
3. Update Redis (async via message queue for consistency):
SADD following:123 456
SADD followers:456 123
INCR follower_count:456
INCR following_count:123
4. If user 456 is a normal user (<10K followers):
→ Backfill: push 456's recent tweets into 123's timeline cache
5. If user 456 is a celebrity:
→ Add 456 to 123's celebrity-followee set (for fan-out on read)
6. Send notification to user 456: "user 123 started following you"
Search Service
Twitter search must index 500M tweets per day and make them searchable within seconds of posting. This is one of the most demanding real-time search systems in the world.
Architecture: Earlybird + Elasticsearch
Twitter uses a search architecture called Earlybird, which is a custom real-time inverted index optimized for recency. Let’s build something similar:
Two-Tier Search
┌──────────────────────────────────────────────────┐
│ Search Query Flow │
│ │
│ Query ──▶ Blender (merges results from both) │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ Earlybird │ │ Full Index │ │
│ │ (Real-Time) │ │ (Elasticsearch) │ │
│ │ │ │ │ │
│ │ Last ~10 sec of │ │ Historical tweets │ │
│ │ tweets, in-memory│ │ (days/weeks/months) │ │
│ │ inverted index │ │ sharded by time │ │
│ └─────────────────┘ └─────────────────────┘ │
└──────────────────────────────────────────────────┘
Earlybird: Real-Time Inverted Index
Earlybird is an in-memory, append-only inverted index that processes the most recent tweets (last ~10 seconds). It’s designed for the common case: users searching for breaking news and trending topics want the absolute freshest results.
class EarlybirdIndex:
def __init__(self):
self.inverted_index = {} # token → sorted list of tweet_ids
self.tweet_store = {} # tweet_id → tweet metadata
def index_tweet(self, tweet):
tokens = tokenize(tweet.text) # Lowercase, remove stopwords, stem
for token in tokens:
if token not in self.inverted_index:
self.inverted_index[token] = []
self.inverted_index[token].append(tweet.tweet_id)
self.tweet_store[tweet.tweet_id] = tweet
    def search(self, query, limit=20):
        tokens = tokenize(query)
        if len(tokens) == 1:
            # Posting lists are append-only (ascending tweet_id), so reverse for recency
            posting_list = list(reversed(self.inverted_index.get(tokens[0], [])))
        else:
            # Intersect posting lists for multi-term queries
            sets = [set(self.inverted_index.get(t, [])) for t in tokens]
            posting_list = sorted(set.intersection(*sets), reverse=True)
        return posting_list[:limit]

    def gc(self, max_age_seconds=10):
        """Evict tweets older than max_age_seconds to keep memory bounded."""
        cutoff_id = snowflake_id_from_timestamp(now() - max_age_seconds)
        # Drop old entries from every posting list, then from the tweet store
        for token, posting_list in self.inverted_index.items():
            self.inverted_index[token] = [tid for tid in posting_list if tid >= cutoff_id]
        self.tweet_store = {tid: t for tid, t in self.tweet_store.items()
                            if tid >= cutoff_id}
Earlybird runs on multiple shards (partitioned by hash of tweet_id) with each shard handling ~500 tweets/sec of indexing throughput. A query fans out to all Earlybird shards in parallel, and results are merged by the Blender service.
Full Index: Elasticsearch Cluster
Tweets older than ~10 seconds flow into Elasticsearch for long-term searchability:
- Index sharding: Time-based indices (e.g., tweets-2026-04-15) — one index per day. Older indices are merged and moved to cheaper storage.
- Mapping: Text field analyzed with a custom Twitter tokenizer (handles @mentions, #hashtags, URLs, emojis). Keyword fields for exact-match filters (language, user_id).
- Replication: 2 replicas per shard for read throughput and fault tolerance.
- Cluster size: ~2,000 nodes for the full tweet index. Hot nodes (SSDs) for recent indices, warm nodes (HDDs) for historical data.
Indexing Pipeline
Tweet posted
→ Kafka topic: "new-tweets"
→ Earlybird Consumer: index in-memory (within 1 second)
→ Elasticsearch Consumer: bulk index (within 5-10 seconds)
- Tokenize text (custom analyzer: @mentions, #hashtags, URLs)
- Extract entities (people, places, events)
- Detect language
- Compute engagement signals (initial = 0, updated periodically)
- Write to daily ES index with tweet_id as document ID
Trending Topics
Trending topics surface what the world is talking about right now. The algorithm must be real-time (detect trends within minutes), noise-resistant (filter spam and evergreen topics), and geographically aware.
Trending Algorithm: Sliding Window Frequency Count
The core algorithm uses a sliding window approach over hashtag frequencies, detecting acceleration rather than absolute volume:
For each hashtag H in the last 5 minutes:
current_count = count(H, window=last_5_min)
baseline_count = count(H, window=last_24_hours) / 288 # Normalized to 5-min
trend_score = (current_count - baseline_count) / max(baseline_count, 1)
Trending if:
trend_score > THRESHOLD (e.g., 3.0) AND
current_count > MIN_VOLUME (e.g., 500) AND
NOT in spam_blocklist AND
NOT evergreen (e.g., "love", "good morning")
This means a hashtag trending isn’t about being the most used — it’s about having the highest acceleration. #Love might appear in 100K tweets/5min, but if that’s its normal rate, it doesn’t trend. #SuperBowl going from 100 to 50K in 5 minutes is a trend.
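A minimal in-memory sketch of that scoring loop, keeping one Counter per 5-minute bucket (a production version would hold this state inside Flink/Storm or Redis, and layer on the spam filters described below):

from collections import Counter, deque

BASELINE_BUCKETS = 288   # 24 hours of 5-minute windows

class TrendDetector:
    def __init__(self, threshold=3.0, min_volume=500):
        self.threshold = threshold
        self.min_volume = min_volume
        self.buckets = deque([Counter()], maxlen=BASELINE_BUCKETS)

    def record(self, hashtag):
        self.buckets[-1][hashtag] += 1      # count into the current 5-minute bucket

    def roll_bucket(self):
        self.buckets.append(Counter())      # called every 5 minutes

    def trending(self, blocklist=frozenset()):
        current = self.buckets[-1]
        totals = Counter()
        for bucket in self.buckets:
            totals.update(bucket)
        results = []
        for tag, count in current.items():
            baseline = totals[tag] / len(self.buckets)   # average per 5-minute window
            score = (count - baseline) / max(baseline, 1)
            if score > self.threshold and count > self.min_volume and tag not in blocklist:
                results.append((tag, score))
        return sorted(results, key=lambda x: x[1], reverse=True)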
Implementation with Stream Processing
┌──────────┐ ┌──────────────┐ ┌──────────────────┐ ┌────────────┐
│ Kafka │ ──▶ │ Flink/Storm │ ──▶ │ Trend Detector │ ──▶ │ Trending │
│ new- │ │ Hashtag │ │ (sliding window │ │ Cache │
│ tweets │ │ Extractor │ │ + scoring) │ │ (Redis) │
└──────────┘ └──────────────┘ └──────────────────┘ └────────────┘
│
▼
┌──────────────┐
│ Spam Filter │
│ (ML model + │
│ blocklist) │
└──────────────┘
Multi-Level Trending
- Global trends: Aggregated across all tweets worldwide.
- Country trends: Filtered by user’s country (from profile or IP geolocation).
- City trends: Further localized for major metros. Requires geo-tagged tweets or location inference from user profiles.
- Personalized trends: Weighted by the user’s interests and following graph. If you follow many sports accounts, sports trends rank higher.
Noise & Spam Filtering
Without filtering, trending topics would be dominated by spam bots and coordinated manipulation campaigns. Multiple layers of defense:
- Velocity check: If a hashtag goes from 0 to 10K in under 60 seconds, it’s likely bot-driven. Legitimate trends build over minutes, not seconds.
- Account diversity: Count unique accounts, not just tweet volume. 10K tweets from 100 accounts is suspicious; 10K tweets from 8K accounts is organic.
- Account age & quality: Weigh tweets from established accounts higher. New accounts (<7 days) and accounts with spam signals get reduced weight.
- Content analysis: ML classifier detects coordinated inauthentic behavior — similar text patterns, same timing, network graph clustering.
- Evergreen blocklist: Common everyday phrases (#love, #goodmorning, #tbt) are excluded from trending even if they spike.
Media Service
~30% of tweets contain images or video. The media pipeline must handle high-throughput uploads, transcoding, and low-latency delivery.
Upload Flow
1. Client requests upload URL:
POST /api/media/upload/init → returns { media_id, upload_url (S3 presigned) }
2. Client uploads directly to S3 (bypasses our servers entirely):
PUT <upload_url> body: <binary image/video data>
3. S3 triggers Lambda/worker for post-processing:
- Images: resize to standard dimensions (150x150 thumb, 680px medium, original)
- Video: transcode to multiple bitrates (360p, 720p, 1080p) via FFmpeg
- Generate blurhash placeholder for progressive loading
- Store all variants back in S3
4. Media service marks media_id as "ready"
5. Client includes media_id in tweet creation:
POST /api/tweets body: { text: "...", media_ids: ["abc123"] }
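Step 1 is essentially a thin wrapper around S3 presigned URLs. A sketch using boto3, with an assumed bucket name and a hypothetical new_media_id() generator:

import boto3

s3 = boto3.client("s3")
MEDIA_BUCKET = "twitter-media-uploads"   # assumed bucket name

def init_media_upload(user_id):
    """Return a media_id plus a presigned PUT URL the client uploads to directly."""
    media_id = new_media_id()            # hypothetical (e.g., another Snowflake ID)
    upload_url = s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": MEDIA_BUCKET, "Key": f"raw/{user_id}/{media_id}"},
        ExpiresIn=900,                   # presigned URL valid for 15 minutes
    )
    return {"media_id": media_id, "upload_url": upload_url}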
Delivery via CDN
- CDN: CloudFront / Akamai with edge caches in 200+ cities. Popular images are served from edge with <10ms latency.
- URL structure: https://pbs.twimg.com/media/{media_id}?format=jpg&name=medium
- Cache hit ratio: ~95% for images (popular tweets get millions of views). Video caching is lower (~70%) due to size.
- Cost optimization: Cold media (>30 days, low access) moved to S3 Infrequent Access tier. Glacier for media older than 1 year.
Notification Service
Notifications keep users engaged by alerting them to interactions: mentions, retweets, likes, replies, and new followers.
Notification Types & Priorities
| Type | Trigger | Priority | Delivery |
|---|---|---|---|
| Mention (@user) | Tweet text parsing | High | Push + In-app |
| Reply | Tweet with reply_to_id | High | Push + In-app |
| Retweet | Retweet action | Medium | Push + In-app |
| Like | Like action | Low | In-app only (batched) |
| New follower | Follow action | Low | In-app (batched) |
| Trending topic | Personalized trend match | Low | Push (opt-in) |
Architecture
Event Sources (Kafka topics)
│
▼
Notification Router
├── Priority queue (high/medium/low)
├── Deduplication (don't notify same event twice)
├── User preference check (muted? notification settings?)
└── Rate limiting (max 50 push notifications/hour per user)
│
├──▶ Push Notification Service (APNs / FCM)
├──▶ In-App Notification Store (Cassandra)
└──▶ Email Digest Service (batched, daily/weekly)
Celebrity notification problem: When a tweet by @KatyPerry (100M+ followers) gets liked, we don’t send 100M notifications. Instead, likes are aggregated: “@user1, @user2, and 47,382 others liked your tweet.” The aggregation happens in the notification store, not in delivery.
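A sketch of that aggregation step, using Redis for brevity (the design above puts the in-app store in Cassandra); keys and field names here are illustrative:

def record_like_notification(redis, author_id, tweet_id, liker_id):
    """Fold each like into one aggregate notification instead of one row per liker."""
    key = f"notif:likes:{author_id}:{tweet_id}"
    count = redis.hincrby(key, "count", 1)
    if count <= 3:
        # Keep a few sample actors for the "@a, @b and N others liked..." rendering
        redis.rpush(f"{key}:actors", liker_id)
    redis.expire(key, 7 * 86_400)
    return count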
Rate Limiting
Rate limiting protects the system from abuse, runaway clients, and denial-of-service attacks. Twitter applies limits at multiple layers:
Rate Limit Tiers
| Endpoint | Limit | Window | Rationale |
|---|---|---|---|
| POST /tweets | 300 tweets | 3 hours | Prevents spam bots from flooding |
| POST /likes | 1,000 likes | 24 hours | Prevents artificial engagement inflation |
| POST /follows | 400 follows | 24 hours | Prevents mass follow-unfollow schemes |
| GET /home_timeline | 150 requests | 15 minutes | Prevents scraping, protects Redis cache |
| GET /search | 180 requests | 15 minutes | Protects Elasticsearch cluster |
| App-level (OAuth) | 300 requests | 15 minutes | Per-app rate limit across all users |
Implementation
Rate limiting is implemented at the API Gateway using a sliding window counter backed by Redis:
-- Sliding window rate limit check (Lua script for atomicity)
local key = KEYS[1] -- e.g., "ratelimit:tweets:user:123"
local window = tonumber(ARGV[1]) -- e.g., 10800 (3 hours in seconds)
local max_requests = tonumber(ARGV[2]) -- e.g., 300
local now = tonumber(ARGV[3]) -- current timestamp
-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count requests in current window
local count = redis.call('ZCARD', key)
if count < max_requests then
-- Allow: add this request
redis.call('ZADD', key, now, now .. ':' .. math.random(1000000))
redis.call('EXPIRE', key, window)
return {1, max_requests - count - 1} -- allowed, remaining
else
return {0, 0} -- denied, 0 remaining
end
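Calling the script from the gateway could look like this with redis-py, assuming the Lua body above is stored in a RATE_LIMIT_LUA string:

import time
import redis

r = redis.Redis()
rate_limiter = r.register_script(RATE_LIMIT_LUA)   # RATE_LIMIT_LUA = the Lua script above

def check_tweet_rate_limit(user_id, window_s=10_800, max_requests=300):
    allowed, remaining = rate_limiter(
        keys=[f"ratelimit:tweets:user:{user_id}"],
        args=[window_s, max_requests, int(time.time())],
    )
    return bool(allowed), int(remaining)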
Rate limit responses include standard headers:
HTTP/1.1 429 Too Many Requests
X-Rate-Limit-Limit: 300
X-Rate-Limit-Remaining: 0
X-Rate-Limit-Reset: 1713456789
Retry-After: 1823
Celebrity Tweet Handling
Celebrities are the hardest scaling problem on Twitter. An account with 50M followers breaks the fan-out model. Here’s how the system handles it end-to-end:
Dynamic Celebrity Classification
User Classification Logic:
if follower_count < 10,000:
user_tier = "normal" → Fan-out on write
elif follower_count < 1,000,000:
user_tier = "popular" → Fan-out on write with async batching
else:
user_tier = "celebrity" → Fan-out on read only
-- Classification is updated hourly via a background job
-- Tier changes trigger timeline cache migration:
-- normal → celebrity: stop pushing, add to celebrity index
-- celebrity → normal: start pushing, backfill timelines
Celebrity Tweet Cache
Celebrity tweets are stored in a dedicated cache for fast retrieval during timeline merge:
Key: celebrity_tweets:{user_id}
Type: Sorted Set (ZSET)
Score: Snowflake ID
Member: tweet_id
-- Elon Musk tweets → stored here, NOT pushed to 150M timeline caches
ZADD celebrity_tweets:44196397 1780099887766554433 "1780099887766554433"
-- When follower 123 loads timeline:
-- 1. Read their pre-built cache (normal tweets)
-- 2. For each celebrity they follow, ZREVRANGE celebrity_tweets:{celeb_id}
-- 3. Merge all results by Snowflake ID
This trades a small read-time cost (~3–5 extra Redis calls per timeline read) for eliminating millions of cache writes per celebrity tweet.
Engagement Counts at Scale
A viral celebrity tweet can accumulate millions of likes/retweets per minute. Updating the like_count column directly would create a hot row. Instead:
- Count buffering: Likes are accumulated in Redis (INCR tweet_likes:<tweet_id>) and flushed to the database in batches every 30 seconds.
- Approximate display: The UI shows “1.2M likes” (approximate) rather than “1,234,567 likes” (exact), allowing eventual consistency.
- Sharded counters: For extremely hot tweets, the counter is sharded across multiple Redis keys (tweet_likes:<tweet_id>:shard_0 through :shard_15) and summed on read (see the sketch below).
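A sketch of the sharded-counter pattern with 16 shards and a redis-py client; increments scatter, reads sum:

import random

LIKE_SHARDS = 16

def increment_like(redis, tweet_id):
    # Spread increments across shards so no single Redis key becomes hot
    shard = random.randrange(LIKE_SHARDS)
    redis.incr(f"tweet_likes:{tweet_id}:shard_{shard}")

def get_like_count(redis, tweet_id):
    # Sum all shards on read; one MGET keeps it a single round trip
    keys = [f"tweet_likes:{tweet_id}:shard_{i}" for i in range(LIKE_SHARDS)]
    return sum(int(v) for v in redis.mget(keys) if v is not None)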
User Timeline
Unlike the home timeline, the user timeline is straightforward: it shows all tweets by a single user, in reverse chronological order. This is a simpler problem:
GET /api/users/456/tweets?cursor=<last_tweet_id>&count=20
Option A: Query the secondary index table
SELECT tweet_id FROM user_tweets
WHERE user_id = 456
AND tweet_id < <cursor>
ORDER BY tweet_id DESC
LIMIT 20;
Option B: Redis sorted set (for hot profiles)
ZREVRANGEBYSCORE user_tweets:456 (<cursor> -inf LIMIT 0 20
For frequently accessed profiles (celebrities, trending accounts), the user timeline is cached in a Redis sorted set identical to the celebrity tweet cache. For long-tail profiles, the database secondary index is sufficient.
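For the Redis path, cursor pagination falls out of the exclusive range syntax. A sketch with redis-py (the "(" prefix makes the cursor bound exclusive):

def get_user_timeline(redis, user_id, cursor=None, count=20):
    max_score = f"({cursor}" if cursor else "+inf"
    return redis.zrevrangebyscore(
        f"user_tweets:{user_id}", max_score, "-inf", start=0, num=count
    )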
Infrastructure & Data Flow Summary
Kafka Topic Architecture
Kafka is the nervous system that connects all services. Every significant event flows through Kafka for decoupled, asynchronous processing:
Topic: new-tweets (partitioned by user_id, 256 partitions)
Consumers: fan-out-service, search-indexer, trending-pipeline,
notification-router, analytics-pipeline
Topic: engagements (partitioned by tweet_id, 128 partitions)
Consumers: counter-service, notification-router, ranking-pipeline
Topic: social-graph (partitioned by user_id, 64 partitions)
Consumers: timeline-backfill, notification-router, recommendation-engine
Topic: user-activity (partitioned by user_id, 128 partitions)
Consumers: analytics-pipeline, ad-targeting, personalization
Complete Write Path: Posting a Tweet
User taps "Tweet" on mobile app
│
▼
API Gateway (rate limit check, auth)
│
▼
Tweet Service
├── Generate Snowflake ID
├── Validate text (280 char limit, spam check)
├── Write to Tweet Store (MySQL/Cassandra)
├── Write to user_tweets secondary index
├── Update user's tweet_count
└── Publish to Kafka "new-tweets" topic
│
├──▶ Fan-out Service
│ ├── Check user tier (normal/popular/celebrity)
│ ├── If normal: get follower list, ZADD to each timeline cache
│ ├── If celebrity: ZADD to celebrity_tweets:{user_id} only
│ └── Async, batched, with backpressure
│
├──▶ Search Indexer
│ ├── Earlybird: index in-memory (real-time, <1 sec)
│ └── Elasticsearch: bulk index (near-real-time, <10 sec)
│
├──▶ Trending Pipeline (Flink)
│ ├── Extract hashtags
│ ├── Update sliding window counters
│ └── Recompute trend scores every 30 seconds
│
├──▶ Notification Router
│ ├── Parse @mentions → notify mentioned users
│ ├── If reply → notify parent tweet author
│ └── Push via APNs/FCM
│
└──▶ Analytics Pipeline
├── Tweet volume metrics
├── User engagement tracking
└── Ad impression attribution
Complete Read Path: Loading Home Timeline
User opens Twitter app
│
▼
API Gateway → Timeline Service
│
├── Step 1: Read pre-built timeline from Redis
│ ZREVRANGEBYSCORE timeline:123 +inf -inf LIMIT 0 40
│ Result: [tweet_id_1, tweet_id_2, ..., tweet_id_40]
│
├── Step 2: Get celebrity followees
│ SMEMBERS celebrity_followees:123
│ Result: [celeb_A, celeb_B, celeb_C]
│
├── Step 3: Pull celebrity tweets
│ For each celeb: ZREVRANGEBYSCORE celebrity_tweets:{celeb} +inf -inf LIMIT 0 20
│ Result: [celeb_tweet_1, celeb_tweet_2, ...]
│
├── Step 4: Merge & sort all tweet_ids by Snowflake ID (descending)
│ Take top 20
│
├── Step 5: Hydrate tweets (batch fetch)
│ MGET tweet:id1 tweet:id2 ... tweet:id20
│ Cache miss? → Query Tweet Store → populate cache
│
├── Step 6: Hydrate user profiles (batch fetch)
│ MGET user:uid1 user:uid2 ...
│
├── Step 7: (Optional) Apply ranking model
│ ML model scores tweets by predicted engagement
│ Re-sort by score instead of chronological
│
└── Return JSON response with tweets, user objects, cursor
Total latency target: <200ms (p99)
Failure Handling & Reliability
Key Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Redis timeline cache node dies | Affected users see empty/stale timeline | Replica takes over. Cold-start rebuild from Tweet Store + Fan-out replay from Kafka. |
| Kafka broker goes down | Fan-out, search indexing delayed | Kafka replication factor 3. Consumers resume from last committed offset. No data loss. |
| Tweet Store shard unreachable | Writes fail for affected partition | Circuit breaker returns 503. Retries with exponential backoff. Cross-region failover. |
| Elasticsearch cluster degraded | Search latency increases or partial results | Reduce replica count, shed load to Earlybird-only results. Graceful degradation. |
| Fan-out service backlog | Timelines are stale (tweets appear late) | Backpressure via Kafka lag monitoring. Scale fan-out workers. Celebrity threshold lowered temporarily. |
| Snowflake worker clock skew | Potential ID collision or ordering issues | Worker refuses to generate IDs if clock goes backward. Alert + NTP resync. |
Graceful Degradation Strategy
When the system is under extreme load (e.g., Super Bowl, breaking news), degrade gracefully instead of failing entirely:
- Level 1 — Disable non-essential features: Turn off “Who to follow” suggestions, personalized trends, and ad injection.
- Level 2 — Reduce timeline freshness: Serve cached timelines even if slightly stale. Skip celebrity tweet merge.
- Level 3 — Switch to pull-only: Temporarily disable fan-out on write entirely. All timelines served via fan-out on read (higher latency but lower write load).
- Level 4 — Read-only mode: Stop accepting new tweets. Serve existing content only. Nuclear option for survival.
Summary & Key Takeaways
Designing Twitter at scale reveals several fundamental distributed systems patterns:
| Design Decision | Pattern | Why It Matters |
|---|---|---|
| Hybrid fan-out | Push for normal, pull for celebrities | Balances write amplification vs read latency |
| Snowflake IDs | Distributed ID generation without coordination | Time-sortable, globally unique, no bottleneck |
| Timeline cache in Redis | Pre-computed materialized views | Trades storage for sub-200ms reads |
| Two-tier search | Earlybird (real-time) + ES (historical) | Freshness for trending, depth for historical |
| Trending via acceleration | Sliding window with baseline comparison | Detects spikes, not just high volume |
| Kafka as event bus | Async, decoupled event processing | One write triggers fan-out, search, trending, notifications |
| Sharded counters | Distributed counting for hot keys | Avoids hot rows for viral content |
The core lesson: there is no single “best” approach. Fan-out on write is perfect for 99% of users but catastrophic for celebrities. Fan-out on read works universally but can’t meet latency targets at scale. The hybrid approach acknowledges that different users have fundamentally different characteristics and optimizes accordingly.