Design: Chat System (WhatsApp)
The chat system — specifically a WhatsApp-scale messenger — is one of the most frequently asked system design interview questions. It tests your understanding of real-time communication, message ordering, delivery guarantees, presence systems, and storage at scale. WhatsApp serves over 2 billion users with 100 billion messages per day. In this post, we design a system that handles 50 million daily active users (DAU) with sub-second message delivery.
What makes this problem fascinating is that it touches nearly every pillar of distributed systems: persistent connections (WebSockets), reliable message delivery (at-least-once semantics), message ordering (Snowflake IDs and per-chat sequencing), efficient storage (wide-column databases), end-to-end encryption (Signal Protocol), and real-time presence tracking (heartbeat + pub/sub). Let's build it from scratch.
Requirements
Functional Requirements
- 1-on-1 messaging — send and receive text messages between two users in real-time
- Group chat — create groups of up to 500 members, broadcast messages to all members
- Online/offline status — show whether a user is currently online or their "last seen" time
- Read receipts — single tick (sent), double tick (delivered), blue tick (read)
- Media sharing — images, videos, audio, and documents up to 100 MB
- Message history — persistent storage, searchable chat history
- Multi-device sync — messages available across phone, tablet, and desktop
- Push notifications — notify offline users of new messages
Non-Functional Requirements
- Low latency — messages delivered in < 500 ms for online recipients
- High availability — 99.99% uptime (chat is mission-critical)
- Consistency — eventual consistency acceptable, but messages must never be lost
- Ordering — messages within a single chat must appear in correct order
- Security — end-to-end encryption for all messages
Scale Estimates
| Metric | Value |
|---|---|
| Daily Active Users (DAU) | 50 million |
| Messages per user per day | ~40 |
| Total messages per day | 2 billion |
| Messages per second (avg) | ~23,000 |
| Peak messages per second | ~70,000 (3× average) |
| Concurrent WebSocket connections | ~15 million (30% of DAU online at once) |
| Average message size | ~200 bytes (text) |
| Storage per day (text only) | ~400 GB |
Communication Protocol
The choice of communication protocol is the single most important architectural decision for a chat system. Let's evaluate the options:
Option 1: HTTP Polling
The client periodically sends HTTP requests asking "any new messages?" This is the simplest approach but is incredibly wasteful:
- If you poll every second, 15M clients generate 15 million requests/second — most with empty responses
- Each HTTP request has ~200 bytes of headers overhead
- Average latency: half the polling interval (e.g., 500 ms for 1-second polling)
Verdict: Unacceptable at scale. Wastes bandwidth and server resources.
Option 2: Long Polling
The client opens an HTTP connection and the server holds it open until a new message arrives (or a timeout, typically 30 seconds). Better than polling, but still problematic:
- Each "held" connection ties up a server thread/socket
- If a message arrives right after a timeout, the user waits for the next connection
- Load balancers may route subsequent requests to a different server, losing context
- The sender doesn't know which server holds the recipient's long-poll connection
Verdict: Better, but still not ideal for real-time bidirectional communication.
Option 3: Server-Sent Events (SSE)
A unidirectional server-to-client stream over HTTP. Good for pushing updates, but:
- Only one-way (server → client); sending messages still requires separate HTTP POSTs
- Limited to text data (no binary)
- Max 6 connections per domain in HTTP/1.1 browsers
Verdict: Suitable for notifications, not for full-duplex chat.
Option 4: WebSocket ✓
WebSocket is the correct choice for a chat system. It provides:
- Full-duplex — both client and server can send data simultaneously
- Persistent connection — established once, reused for the session lifetime
- Low overhead — after the initial HTTP handshake, frames have only 2–6 bytes of overhead
- Real-time — sub-100 ms delivery once the connection is established
- Binary support — can transmit both text and binary data
The WebSocket Handshake
A WebSocket connection starts with an HTTP Upgrade request:
-- Client sends HTTP Upgrade request --
GET /chat HTTP/1.1
Host: chat.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Authorization: Bearer <jwt_token>
-- Server responds with 101 Switching Protocols --
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
-- Now both sides communicate via WebSocket frames --
-- Frame header: 2 bytes (opcode + length) + optional 4-byte mask --
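The Sec-WebSocket-Accept value above isn't arbitrary: per RFC 6455, the server appends a fixed GUID to the client's Sec-WebSocket-Key, SHA-1 hashes the result, and base64-encodes the digest. A minimal sketch in Python (the key/accept pair above is the RFC 6455 sample):

```python
import base64
import hashlib

# Fixed GUID defined in RFC 6455
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

This is how the client verifies it's talking to a real WebSocket server and not a confused HTTP proxy.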
Hybrid Approach
In practice, we use both protocols:
| Operation | Protocol | Reason |
|---|---|---|
| Send/receive messages | WebSocket | Real-time, bidirectional |
| Typing indicators | WebSocket | Ephemeral, real-time |
| Read receipts | WebSocket | Low-latency status update |
| User registration / login | HTTPS | Stateless, request-response |
| Profile updates | HTTPS | Infrequent, not real-time |
| Media upload | HTTPS (+ S3 presigned) | Large files, multipart upload |
| Group management | HTTPS | CRUD operations |
High-Level Architecture
The system is composed of several specialized services, each handling a distinct concern:
Core Components
- API Gateway / Load Balancer — terminates TLS, authenticates users, routes WebSocket connections to chat servers using consistent hashing (by user_id) for sticky sessions
- Chat Servers (WebSocket Servers) — maintain persistent WebSocket connections with clients, route messages between users, handle message acknowledgments
- Presence Servers — track online/offline status using heartbeats, maintain a presence cache (Redis), publish status changes to subscribers
- Message Queue (Apache Kafka) — decouple message production from consumption, buffer messages during traffic spikes, ensure reliable delivery with at-least-once semantics
- Message Store (Apache Cassandra) — persistent storage for all chat messages, optimized for write-heavy workloads with time-series data patterns
- Media Storage (Amazon S3 + CDN) — store images, videos, audio, documents; generate presigned upload/download URLs
- Push Notification Service — send push notifications (APNs, FCM) to offline users, integrate with OS-level notification systems
- User Service — user profiles, contacts, block lists, settings (backed by MySQL/PostgreSQL)
- Group Service — group metadata, membership management, admin roles
- Session Service (Redis) — maps user_id → (chat_server_id, websocket_connection_id) so any chat server can look up where a recipient is connected
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Client A │ │ Client B │ │ Client C │
│ (Mobile) │ │ (Desktop) │ │ (Mobile) │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ WSS │ WSS │ WSS
▼ ▼ ▼
┌─────────────────────────────────────────────────────┐
│ Load Balancer (L4/L7) │
│ (sticky sessions via user_id hash) │
└───────┬──────────────┬──────────────┬───────────────┘
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│Chat Server│ │Chat Server│ │Chat Server│
│ #1 │ │ #2 │ │ #3 │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────┐
│ Session Store (Redis) │
│ user_id → {server_id, conn_id} │
└─────────────────┬───────────────────────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
┌───────────┐ ┌────────┐ ┌───────────┐
│ Message │ │Presence│ │ Push │
│ Queue │ │Service │ │Notification│
│ (Kafka) │ │(Redis) │ │ Service │
└─────┬─────┘ └────────┘ └───────────┘
▼
┌───────────┐ ┌───────────┐
│ Message │ │ Media │
│ Store │ │ Storage │
│(Cassandra)│ │ (S3+CDN) │
└───────────┘ └───────────┘
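The Session Store's contract is small: register a connection on WebSocket open, look it up when routing, remove it on close. A minimal in-memory sketch of that contract (a dict stands in for Redis; class and method names are illustrative):

```python
class SessionStore:
    """In-memory stand-in for the Redis session map."""

    def __init__(self):
        self._sessions = {}  # user_id -> list of {"server": ..., "conn": ...}

    def register(self, user_id, server_id, conn_id):
        # A user may have several live connections (multi-device)
        self._sessions.setdefault(user_id, []).append(
            {"server": server_id, "conn": conn_id})

    def lookup(self, user_id):
        # Empty list → user is offline → fall back to push notifications
        return self._sessions.get(user_id, [])

    def unregister(self, user_id, conn_id):
        remaining = [s for s in self._sessions.get(user_id, [])
                     if s["conn"] != conn_id]
        if remaining:
            self._sessions[user_id] = remaining
        else:
            self._sessions.pop(user_id, None)

store = SessionStore()
store.register("user_B", "chat_server_2", "ws_889")
print(store.lookup("user_B"))  # [{'server': 'chat_server_2', 'conn': 'ws_889'}]
print(store.lookup("user_C"))  # [] — offline
```

In production this is backed by Redis with a TTL so that a crashed chat server's stale entries age out.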
1-on-1 Messaging Flow
This is the core of the system. Let's trace the complete lifecycle of a single message from User A to User B:
Step-by-Step Flow (User B Online)
- User A types a message and taps send. The client generates a client_message_id (UUID) for idempotency.
- Client A sends via WebSocket to the chat server it's connected to (Chat Server #1). The frame contains:
{
  "type": "message",
  "client_msg_id": "550e8400-e29b-41d4-a716-446655440000",
  "from": "user_A",
  "to": "user_B",
  "chat_id": "chat_AB",
  "content": "Hey, are you free tonight?",
  "content_type": "text",
  "timestamp": 1714000000000
}

- Chat Server #1 receives the message, assigns a server-side message_id (monotonically increasing per chat, e.g., a Snowflake ID), and immediately sends an ACK back to User A (single ✓):

{
  "type": "ack",
  "client_msg_id": "550e8400-...",
  "server_msg_id": "msg_174200001",
  "status": "sent",
  "timestamp": 1714000000050
}

- Chat Server #1 persists the message to Kafka (topic: messages, partition key: chat_AB) for durable storage. A Kafka consumer writes it to Cassandra asynchronously.
- Chat Server #1 looks up User B's connection in the Session Store (Redis):

GET session:user_B → {"server": "chat_server_2", "conn": "ws_889"}

- Chat Server #1 forwards the message to Chat Server #2 via an internal service mesh (gRPC or direct TCP). Chat Server #2 finds the WebSocket connection for User B and delivers the message.
- User B's client receives the message and sends a delivery ACK back through Chat Server #2. This triggers a "delivered" status (double ✓✓) that propagates back to User A.
- User B opens the chat and reads the message. The client sends a read receipt. User A now sees a blue ✓✓.
Step-by-Step Flow (User B Offline)
- Steps 1–4 are identical: User A sends, Chat Server acknowledges and persists.
- Session Store lookup returns NULL — User B is not connected to any chat server.
- Chat Server publishes to the push notification service, which sends a push notification via APNs (iOS) or FCM (Android).
- The message remains in Cassandra, awaiting sync when User B comes online.
- User B opens the app → establishes a WebSocket connection → sends a sync request with the last known server_msg_id for each chat → Chat Server queries Cassandra for all messages with IDs greater than the last known → delivers them in bulk.
Message Deduplication
Networks are unreliable. A client may retry a send if it doesn't receive an ACK. The server uses the client_message_id (UUID) for idempotency:
-- On the chat server (pseudocode):
def handle_message(msg):
    # Have we already assigned a server_msg_id to this client_msg_id?
    existing_id = redis.hget(f"dedup:{msg.chat_id}", msg.client_msg_id)
    if existing_id is not None:
        # Already processed — resend the ACK but don't duplicate the message
        return send_ack(msg.client_msg_id, existing_id)
    server_msg_id = snowflake.next_id()
    kafka.produce("messages", key=msg.chat_id, value=msg)
    # Remember the mapping so retries get the same server_msg_id back
    redis.hset(f"dedup:{msg.chat_id}", msg.client_msg_id, server_msg_id)
    redis.expire(f"dedup:{msg.chat_id}", 86400)  # 24h TTL
    return send_ack(msg.client_msg_id, server_msg_id)
Group Chat Design
Group messaging introduces a fan-out problem: one message must be delivered to potentially hundreds of recipients. There are two strategies:
Strategy 1: Fan-out on Write (Write-Heavy)
When a message is sent to a group, the server immediately copies it into every member's personal inbox (message queue). Each member's chat server then delivers from their inbox.
- Pro: Reads are fast — each user just reads from their own inbox
- Con: One message to a 500-member group creates 500 copies — write amplification
- Used by: WeChat (for small groups)
Strategy 2: Fan-out on Read (Read-Heavy)
The message is written once to the group's message store. When a member opens the group chat, they query the group's timeline.
- Pro: Only one write per message regardless of group size
- Con: Reads require joining group timeline with user's "last read" pointer
- Used by: Discord (for large servers)
Our Hybrid Approach
We use a hybrid fan-out — the message is stored once in the group's Cassandra table, but we also push a lightweight notification entry into each member's message queue (via Kafka) so their chat server knows to deliver in real-time:
-- Group message flow:
1. User A sends message to group_xyz
2. Chat Server writes message to Kafka topic "group_messages"
(partition key: group_xyz)
3. Kafka consumer writes to Cassandra: messages_by_group table
4. For each member in group_xyz (fetched from Group Service):
a. Look up member's session in Redis
b. If online → forward message via their chat server's WebSocket
c. If offline → enqueue push notification
5. Each receiving client sends delivery ACK per message
Group Message Ordering
Within a group, all messages are produced to the same Kafka partition (partitioned by group_id), which guarantees a total order. The Kafka consumer assigns monotonically increasing sequence numbers as it writes to Cassandra. Clients use these sequence numbers to display messages in the correct order.
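Key-based partitioning is what makes this ordering guarantee work: the same key always hashes to the same partition. Kafka's default partitioner uses murmur2; the sketch below uses md5 purely to illustrate the idea (partition count is illustrative):

```python
import hashlib

NUM_PARTITIONS = 256  # illustrative partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # A stable hash of the key: the same group_id always lands in the same
    # partition, so all of a group's messages share one partition and keep
    # a total order. (Kafka itself uses murmur2, not md5.)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

p1 = partition_for("group_xyz")
p2 = partition_for("group_xyz")
assert p1 == p2  # deterministic → per-group total order
```

The flip side of this design: one extremely chatty group can hot-spot a single partition, which is why very large communities (Discord-scale servers) need a different sharding story.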
Online Presence System
The presence system answers: "Is User X online right now?" This seems simple but is surprisingly tricky at scale — especially when a user has millions of contacts who all want real-time status updates.
Heartbeat Mechanism
Each connected client sends a heartbeat to the presence server every 30 seconds. If no heartbeat is received for 90 seconds (3 missed heartbeats), the user is marked offline:
-- Heartbeat handling (Presence Server):
def handle_heartbeat(user_id):
    now = current_timestamp()
    # Update presence hash in Redis
    redis.hset(f"presence:{user_id}", mapping={
        "status": "online",
        "last_active": now,
        "server_id": this_server_id
    })
    # Keep last_active in a separate non-expiring key: the presence hash
    # (and its fields) vanish when the TTL below fires
    redis.set(f"last_active:{user_id}", now)
    # Set a TTL — if no heartbeat within 90s, key expires → offline
    redis.expire(f"presence:{user_id}", 90)

def on_key_expired(user_id):
    # Redis keyspace notification: key expired → user went offline.
    # The expired key's value is gone, so read the separate last_active key.
    last_active = redis.get(f"last_active:{user_id}")
    publish_status_change(user_id, "offline", last_active)

def on_websocket_close(user_id):
    # Immediate offline detection when connection drops cleanly
    now = current_timestamp()
    redis.hset(f"presence:{user_id}", mapping={
        "status": "offline",
        "last_active": now
    })
    publish_status_change(user_id, "offline", now)
Status Fan-out via Pub/Sub
When User A's status changes, all friends who have User A's chat open need to be notified. Naive approach: look up all of User A's contacts and push to each — this is O(friends) per status change and extremely expensive for popular users.
Better approach: subscription-based pub/sub:
- When User B opens a chat with User A, User B subscribes to User A's presence channel: SUBSCRIBE presence:user_A
- When User A's status changes, the presence server publishes: PUBLISH presence:user_A {"status": "online"}
- Only actively interested users receive the update — no wasted fan-out
- When User B closes the chat or goes offline, the subscription is removed
For group chats, when User B opens a group, they subscribe to presence channels of all group members. The presence server uses Redis Pub/Sub for this.
Handling Edge Cases
- Flapping — a user with a bad connection rapidly toggles online/offline. Solution: add a debounce window (e.g., don't publish "offline" until 30 seconds after the last heartbeat miss). Only publish "online" immediately.
- Multi-device — a user has a phone and a desktop. Show "online" if any device is connected; "offline" only when all devices disconnect. Maintain a per-device presence set: SADD presence_devices:user_A "phone" "desktop"
- Large groups — subscribing to 500 presence channels when opening a large group is expensive. Solution: only subscribe to presence for visible members (viewport-based), and batch presence queries for the rest: MGET presence:user_1 presence:user_2 ...
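The multi-device rule ("online if any device is connected") can be sketched with a per-user device set — in-memory here, Redis SADD/SREM in practice; names are illustrative:

```python
class DevicePresence:
    def __init__(self):
        self._devices = {}  # user_id -> set of connected device ids

    def connect(self, user_id: str, device_id: str) -> str:
        self._devices.setdefault(user_id, set()).add(device_id)
        return "online"

    def disconnect(self, user_id: str, device_id: str) -> str:
        devices = self._devices.get(user_id, set())
        devices.discard(device_id)
        # Only flip to offline once the LAST device disconnects
        return "online" if devices else "offline"

p = DevicePresence()
p.connect("user_A", "phone")
p.connect("user_A", "desktop")
print(p.disconnect("user_A", "phone"))    # online — desktop still connected
print(p.disconnect("user_A", "desktop"))  # offline — last device gone
```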
Message Storage (Cassandra)
Why Cassandra for chat messages? It's the perfect fit:
- Write-optimized — uses an LSM-tree (Log-Structured Merge-tree), which converts random writes into sequential writes. Chat is extremely write-heavy.
- Time-series friendly — messages are naturally ordered by time, and Cassandra excels at range queries within a partition.
- Linearly scalable — add nodes to handle more data/traffic, no single point of failure.
- Tunable consistency — use QUORUM writes for durability, LOCAL_ONE reads for speed.
Schema Design
-- 1-on-1 messages: partitioned by chat_id, clustered by message_id (time-ordered)
CREATE TABLE messages_by_chat (
chat_id TEXT, -- deterministic: sorted(user_A, user_B)
message_id BIGINT, -- Snowflake ID (encodes timestamp + server + seq)
sender_id TEXT,
content BLOB, -- encrypted message body
content_type TEXT, -- 'text', 'image', 'video', 'audio', 'document'
media_url TEXT, -- S3 URL for media messages
thumbnail BLOB, -- base64 thumbnail for images/videos
status TEXT, -- 'sent', 'delivered', 'read'
created_at TIMESTAMP,
PRIMARY KEY (chat_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
AND compaction = {'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'DAYS',
'compaction_window_size': 7}
AND gc_grace_seconds = 864000
AND default_time_to_live = 0;
-- Group messages: partitioned by group_id, clustered by message_id
CREATE TABLE messages_by_group (
group_id TEXT,
message_id BIGINT, -- Snowflake ID
sender_id TEXT,
content BLOB,
content_type TEXT,
media_url TEXT,
thumbnail BLOB,
created_at TIMESTAMP,
PRIMARY KEY (group_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
AND compaction = {'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'DAYS',
'compaction_window_size': 7};
-- Per-user chat index: which chats does a user belong to, ordered by last activity
CREATE TABLE chats_by_user (
user_id TEXT,
last_message_at TIMESTAMP,
chat_id TEXT,
chat_type TEXT, -- 'dm' or 'group'
other_user_id TEXT, -- NULL for group chats
group_name TEXT, -- NULL for DMs
last_message_preview TEXT, -- first 100 chars of last message
unread_count INT,
PRIMARY KEY (user_id, last_message_at, chat_id)
) WITH CLUSTERING ORDER BY (last_message_at DESC, chat_id ASC);
-- Read receipts: track per-user read position in each chat
CREATE TABLE read_receipts (
chat_id TEXT,
user_id TEXT,
last_read_msg_id BIGINT, -- the highest message_id this user has read
read_at TIMESTAMP,
PRIMARY KEY (chat_id, user_id)
);
Why chat_id as the partition key? All messages in a conversation live in the same Cassandra partition. This means fetching a page of chat history is a single-partition range query — extremely fast, O(log N) seek + sequential read. The trade-off: a very active chat could create a large partition. Mitigation: add a bucket column (e.g., monthly buckets) to split large chats across partitions.
Query Patterns
-- Fetch latest 50 messages in a chat (pagination)
SELECT * FROM messages_by_chat
WHERE chat_id = 'chat_AB'
ORDER BY message_id DESC
LIMIT 50;
-- Fetch messages after a sync point (for offline → online sync)
SELECT * FROM messages_by_chat
WHERE chat_id = 'chat_AB'
AND message_id > 174200001 -- last known message_id
ORDER BY message_id ASC;
-- Fetch user's chat list (inbox)
SELECT * FROM chats_by_user
WHERE user_id = 'user_A'
LIMIT 20;
-- Get unread count across all chats (unread_count is a regular column,
-- so CQL requires ALLOW FILTERING even within a single partition —
-- in practice, filter client-side from the chat-list page instead)
SELECT chat_id, unread_count FROM chats_by_user
WHERE user_id = 'user_A' AND unread_count > 0
ALLOW FILTERING;
Partition Sizing
Let's estimate partition sizes:
- Average message size (encrypted): ~500 bytes
- Active 1-on-1 chat: ~50 messages/day × 365 days = ~18,250 messages/year
- Partition size per year: 18,250 × 500 bytes ≈ 9 MB — well within Cassandra's recommended 100 MB per partition
- Very active group (100 messages/day): ~36,500 messages/year ≈ 18 MB — still fine
- Extremely active group (1,000 messages/day): ~180 MB/year — needs time-bucketed partitions
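The estimates above are just multiplication; a small helper (names are illustrative) makes it easy to check when a chat crosses Cassandra's ~100 MB partition guideline and needs time-bucketing:

```python
def partition_mb(msgs_per_day: int, avg_msg_bytes: int = 500,
                 days: int = 365) -> float:
    """Estimated Cassandra partition size in MB for one year of one chat."""
    return msgs_per_day * days * avg_msg_bytes / 1_000_000

def needs_bucketing(msgs_per_day: int, limit_mb: float = 100.0) -> bool:
    return partition_mb(msgs_per_day) > limit_mb

print(partition_mb(50))       # ≈ 9.1 MB — fine as a single partition
print(partition_mb(1000))     # ≈ 182.5 MB — split into monthly buckets
print(needs_bucketing(1000))  # True
```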
Message Ordering & Delivery Guarantees
Correct message ordering is critical for a chat application. Imagine seeing "Yes!" before "Do you want to grab dinner?" — the conversation becomes nonsensical.
Ordering Strategies
1. Client Timestamp (Unreliable)
Using the sender's device clock is tempting but fundamentally broken:
- Device clocks can be wrong (manual changes, timezone bugs, NTP drift)
- Two users' clocks may differ by seconds or minutes
- A message sent "later" could have an earlier timestamp
2. Server Timestamp (Better)
The chat server assigns a timestamp when it receives the message. Problems:
- Multiple chat servers have slightly different clocks (even with NTP, drift can be ~1 ms)
- If two messages arrive at different servers within the same millisecond, ordering is ambiguous
3. Snowflake ID (Our Choice) ✓
We use Twitter Snowflake-style IDs: 64-bit integers that embed a timestamp and guarantee global uniqueness and roughly chronological ordering:
┌──────────────────────────────────────────────────────────────────┐
│ 64-bit Snowflake ID │
├──────────┬────────────┬──────────────┬──────────────────────────┤
│ 1 bit │ 41 bits │ 10 bits │ 12 bits │
│ (sign) │ timestamp │ machine ID │ sequence number │
│ │ (ms since │ (1024 │ (4096 IDs per ms │
│ │ epoch) │ servers) │ per server) │
└──────────┴────────────┴──────────────┴──────────────────────────┘
-- 41 bits of timestamp: ~69 years from custom epoch
-- 10 bits of machine ID: supports 1024 chat servers
-- 12 bits of sequence: 4096 messages per millisecond per server
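The bit layout above can be sketched as a minimal single-server generator. The custom epoch is illustrative, and a production version also handles clock rollback and thread safety:

```python
import time

CUSTOM_EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC — illustrative custom epoch

class Snowflake:
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
        if now == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence
            if self.seq == 0:
                # Sequence exhausted this millisecond — spin to the next one
                while now <= self.last_ms:
                    now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
        else:
            self.seq = 0
        self.last_ms = now
        # 41 bits timestamp | 10 bits machine | 12 bits sequence
        return (now << 22) | (self.machine_id << 12) | self.seq

gen = Snowflake(machine_id=7)
a, b = gen.next_id(), gen.next_id()
assert a < b                    # roughly time-ordered
assert (a >> 12) & 0x3FF == 7   # machine ID recoverable from the bits
```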
Within a single chat, we add an extra guarantee: all messages for a chat flow through the same Kafka partition (partitioned by chat_id), and the Kafka consumer assigns a per-chat sequence number. This gives us a total order within each conversation.
Delivery Guarantees
| Guarantee | How We Achieve It |
|---|---|
| At-least-once delivery | Client retries on ACK timeout + server deduplicates via client_msg_id |
| No message loss | Kafka persists messages to disk before ACKing (acks=all); Cassandra replicates to 3 nodes |
| Per-chat ordering | Single Kafka partition per chat + per-chat sequence number in Cassandra |
| Exactly-once semantics (effective) | Client dedup on server_msg_id + server dedup on client_msg_id |
Handling Out-of-Order Delivery
Even with all guarantees, messages can arrive out of order at the client (e.g., network jitter, reconnection). The client maintains a local buffer:
// Client-side ordering logic:
class MessageBuffer {
  constructor(chatId, lastKnownSeq) {
    this.chatId = chatId;
    this.expectedSeq = lastKnownSeq + 1; // resume from the last synced sequence
    this.buffer = new Map(); // seq → message
  }
onMessageReceived(msg) {
if (msg.seq === this.expectedSeq) {
// In order — display immediately
this.display(msg);
this.expectedSeq++;
// Flush any buffered messages that are now in order
while (this.buffer.has(this.expectedSeq)) {
this.display(this.buffer.get(this.expectedSeq));
this.buffer.delete(this.expectedSeq);
this.expectedSeq++;
}
} else if (msg.seq > this.expectedSeq) {
// Out of order — buffer it, wait for missing messages
this.buffer.set(msg.seq, msg);
// If gap persists for > 2 seconds, request missing messages
this.scheduleGapFill(this.expectedSeq, msg.seq - 1);
}
// msg.seq < expectedSeq → duplicate, ignore
}
}
End-to-End Encryption
WhatsApp uses the Signal Protocol (developed by Open Whisper Systems) for end-to-end encryption. The server never sees plaintext messages — it only relays encrypted blobs.
Key Concepts
- Identity Key Pair — a long-term Curve25519 key pair generated on first install. The public key is uploaded to the server as the user's identity.
- Signed Pre-Key — a medium-term key pair, signed by the identity key. Rotated periodically (e.g., weekly).
- One-Time Pre-Keys — a batch of ephemeral key pairs uploaded to the server. Each is used only once for initial key exchange, then deleted.
- Session Key (Ratchet) — a symmetric key derived from the Diffie-Hellman exchange. Uses the Double Ratchet Algorithm to generate a new key for every single message — providing forward secrecy.
Initial Key Exchange (X3DH)
-- User A wants to message User B for the first time:
1. User A fetches User B's "key bundle" from the server:
- Identity Key (IKb)
- Signed Pre-Key (SPKb) + signature
- One-Time Pre-Key (OPKb)
2. User A verifies SPKb's signature using IKb
3. User A performs X3DH (Extended Triple Diffie-Hellman):
DH1 = DH(IKa, SPKb) -- A's identity × B's signed pre-key
DH2 = DH(EKa, IKb) -- A's ephemeral × B's identity
DH3 = DH(EKa, SPKb) -- A's ephemeral × B's signed pre-key
DH4 = DH(EKa, OPKb) -- A's ephemeral × B's one-time pre-key
Master Secret = KDF(DH1 || DH2 || DH3 || DH4)
4. Derive initial chain keys from Master Secret
5. Encrypt first message with derived key
6. Send: encrypted message + IKa (public) + EKa (public) + used OPKb ID
-- User B receives and performs the same DH calculations to derive
-- the same Master Secret, then decrypts the message.
Double Ratchet Algorithm
After the initial exchange, every message uses a new encryption key derived via two "ratchets":
- Symmetric ratchet (KDF chain) — each message advances a hash chain: key_n+1 = HMAC(key_n, constant). Even if one key is compromised, past keys cannot be derived (forward secrecy).
- Diffie-Hellman ratchet — periodically (on each message exchange), parties generate new DH key pairs and mix fresh DH output into the chain. This provides future secrecy (aka break-in recovery) — if a key is compromised, future keys become secure again after the next ratchet step.
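The symmetric ratchet step can be sketched with HMAC-SHA256: each step derives a one-time message key plus the next chain key from the current chain key, using distinct constants (0x01/0x02 here, following the Signal specification's convention; the initial chain key below is a placeholder — in the real protocol it comes from X3DH):

```python
import hashlib
import hmac

def advance_chain(chain_key: bytes):
    """One symmetric-ratchet step: returns (message_key, next_chain_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

ck = b"\x00" * 32  # placeholder initial chain key (X3DH output in practice)
mk1, ck = advance_chain(ck)
mk2, ck = advance_chain(ck)
assert mk1 != mk2  # every message gets a fresh key
# Forward secrecy in miniature: HMAC can't be inverted, so holding the
# current chain key (or mk2) doesn't let an attacker recover mk1.
```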
Group Encryption
For group chats, WhatsApp uses the Sender Keys protocol:
- Each group member generates a Sender Key for the group
- The Sender Key is distributed to all group members via pairwise encrypted channels (using the 1-on-1 encryption above)
- Messages to the group are encrypted once with the sender's Sender Key (symmetric encryption)
- All members can decrypt using the sender's Sender Key — O(1) encryption per message instead of O(N)
- When a member leaves, all Sender Keys are regenerated and redistributed
Media Handling
Sending a photo, video, or document follows a different flow than text messages:
Upload Flow
- Client encrypts the media locally using a random AES-256 key
- Client requests a presigned S3 upload URL from the API server (via HTTPS):

POST /api/v1/media/upload-url
{
  "content_type": "image/jpeg",
  "file_size": 2456789,
  "chat_id": "chat_AB"
}

Response:
{
  "upload_url": "https://s3.amazonaws.com/chat-media/...",
  "media_id": "media_8a7f3c",
  "expires_in": 3600
}

- Client uploads encrypted media directly to S3 (bypasses chat servers — no unnecessary load)
- Client generates a thumbnail (e.g., 100×100 JPEG for images, first frame for videos)
- Client sends a message via WebSocket containing the media_id, encryption key (encrypted with the chat's session key), thumbnail, and metadata:

{
  "type": "message",
  "content_type": "image",
  "media_id": "media_8a7f3c",
  "media_url": "https://cdn.example.com/media/media_8a7f3c",
  "encryption_key": "<base64_aes_key_encrypted_with_session_key>",
  "thumbnail": "<base64_blurred_thumbnail>",
  "file_size": 2456789,
  "dimensions": {"width": 1920, "height": 1080},
  "caption": "Look at this sunset!"
}

- Recipient receives the message, displays the thumbnail immediately, downloads the full media from S3/CDN in the background, and decrypts it with the AES key
Media Processing Pipeline
After upload, a background pipeline processes the media:
- Virus scanning — ClamAV or similar scans the encrypted media (if scanning before encryption) or validates file headers
- Content moderation — for profile photos and group icons (not E2E encrypted content)
- Transcoding — generate multiple resolutions for video (360p, 720p, 1080p)
- CDN distribution — replicate to edge locations for fast download globally
- Expiration — media not accessed for 30 days is moved to Glacier/cold storage
Read Receipts & Delivery Status
The three-tick system is a core UX feature of chat apps:
| Status | Visual | Trigger |
|---|---|---|
| Sent | ✓ (single gray tick) | Server ACKs the message (persisted to Kafka) |
| Delivered | ✓✓ (double gray tick) | Recipient's device receives the message and sends delivery ACK |
| Read | ✓✓ (double blue tick) | Recipient opens the chat (scrolls message into viewport) |
Implementation
-- Delivery receipt (sent by recipient's device automatically):
{
"type": "receipt",
"receipt_type": "delivered",
"chat_id": "chat_AB",
"message_ids": [174200001, 174200002, 174200003],
"user_id": "user_B",
"timestamp": 1714000005000
}
-- Read receipt (sent when user opens the chat):
{
"type": "receipt",
"receipt_type": "read",
"chat_id": "chat_AB",
"up_to_msg_id": 174200003, -- "I've read everything up to this ID"
"user_id": "user_B",
"timestamp": 1714000010000
}
Note the optimization: read receipts use up_to_msg_id instead of listing every message ID. This means "I've read all messages up to and including this ID" — one receipt covers any number of messages.
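Applying an up_to_msg_id receipt on the sender's side is a single range update over the local message state. A sketch of that client-side state change (names are illustrative):

```python
def apply_read_receipt(message_status: dict, up_to_msg_id: int) -> dict:
    """Mark every message at or below the watermark as read."""
    for msg_id, status in message_status.items():
        if msg_id <= up_to_msg_id and status != "read":
            message_status[msg_id] = "read"
    return message_status

statuses = {
    174200001: "delivered",
    174200002: "delivered",
    174200003: "delivered",
    174200004: "sent",  # not yet delivered, beyond the watermark
}
apply_read_receipt(statuses, up_to_msg_id=174200003)
print(statuses[174200003])  # read
print(statuses[174200004])  # sent — untouched
```

One receipt flips three ticks blue at once, which is exactly why the watermark form scales better than per-message receipts.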
Group Read Receipts
In groups, tracking read status is more complex. Each member's read position is stored independently:
-- Query: "Who has read message X in group_xyz?"
SELECT user_id, read_at FROM read_receipts
WHERE chat_id = 'group_xyz'
AND last_read_msg_id >= 174200050
ALLOW FILTERING;  -- last_read_msg_id is a regular column, not a clustering key
-- This is an expensive query for large groups, so:
-- 1. Cache frequently-accessed read receipt data in Redis
-- 2. Only compute detailed "seen by" lists on user request (tap on message)
-- 3. Show a simple count ("Read by 45 of 100") by default
Multi-Device Synchronization
Users expect to access their messages on multiple devices — phone, desktop app, and web. This creates a sync challenge: how do you keep all devices in sync without duplicating every message N times?
Sync Protocol
- Each device maintains a sync cursor — the last server_msg_id it has received for each chat
- On connect/reconnect, the device sends its sync cursors to the server:

{
  "type": "sync_request",
  "device_id": "phone_001",
  "cursors": {
    "chat_AB": 174200001,
    "chat_AC": 174100050,
    "group_xyz": 174150020
  }
}

- The server computes deltas for each chat (messages with server_msg_id > cursor) and sends them in batches
- Real-time messages are delivered to all connected devices simultaneously — the Session Store maps user_id → [{server, conn}, {server, conn}, ...]
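Computing the delta for a sync request is a per-chat filter on server_msg_id. A sketch with plain lists standing in for Cassandra range queries (names are illustrative):

```python
def compute_sync_delta(messages_by_chat: dict, cursors: dict) -> dict:
    """Return, per chat, the messages the device hasn't seen yet."""
    delta = {}
    for chat_id, messages in messages_by_chat.items():
        cursor = cursors.get(chat_id, 0)  # 0 → new device, send everything
        missing = [m for m in messages if m["server_msg_id"] > cursor]
        if missing:
            # Deliver in order so the client can advance its cursor as it goes
            delta[chat_id] = sorted(missing, key=lambda m: m["server_msg_id"])
    return delta

chat_store = {
    "chat_AB": [
        {"server_msg_id": 174200001, "content": "hi"},
        {"server_msg_id": 174200002, "content": "you there?"},
    ]
}
print(compute_sync_delta(chat_store, {"chat_AB": 174200001}))
# only the message with server_msg_id 174200002 is returned
```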
Primary vs. Companion Devices
WhatsApp's original model uses a primary device (phone) with companion devices (desktop, web) that mirror through the phone. The newer multi-device architecture (WhatsApp Multi-Device) is fully independent:
- Each device has its own identity key pair and encrypts/decrypts independently
- Sending a message to User B actually sends N copies (one per device of User B), each encrypted with that device's session key
- Trade-off: more encryption overhead, but no dependency on a single primary device
Push Notifications
When a recipient is offline, we must deliver a push notification. The push notification service:
- Receives events from chat servers (via the Kafka topic push_notifications)
- Looks up the user's device tokens (stored in a device registry)
- Formats the notification payload (sender name, message preview — if not E2E encrypted, or just "New message from X")
- Sends via APNs (Apple Push Notification service) for iOS or FCM (Firebase Cloud Messaging) for Android
- Handles token invalidation (device unregistered), rate limiting (don't spam), and notification collapsing (group multiple messages into one notification)
-- Push notification payload (FCM example):
{
"to": "<device_fcm_token>",
"notification": {
"title": "User A",
"body": "New message" // Can't show content due to E2E encryption
},
"data": {
"chat_id": "chat_AB",
"sender_id": "user_A",
"msg_count": 3,
"type": "new_message"
},
"android": {
"priority": "high" // Wake device from Doze mode
}
}
Scalability & Fault Tolerance
Chat Server Scaling
- Horizontal scaling — add more chat servers as connections grow. Each server handles ~500K concurrent WebSocket connections.
- Sticky sessions — the load balancer routes each user to the same chat server (consistent hashing on user_id). On server failure, connections are re-established and the LB routes to a different server.
- Graceful shutdown — before taking a server down for maintenance, drain connections: stop accepting new ones, send a "reconnect" frame to existing clients, wait for all to disconnect.
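Consistent hashing with virtual nodes is what keeps routing sticky while letting servers join and leave without remapping every user. A compact sketch (md5 and the vnode count are illustrative choices):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of consistent hashing for sticky WebSocket routing."""

    def __init__(self, servers, vnodes: int = 100):
        # Each server gets `vnodes` points on the ring for smoother balance
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def route(self, user_id: str) -> str:
        # Walk clockwise to the first vnode at or after the user's hash
        h = self._hash(user_id)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["chat_server_1", "chat_server_2", "chat_server_3"])
assert ring.route("user_A") == ring.route("user_A")  # same user → same server
```

When a server is removed, only the users whose hashes fell on its arc of the ring get remapped; everyone else keeps their sticky assignment.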
Kafka Scaling
- Partitioning — messages partitioned by chat_id (1-on-1) or group_id (group). Start with 256 partitions, increase as needed.
- Replication factor 3 — each partition replicated to 3 brokers for durability
- Consumer groups — multiple consumers in a group for parallel processing of Cassandra writes
Cassandra Scaling
- Replication factor 3 with NetworkTopologyStrategy across multiple data centers
- Write consistency: QUORUM (2 of 3 nodes acknowledge) — strong durability
- Read consistency: LOCAL_ONE — fast reads from the nearest replica
- Compaction: TimeWindowCompactionStrategy — optimal for time-series data (chat messages)
Handling Chat Server Failures
Scenario: Chat Server #2 crashes while holding 500K connections
1. All 500K WebSocket connections drop immediately
2. Clients detect disconnect (TCP keepalive or missing heartbeat)
3. Clients reconnect with exponential backoff (1s, 2s, 4s, 8s...)
4. Load balancer routes to healthy servers (Chat Server #1, #3, ...)
5. New chat server registers the connection in Session Store (Redis)
6. Clients send sync requests to fetch missed messages
7. No messages are lost — they're all in Kafka/Cassandra
Recovery time: ~5-10 seconds for reconnection + sync
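The reconnect schedule in step 3 is plain exponential backoff. Adding jitter (an assumption here, not stated above, but standard practice) spreads out reconnect attempts so 500K dropped clients don't stampede the load balancer at the same instant:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))

# Deterministic ceilings per attempt: 1s, 2s, 4s, 8s, 16s, 32s, ...
ceilings = [min(60.0, 1.0 * 2 ** i) for i in range(6)]
print(ceilings)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```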
Interview Cheat Sheet
When asked "Design WhatsApp / a chat system" in an interview, here's the framework:
| Topic | Key Points |
|---|---|
| Protocol | WebSocket for real-time, HTTP for non-real-time. Explain the handshake. |
| Architecture | Chat servers (WS) + Session Store (Redis) + Message Queue (Kafka) + Message Store (Cassandra) + Push Notifications |
| 1-on-1 flow | Sender → Chat Server → Session lookup → forward to recipient's server → deliver via WS. Offline: push + store. |
| Group chat | Hybrid fan-out: store once, notify each member. Mention small-group vs. large-group trade-offs. |
| Presence | Heartbeat-based (every 30s), Redis cache, pub/sub for updates. Debounce for flapping. |
| Storage | Cassandra: partition by chat_id, cluster by message_id (Snowflake). TimeWindow compaction. |
| Ordering | Snowflake IDs + single Kafka partition per chat + per-chat sequence numbers. |
| Encryption | Signal Protocol (X3DH + Double Ratchet). Mention forward secrecy. Server sees only ciphertext. |
| Media | S3 presigned URL upload, send thumbnail + URL over WS. Don't send large files over WS. |
| Delivery | At-least-once via client retry + server dedup. Three-tick system for receipts. |
Common Follow-Up Questions
- "How do you handle message ordering across devices?" — Server-assigned Snowflake IDs, per-chat sequence numbers, client-side reorder buffer
- "What if a chat server crashes mid-delivery?" — Message is already in Kafka; client detects disconnect, reconnects, syncs from last known ID
- "How does E2E encryption work with multi-device?" — Each device has independent keys; sender encrypts N copies (one per device)
- "How do you handle a celebrity with 10M followers going online?" — Don't fan out presence to followers; only show presence to mutual contacts or active chat partners
- "What about message search?" — Client-side search (decrypt + local index) since server can't read E2E content. For non-E2E platforms: Elasticsearch index.
- "How do you handle typing indicators?" — Ephemeral WebSocket events, NOT persisted. Throttle to max 1 per 3 seconds per chat.