High Level Design Series · Real-World Designs· Part 49 of 70

Design: Chat System (WhatsApp)

April 2026 · 32 min read

The chat system — specifically a WhatsApp-scale messenger — is the most frequently asked system design interview question. It tests your understanding of real-time communication, message ordering, delivery guarantees, presence systems, and storage at scale. WhatsApp serves over 2 billion users with 100 billion messages per day. In this post, we design a system that handles 50 million daily active users (DAU) with sub-second message delivery.

What makes this problem fascinating is that it touches nearly every pillar of distributed systems: persistent connections (WebSockets), reliable message delivery (at-least-once semantics), global ordering (vector clocks), efficient storage (wide-column databases), end-to-end encryption (Signal Protocol), and real-time presence tracking (heartbeat + pub/sub). Let's build it from scratch.

Requirements

Functional Requirements

1-on-1 messaging — send and receive text messages between two users in real-time
Group chat — create groups of up to 500 members, broadcast messages to all members
Online/offline status — show whether a user is currently online or their "last seen" time
Read receipts — single tick (sent), double tick (delivered), blue tick (read)
Media sharing — images, videos, audio, and documents up to 100 MB
Message history — persistent storage, searchable chat history
Multi-device sync — messages available across phone, tablet, and desktop
Push notifications — notify offline users of new messages

Non-Functional Requirements

Low latency — messages delivered in < 500 ms for online recipients
High availability — 99.99% uptime (chat is mission-critical)
Consistency — eventual consistency acceptable, but messages must never be lost
Ordering — messages within a single chat must appear in correct order
Security — end-to-end encryption for all messages

Scale Estimates

Metric	Value
Daily Active Users (DAU)	50 million
Messages per user per day	~40
Total messages per day	2 billion
Messages per second (avg)	~23,000
Peak messages per second	~70,000 (3× average)
Concurrent WebSocket connections	~15 million (30% of DAU online at once)
Average message size	~200 bytes (text)
Storage per day (text only)	~400 GB

Connection math: Each WebSocket connection consumes ~10 KB of memory on the server. With 15 million concurrent connections, we need 15M × 10 KB = 150 GB of memory just for connection state. At 500K connections per server, that's 30 chat servers minimum — and we should plan for 2× headroom, so roughly 60 chat servers.

Communication Protocol

The choice of communication protocol is the single most important architectural decision for a chat system. Let's evaluate the options:

Option 1: HTTP Polling

The client periodically sends HTTP requests asking "any new messages?" This is the simplest approach but is incredibly wasteful:

If you poll every second, 15M clients generate 15 million requests/second — most with empty responses
Each HTTP request has ~200 bytes of headers overhead
Average latency: half the polling interval (e.g., 500 ms for 1-second polling)

Verdict: Unacceptable at scale. Wastes bandwidth and server resources.

Option 2: Long Polling

The client opens an HTTP connection and the server holds it open until a new message arrives (or a timeout, typically 30 seconds). Better than polling, but still problematic:

Each "held" connection ties up a server thread/socket
If a message arrives right after a timeout, the user waits for the next connection
Load balancers may route subsequent requests to a different server, losing context
The sender doesn't know which server holds the recipient's long-poll connection

Verdict: Better, but still not ideal for real-time bidirectional communication.

Option 3: Server-Sent Events (SSE)

A unidirectional server-to-client stream over HTTP. Good for pushing updates, but:

Only one-way (server → client); sending messages still requires separate HTTP POSTs
Limited to text data (no binary)
Max 6 connections per domain in HTTP/1.1 browsers

Verdict: Suitable for notifications, not for full-duplex chat.

Option 4: WebSocket ✓

WebSocket is the correct choice for a chat system. It provides:

Full-duplex — both client and server can send data simultaneously
Persistent connection — established once, reused for the session lifetime
Low overhead — after the initial HTTP handshake, frames have only 2–6 bytes of overhead
Real-time — sub-100 ms delivery once the connection is established
Binary support — can transmit both text and binary data

The WebSocket Handshake

A WebSocket connection starts with an HTTP Upgrade request:

-- Client sends HTTP Upgrade request --
GET /chat HTTP/1.1
Host: chat.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Authorization: Bearer <jwt_token>

-- Server responds with 101 Switching Protocols --
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

-- Now both sides communicate via WebSocket frames --
-- Frame header: 2 bytes (opcode + length) + optional 4-byte mask --

After the handshake, the TCP connection is "upgraded" to a WebSocket connection. The HTTP server hands off the socket to the WebSocket handler. All subsequent communication uses the WebSocket protocol — lightweight binary frames, no HTTP headers.

Hybrid Approach

In practice, we use both protocols:

Operation	Protocol	Reason
Send/receive messages	WebSocket	Real-time, bidirectional
Typing indicators	WebSocket	Ephemeral, real-time
Read receipts	WebSocket	Low-latency status update
User registration / login	HTTPS	Stateless, request-response
Profile updates	HTTPS	Infrequent, not real-time
Media upload	HTTPS (+ S3 presigned)	Large files, multipart upload
Group management	HTTPS	CRUD operations

High-Level Architecture

The system is composed of several specialized services, each handling a distinct concern:

Core Components

API Gateway / Load Balancer — terminates TLS, authenticates users, routes WebSocket connections to chat servers using consistent hashing (by user_id) for sticky sessions
Chat Servers (WebSocket Servers) — maintain persistent WebSocket connections with clients, route messages between users, handle message acknowledgments
Presence Servers — track online/offline status using heartbeats, maintain a presence cache (Redis), publish status changes to subscribers
Message Queue (Apache Kafka) — decouple message production from consumption, buffer messages during traffic spikes, ensure reliable delivery with at-least-once semantics
Message Store (Apache Cassandra) — persistent storage for all chat messages, optimized for write-heavy workloads with time-series data patterns
Media Storage (Amazon S3 + CDN) — store images, videos, audio, documents; generate presigned upload/download URLs
Push Notification Service — send push notifications (APNs, FCM) to offline users, integrate with OS-level notification systems
User Service — user profiles, contacts, block lists, settings (backed by MySQL/PostgreSQL)
Group Service — group metadata, membership management, admin roles
Session Service (Redis) — maps user_id → (chat_server_id, websocket_connection_id) so any chat server can look up where a recipient is connected

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client A  │     │   Client B  │     │   Client C  │
│  (Mobile)   │     │  (Desktop)  │     │  (Mobile)   │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │ WSS              │ WSS              │ WSS
       ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────┐
│              Load Balancer (L4/L7)                   │
│        (sticky sessions via user_id hash)            │
└───────┬──────────────┬──────────────┬───────────────┘
        ▼              ▼              ▼
  ┌───────────┐  ┌───────────┐  ┌───────────┐
  │Chat Server│  │Chat Server│  │Chat Server│
  │    #1     │  │    #2     │  │    #3     │
  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
        │              │              │
        ▼              ▼              ▼
  ┌─────────────────────────────────────────┐
  │         Session Store (Redis)            │
  │    user_id → {server_id, conn_id}        │
  └─────────────────┬───────────────────────┘
                    │
        ┌───────────┼───────────┐
        ▼           ▼           ▼
  ┌───────────┐ ┌────────┐ ┌───────────┐
  │  Message   │ │Presence│ │   Push     │
  │  Queue     │ │Service │ │Notification│
  │  (Kafka)   │ │(Redis) │ │  Service   │
  └─────┬─────┘ └────────┘ └───────────┘
        ▼
  ┌───────────┐  ┌───────────┐
  │  Message   │  │  Media     │
  │  Store     │  │  Storage   │
  │(Cassandra) │  │  (S3+CDN)  │
  └───────────┘  └───────────┘

1-on-1 Messaging Flow

This is the core of the system. Let's trace the complete lifecycle of a single message from User A to User B:

Step-by-Step Flow (User B Online)

User A types a message and taps send. The client generates a client_message_id (UUID) for idempotency.

Client A sends via WebSocket to the chat server it's connected to (Chat Server #1). The frame contains:

{
  "type": "message",
  "client_msg_id": "550e8400-e29b-41d4-a716-446655440000",
  "from": "user_A",
  "to": "user_B",
  "chat_id": "chat_AB",
  "content": "Hey, are you free tonight?",
  "content_type": "text",
  "timestamp": 1714000000000
}

Chat Server #1 receives the message, assigns a server-side message_id (monotonically increasing per chat, e.g., Snowflake ID), and immediately sends an ACK back to User A (single ✓):
```
{
  "type": "ack",
  "client_msg_id": "550e8400-...",
  "server_msg_id": "msg_174200001",
  "status": "sent",
  "timestamp": 1714000000050
}
```
Chat Server #1 persists the message to Kafka (topic: messages, partition key: chat_AB) for durable storage. A Kafka consumer writes it to Cassandra asynchronously.
Chat Server #1 looks up User B's connection in the Session Store (Redis): GET session:user_B → {"server": "chat_server_2", "conn": "ws_889"}
Chat Server #1 forwards the message to Chat Server #2 via an internal service mesh (gRPC or direct TCP). Chat Server #2 finds the WebSocket connection for User B and delivers the message.
User B's client receives the message and sends a delivery ACK back through Chat Server #2. This triggers a "delivered" status (double ✓✓) that propagates back to User A.
User B opens the chat and reads the message. The client sends a read receipt. User A now sees a blue ✓✓.

Step-by-Step Flow (User B Offline)

Steps 1–4 are identical: User A sends, Chat Server acknowledges and persists.
Session Store lookup returns NULL — User B is not connected to any chat server.
Chat Server publishes to the push notification service, which sends a push notification via APNs (iOS) or FCM (Android).
The message remains in Cassandra, awaiting sync when User B comes online.
User B opens the app → establishes a WebSocket connection → sends a sync request with the last known server_msg_id for each chat → Chat Server queries Cassandra for all messages with IDs greater than the last known → delivers them in bulk.

▶ 1-on-1 Message Flow

Watch a message travel from User A → Chat Server → User B (online), then see the offline flow with push notification and sync.

Message Deduplication

Networks are unreliable. A client may retry a send if it doesn't receive an ACK. The server uses the client_message_id (UUID) for idempotency:

-- On the chat server (pseudocode):
def handle_message(msg):
    # Check if we already processed this client_msg_id
    if redis.sismember(f"dedup:{msg.chat_id}", msg.client_msg_id):
        # Already processed — resend the ACK but don't duplicate the message
        return send_ack(msg.client_msg_id, existing_server_msg_id)
    
    server_msg_id = snowflake.next_id()
    kafka.produce("messages", key=msg.chat_id, value=msg)
    redis.sadd(f"dedup:{msg.chat_id}", msg.client_msg_id)
    redis.expire(f"dedup:{msg.chat_id}", 86400)  # 24h TTL
    
    return send_ack(msg.client_msg_id, server_msg_id)

Group Chat Design

Group messaging introduces a fan-out problem: one message must be delivered to potentially hundreds of recipients. There are two strategies:

Strategy 1: Fan-out on Write (Write-Heavy)

When a message is sent to a group, the server immediately copies it into every member's personal inbox (message queue). Each member's chat server then delivers from their inbox.

Pro: Reads are fast — each user just reads from their own inbox
Con: One message to a 500-member group creates 500 copies — write amplification
Used by: WeChat (for small groups)

Strategy 2: Fan-out on Read (Read-Heavy)

The message is written once to the group's message store. When a member opens the group chat, they query the group's timeline.

Pro: Only one write per message regardless of group size
Con: Reads require joining group timeline with user's "last read" pointer
Used by: Discord (for large servers)

Our Hybrid Approach

We use a hybrid fan-out — the message is stored once in the group's Cassandra table, but we also push a lightweight notification entry into each member's message queue (via Kafka) so their chat server knows to deliver in real-time:

-- Group message flow:
1. User A sends message to group_xyz
2. Chat Server writes message to Kafka topic "group_messages"
   (partition key: group_xyz)
3. Kafka consumer writes to Cassandra: messages_by_group table
4. For each member in group_xyz (fetched from Group Service):
   a. Look up member's session in Redis
   b. If online → forward message via their chat server's WebSocket
   c. If offline → enqueue push notification
5. Each receiving client sends delivery ACK per message

▶ Group Chat Fan-out

User A sends a message to a group of 5. Watch it fan out to each member's message queue and get delivered via WebSocket.

Group Message Ordering

Within a group, all messages are produced to the same Kafka partition (partitioned by group_id), which guarantees a total order. The Kafka consumer assigns monotonically increasing sequence numbers as it writes to Cassandra. Clients use these sequence numbers to display messages in the correct order.

Hot partition problem: A very active group (e.g., a 500-member school group) can become a hot partition in Kafka. Mitigation: use a dedicated Kafka topic for "hot" groups, or sub-partition messages within a group by time buckets.

Online Presence System

The presence system answers: "Is User X online right now?" This seems simple but is surprisingly tricky at scale — especially when a user has millions of contacts who all want real-time status updates.

Heartbeat Mechanism

Each connected client sends a heartbeat to the presence server every 30 seconds. If no heartbeat is received for 90 seconds (3 missed heartbeats), the user is marked offline:

-- Heartbeat handling (Presence Server):
def handle_heartbeat(user_id):
    # Update last_active timestamp in Redis
    redis.hset(f"presence:{user_id}", mapping={
        "status": "online",
        "last_active": current_timestamp(),
        "server_id": this_server_id
    })
    # Set a TTL — if no heartbeat within 90s, key expires → offline
    redis.expire(f"presence:{user_id}", 90)

def on_key_expired(user_id):
    # Redis keyspace notification: key expired → user went offline
    publish_status_change(user_id, "offline", last_active_time)

def on_websocket_close(user_id):
    # Immediate offline detection when connection drops cleanly
    redis.hset(f"presence:{user_id}", "status", "offline")
    redis.hset(f"presence:{user_id}", "last_active", current_timestamp())
    publish_status_change(user_id, "offline", current_timestamp())

Status Fan-out via Pub/Sub

When User A's status changes, all friends who have User A's chat open need to be notified. Naive approach: look up all of User A's contacts and push to each — this is O(friends) per status change and extremely expensive for popular users.

Better approach: subscription-based pub/sub:

When User B opens a chat with User A, User B subscribes to User A's presence channel: SUBSCRIBE presence:user_A
When User A's status changes, the presence server publishes: PUBLISH presence:user_A {status: "online"}
Only actively interested users receive the update — no wasted fan-out
When User B closes the chat or goes offline, the subscription is removed

For group chats, when User B opens a group, they subscribe to presence channels of all group members. The presence server uses Redis Pub/Sub for this.

▶ Presence System: Heartbeat & Status Updates

Watch a user come online, heartbeats maintain status, and friends subscribed to the channel receive real-time updates.

Handling Edge Cases

Flapping — a user with a bad connection rapidly toggles online/offline. Solution: add a debounce window (e.g., don't publish "offline" until 30 seconds after the last heartbeat miss). Only publish "online" immediately.
Multi-device — User has phone and desktop. "Online" if any device is connected; "offline" only when all devices disconnect. Maintain a per-device presence set: SADD presence_devices:user_A "phone" "desktop"
Large groups — subscribing to 500 presence channels when opening a large group is expensive. Solution: only subscribe to presence for visible members (viewport-based), and batch presence queries for the rest: MGET presence:user_1 presence:user_2 ...

Message Storage (Cassandra)

Why Cassandra for chat messages? It's the perfect fit:

Write-optimized — uses an LSM-tree (Log-Structured Merge-tree), which converts random writes into sequential writes. Chat is extremely write-heavy.
Time-series friendly — messages are naturally ordered by time, and Cassandra excels at range queries within a partition.
Linearly scalable — add nodes to handle more data/traffic, no single point of failure.
Tunable consistency — use QUORUM writes for durability, LOCAL_ONE reads for speed.

Schema Design

-- 1-on-1 messages: partitioned by chat_id, clustered by message_id (time-ordered)
CREATE TABLE messages_by_chat (
    chat_id       TEXT,          -- deterministic: sorted(user_A, user_B)
    message_id    BIGINT,        -- Snowflake ID (encodes timestamp + server + seq)
    sender_id     TEXT,
    content       BLOB,          -- encrypted message body
    content_type  TEXT,          -- 'text', 'image', 'video', 'audio', 'document'
    media_url     TEXT,          -- S3 URL for media messages
    thumbnail     BLOB,          -- base64 thumbnail for images/videos
    status        TEXT,          -- 'sent', 'delivered', 'read'
    created_at    TIMESTAMP,
    PRIMARY KEY (chat_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': 7}
  AND gc_grace_seconds = 864000
  AND default_time_to_live = 0;

-- Group messages: partitioned by group_id, clustered by message_id
CREATE TABLE messages_by_group (
    group_id      TEXT,
    message_id    BIGINT,        -- Snowflake ID
    sender_id     TEXT,
    content       BLOB,
    content_type  TEXT,
    media_url     TEXT,
    thumbnail     BLOB,
    created_at    TIMESTAMP,
    PRIMARY KEY (group_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': 7};

-- Per-user chat index: which chats does a user belong to, ordered by last activity
CREATE TABLE chats_by_user (
    user_id           TEXT,
    last_message_at   TIMESTAMP,
    chat_id           TEXT,
    chat_type         TEXT,       -- 'dm' or 'group'
    other_user_id     TEXT,       -- NULL for group chats
    group_name        TEXT,       -- NULL for DMs
    last_message_preview TEXT,    -- first 100 chars of last message
    unread_count      INT,
    PRIMARY KEY (user_id, last_message_at, chat_id)
) WITH CLUSTERING ORDER BY (last_message_at DESC, chat_id ASC);

-- Read receipts: track per-user read position in each chat
CREATE TABLE read_receipts (
    chat_id       TEXT,
    user_id       TEXT,
    last_read_msg_id  BIGINT,    -- the highest message_id this user has read
    read_at       TIMESTAMP,
    PRIMARY KEY (chat_id, user_id)
);

Why chat_id as partition key? All messages in a conversation live in the same Cassandra partition. This means fetching a page of chat history is a single-partition range query — extremely fast, O(log N) seek + sequential read. The trade-off: a very active chat could create a large partition. Mitigation: add a bucket column (e.g., monthly buckets) to split large chats across partitions.

Query Patterns

-- Fetch latest 50 messages in a chat (pagination)
SELECT * FROM messages_by_chat
WHERE chat_id = 'chat_AB'
ORDER BY message_id DESC
LIMIT 50;

-- Fetch messages after a sync point (for offline → online sync)
SELECT * FROM messages_by_chat
WHERE chat_id = 'chat_AB'
AND message_id > 174200001     -- last known message_id
ORDER BY message_id ASC;

-- Fetch user's chat list (inbox)
SELECT * FROM chats_by_user
WHERE user_id = 'user_A'
LIMIT 20;

-- Get unread count across all chats
SELECT chat_id, unread_count FROM chats_by_user
WHERE user_id = 'user_A' AND unread_count > 0;

Partition Sizing

Let's estimate partition sizes:

Average message size (encrypted): ~500 bytes
Active 1-on-1 chat: ~50 messages/day × 365 days = ~18,250 messages/year
Partition size per year: 18,250 × 500 bytes ≈ 9 MB — well within Cassandra's recommended 100 MB per partition
Very active group (100 messages/day): ~36,500 messages/year ≈ 18 MB — still fine
Extremely active group (1,000 messages/day): ~180 MB/year — needs time-bucketed partitions

Message Ordering & Delivery Guarantees

Correct message ordering is critical for a chat application. Imagine seeing "Yes!" before "Do you want to grab dinner?" — the conversation becomes nonsensical.

Ordering Strategies

1. Client Timestamp (Unreliable)

Using the sender's device clock is tempting but fundamentally broken:

Device clocks can be wrong (manual changes, timezone bugs, NTP drift)
Two users' clocks may differ by seconds or minutes
A message sent "later" could have an earlier timestamp

2. Server Timestamp (Better)

The chat server assigns a timestamp when it receives the message. Problems:

Multiple chat servers have slightly different clocks (even with NTP, drift can be ~1 ms)
If two messages arrive at different servers within the same millisecond, ordering is ambiguous

3. Snowflake ID (Our Choice) ✓

We use Twitter Snowflake-style IDs: 64-bit integers that embed a timestamp and guarantee global uniqueness and roughly chronological ordering:

┌──────────────────────────────────────────────────────────────────┐
│                     64-bit Snowflake ID                          │
├──────────┬────────────┬──────────────┬──────────────────────────┤
│  1 bit   │  41 bits   │   10 bits    │       12 bits            │
│  (sign)  │ timestamp  │  machine ID  │   sequence number        │
│          │ (ms since  │  (1024       │   (4096 IDs per ms       │
│          │  epoch)    │   servers)   │    per server)            │
└──────────┴────────────┴──────────────┴──────────────────────────┘

-- 41 bits of timestamp: ~69 years from custom epoch
-- 10 bits of machine ID: supports 1024 chat servers
-- 12 bits of sequence: 4096 messages per millisecond per server

For within a single chat, we add an extra guarantee: all messages for a chat flow through the same Kafka partition (partitioned by chat_id), and the Kafka consumer assigns a per-chat sequence number. This gives us a total order within each conversation.

Delivery Guarantees

Guarantee	How We Achieve It
At-least-once delivery	Client retries on ACK timeout + server deduplicates via `client_msg_id`
No message loss	Kafka persists messages to disk before ACKing (`acks=all`); Cassandra replicates to 3 nodes
Per-chat ordering	Single Kafka partition per chat + per-chat sequence number in Cassandra
Exactly-once semantics (effective)	Client dedup on `server_msg_id` + server dedup on `client_msg_id`

Handling Out-of-Order Delivery

Even with all guarantees, messages can arrive out of order at the client (e.g., network jitter, reconnection). The client maintains a local buffer:

// Client-side ordering logic:
class MessageBuffer {
    constructor(chatId) {
        this.chatId = chatId;
        this.expectedSeq = lastKnownSeq + 1;
        this.buffer = new Map(); // seq → message
    }

    onMessageReceived(msg) {
        if (msg.seq === this.expectedSeq) {
            // In order — display immediately
            this.display(msg);
            this.expectedSeq++;
            // Flush any buffered messages that are now in order
            while (this.buffer.has(this.expectedSeq)) {
                this.display(this.buffer.get(this.expectedSeq));
                this.buffer.delete(this.expectedSeq);
                this.expectedSeq++;
            }
        } else if (msg.seq > this.expectedSeq) {
            // Out of order — buffer it, wait for missing messages
            this.buffer.set(msg.seq, msg);
            // If gap persists for > 2 seconds, request missing messages
            this.scheduleGapFill(this.expectedSeq, msg.seq - 1);
        }
        // msg.seq < expectedSeq → duplicate, ignore
    }
}

End-to-End Encryption

WhatsApp uses the Signal Protocol (developed by Open Whisper Systems) for end-to-end encryption. The server never sees plaintext messages — it only relays encrypted blobs.

Key Concepts

Identity Key Pair — a long-term Curve25519 key pair generated on first install. The public key is uploaded to the server as the user's identity.
Signed Pre-Key — a medium-term key pair, signed by the identity key. Rotated periodically (e.g., weekly).
One-Time Pre-Keys — a batch of ephemeral key pairs uploaded to the server. Each is used only once for initial key exchange, then deleted.
Session Key (Ratchet) — a symmetric key derived from the Diffie-Hellman exchange. Uses the Double Ratchet Algorithm to generate a new key for every single message — providing forward secrecy.

Initial Key Exchange (X3DH)

-- User A wants to message User B for the first time:

1. User A fetches User B's "key bundle" from the server:
   - Identity Key (IKb)
   - Signed Pre-Key (SPKb) + signature
   - One-Time Pre-Key (OPKb)

2. User A verifies SPKb's signature using IKb

3. User A performs X3DH (Extended Triple Diffie-Hellman):
   DH1 = DH(IKa, SPKb)    -- A's identity × B's signed pre-key
   DH2 = DH(EKa, IKb)     -- A's ephemeral × B's identity
   DH3 = DH(EKa, SPKb)    -- A's ephemeral × B's signed pre-key
   DH4 = DH(EKa, OPKb)    -- A's ephemeral × B's one-time pre-key

   Master Secret = KDF(DH1 || DH2 || DH3 || DH4)

4. Derive initial chain keys from Master Secret
5. Encrypt first message with derived key
6. Send: encrypted message + IKa (public) + EKa (public) + used OPKb ID

-- User B receives and performs the same DH calculations to derive
-- the same Master Secret, then decrypts the message.

Double Ratchet Algorithm

After the initial exchange, every message uses a new encryption key derived via two "ratchets":

Symmetric ratchet (KDF chain) — each message advances a hash chain: key_n+1 = HMAC(key_n, constant). Even if one key is compromised, past keys cannot be derived (forward secrecy).
Diffie-Hellman ratchet — periodically (on each message exchange), parties generate new DH key pairs and mix fresh DH output into the chain. This provides future secrecy (aka break-in recovery) — if a key is compromised, future keys become secure again after the next ratchet step.

What the server sees: Only encrypted blobs, sender/recipient IDs, timestamps, and message sizes. The server cannot read message content, even under a court order. This is a fundamental design choice — and a frequent interview discussion point.

Group Encryption

For group chats, WhatsApp uses the Sender Keys protocol:

Each group member generates a Sender Key for the group
The Sender Key is distributed to all group members via pairwise encrypted channels (using the 1-on-1 encryption above)
Messages to the group are encrypted once with the sender's Sender Key (symmetric encryption)
All members can decrypt using the sender's Sender Key — O(1) encryption per message instead of O(N)
When a member leaves, all Sender Keys are regenerated and redistributed

Media Handling

Sending a photo, video, or document follows a different flow than text messages:

Upload Flow

Client encrypts the media locally using a random AES-256 key

Client requests a presigned S3 upload URL from the API server (via HTTPS):

POST /api/v1/media/upload-url
{
  "content_type": "image/jpeg",
  "file_size": 2456789,
  "chat_id": "chat_AB"
}

Response:
{
  "upload_url": "https://s3.amazonaws.com/chat-media/...",
  "media_id": "media_8a7f3c",
  "expires_in": 3600
}

Client uploads encrypted media directly to S3 (bypasses chat servers — no unnecessary load)
Client generates a thumbnail (e.g., 100×100 JPEG for images, first-frame for videos)

Client sends a message via WebSocket containing the media_id, encryption key (encrypted with the chat's session key), thumbnail, and metadata:

{
  "type": "message",
  "content_type": "image",
  "media_id": "media_8a7f3c",
  "media_url": "https://cdn.example.com/media/media_8a7f3c",
  "encryption_key": "<base64_aes_key_encrypted_with_session_key>",
  "thumbnail": "<base64_blurred_thumbnail>",
  "file_size": 2456789,
  "dimensions": {"width": 1920, "height": 1080},
  "caption": "Look at this sunset!"
}

Recipient receives the message, displays the thumbnail immediately, downloads the full media from S3/CDN in the background, decrypts with the AES key

Why not send media over WebSocket? WebSocket connections are precious (each server holds 500K connections). Sending a 50 MB video over a WebSocket would block that connection for seconds, delay other messages, and risk timeout. S3 presigned URLs offload the heavy lifting to dedicated storage infrastructure with built-in CDN, multipart upload, and retry support.

Media Processing Pipeline

After upload, a background pipeline processes the media:

Virus scanning — ClamAV or similar scans the encrypted media (if scanning before encryption) or validates file headers
Content moderation — for profile photos and group icons (not E2E encrypted content)
Transcoding — generate multiple resolutions for video (360p, 720p, 1080p)
CDN distribution — replicate to edge locations for fast download globally
Expiration — media not accessed for 30 days is moved to Glacier/cold storage

Read Receipts & Delivery Status

The three-tick system is a core UX feature of chat apps:

Status	Visual	Trigger
Sent	✓ (single gray tick)	Server ACKs the message (persisted to Kafka)
Delivered	✓✓ (double gray tick)	Recipient's device receives the message and sends delivery ACK
Read	✓✓ (double blue tick)	Recipient opens the chat (scrolls message into viewport)

Implementation

-- Delivery receipt (sent by recipient's device automatically):
{
  "type": "receipt",
  "receipt_type": "delivered",
  "chat_id": "chat_AB",
  "message_ids": [174200001, 174200002, 174200003],
  "user_id": "user_B",
  "timestamp": 1714000005000
}

-- Read receipt (sent when user opens the chat):
{
  "type": "receipt",
  "receipt_type": "read",
  "chat_id": "chat_AB",
  "up_to_msg_id": 174200003,    -- "I've read everything up to this ID"
  "user_id": "user_B",
  "timestamp": 1714000010000
}

Note the optimization: read receipts use up_to_msg_id instead of listing every message ID. This means "I've read all messages up to and including this ID" — one receipt covers any number of messages.

Group Read Receipts

In groups, tracking read status is more complex. Each member's read position is stored independently:

-- Query: "Who has read message X in group_xyz?"
SELECT user_id, read_at FROM read_receipts
WHERE chat_id = 'group_xyz'
AND last_read_msg_id >= 174200050;

-- This is an expensive query for large groups, so:
-- 1. Cache frequently-accessed read receipt data in Redis
-- 2. Only compute detailed "seen by" lists on user request (tap on message)
-- 3. Show a simple count ("Read by 45 of 100") by default

Multi-Device Synchronization

Users expect to access their messages on multiple devices — phone, desktop app, and web. This creates a sync challenge: how do you keep all devices in sync without duplicating every message N times?

Sync Protocol

Each device maintains a sync cursor — the last_server_msg_id it has received for each chat

On connect/reconnect, the device sends its sync cursors to the server:

{
  "type": "sync_request",
  "device_id": "phone_001",
  "cursors": {
    "chat_AB": 174200001,
    "chat_AC": 174100050,
    "group_xyz": 174150020
  }
}

The server computes deltas for each chat (messages with server_msg_id > cursor) and sends them in batches
Real-time messages are delivered to all connected devices simultaneously — the Session Store maps user_id → [{server, conn}, {server, conn}, ...]

Primary vs. Companion Devices

WhatsApp's original model uses a primary device (phone) with companion devices (desktop, web) that mirror through the phone. The newer multi-device architecture (WhatsApp Multi-Device) is fully independent:

Each device has its own identity key pair and encrypts/decrypts independently
Sending a message to User B actually sends N copies (one per device of User B), each encrypted with that device's session key
Trade-off: more encryption overhead, but no dependency on a single primary device

Push Notifications

When a recipient is offline, we must deliver a push notification. The push notification service:

Receives events from chat servers (via Kafka topic push_notifications)
Looks up the user's device tokens (stored in a device registry)
Formats the notification payload (sender name, message preview — if not E2E encrypted, or just "New message from X")
Sends via APNs (Apple Push Notification service) for iOS or FCM (Firebase Cloud Messaging) for Android
Handles token invalidation (device unregistered), rate limiting (don't spam), and notification collapsing (group multiple messages into one notification)

-- Push notification payload (FCM example):
{
  "to": "<device_fcm_token>",
  "notification": {
    "title": "User A",
    "body": "New message"      // Can't show content due to E2E encryption
  },
  "data": {
    "chat_id": "chat_AB",
    "sender_id": "user_A",
    "msg_count": 3,
    "type": "new_message"
  },
  "android": {
    "priority": "high"         // Wake device from Doze mode
  }
}

E2E encryption vs. notification preview: Since the server can't decrypt messages, push notifications can only show "New message from User A" — not the actual content. To show content in notifications, some apps (like Signal) include the encrypted message payload in the push notification and decrypt it on the device before displaying.

Scalability & Fault Tolerance

Chat Server Scaling

Horizontal scaling — add more chat servers as connections grow. Each server handles ~500K concurrent WebSocket connections.
Sticky sessions — the load balancer routes each user to the same chat server (consistent hashing on user_id). On server failure, connections are re-established and the LB routes to a different server.
Graceful shutdown — before taking a server down for maintenance, drain connections: stop accepting new ones, send a "reconnect" frame to existing clients, wait for all to disconnect.

Kafka Scaling

Partitioning — messages partitioned by chat_id (1-on-1) or group_id (group). Start with 256 partitions, increase as needed.
Replication factor 3 — each partition replicated to 3 brokers for durability
Consumer groups — multiple consumers in a group for parallel processing of Cassandra writes

Cassandra Scaling

Replication factor 3 with NetworkTopologyStrategy across multiple data centers
Write consistency: QUORUM (2 of 3 nodes acknowledge) — strong durability
Read consistency: LOCAL_ONE — fast reads from the nearest replica
Compaction: TimeWindowCompactionStrategy — optimal for time-series data (chat messages)

Handling Chat Server Failures

Scenario: Chat Server #2 crashes while holding 500K connections

1. All 500K WebSocket connections drop immediately
2. Clients detect disconnect (TCP keepalive or missing heartbeat)
3. Clients reconnect with exponential backoff (1s, 2s, 4s, 8s...)
4. Load balancer routes to healthy servers (Chat Server #1, #3, ...)
5. New chat server registers the connection in Session Store (Redis)
6. Clients send sync requests to fetch missed messages
7. No messages are lost — they're all in Kafka/Cassandra

Recovery time: ~5-10 seconds for reconnection + sync

Interview Cheat Sheet

When asked "Design WhatsApp / a chat system" in an interview, here's the framework:

Topic	Key Points
Protocol	WebSocket for real-time, HTTP for non-real-time. Explain the handshake.
Architecture	Chat servers (WS) + Session Store (Redis) + Message Queue (Kafka) + Message Store (Cassandra) + Push Notifications
1-on-1 flow	Sender → Chat Server → Session lookup → forward to recipient's server → deliver via WS. Offline: push + store.
Group chat	Hybrid fan-out: store once, notify each member. Mention small-group vs. large-group trade-offs.
Presence	Heartbeat-based (every 30s), Redis cache, pub/sub for updates. Debounce for flapping.
Storage	Cassandra: partition by chat_id, cluster by message_id (Snowflake). TimeWindow compaction.
Ordering	Snowflake IDs + single Kafka partition per chat + per-chat sequence numbers.
Encryption	Signal Protocol (X3DH + Double Ratchet). Mention forward secrecy. Server sees only ciphertext.
Media	S3 presigned URL upload, send thumbnail + URL over WS. Don't send large files over WS.
Delivery	At-least-once via client retry + server dedup. Three-tick system for receipts.

Common Follow-Up Questions

"How do you handle message ordering across devices?" — Server-assigned Snowflake IDs, per-chat sequence numbers, client-side reorder buffer
"What if a chat server crashes mid-delivery?" — Message is already in Kafka; client detects disconnect, reconnects, syncs from last known ID
"How does E2E encryption work with multi-device?" — Each device has independent keys; sender encrypts N copies (one per device)
"How do you handle a celebrity with 10M followers going online?" — Don't fan out presence to followers; only show presence to mutual contacts or active chat partners
"What about message search?" — Client-side search (decrypt + local index) since server can't read E2E content. For non-E2E platforms: Elasticsearch index.
"How do you handle typing indicators?" — Ephemeral WebSocket events, NOT persisted. Throttle to max 1 per 3 seconds per chat.