High Level Design Series · Real-World Designs · Part 49 of 70

Design: Chat System (WhatsApp)

The chat system — specifically a WhatsApp-scale messenger — is one of the most frequently asked system design interview questions. It tests your understanding of real-time communication, message ordering, delivery guarantees, presence systems, and storage at scale. WhatsApp serves over 2 billion users who exchange roughly 100 billion messages per day. In this post, we design a system that handles 50 million daily active users (DAU) with sub-second message delivery.

What makes this problem fascinating is that it touches nearly every pillar of distributed systems: persistent connections (WebSockets), reliable message delivery (at-least-once semantics), global ordering (vector clocks), efficient storage (wide-column databases), end-to-end encryption (Signal Protocol), and real-time presence tracking (heartbeat + pub/sub). Let's build it from scratch.

Requirements

Functional Requirements

  1. 1-on-1 messaging — send and receive text messages between two users in real-time
  2. Group chat — create groups of up to 500 members, broadcast messages to all members
  3. Online/offline status — show whether a user is currently online or their "last seen" time
  4. Read receipts — single tick (sent), double tick (delivered), blue tick (read)
  5. Media sharing — images, videos, audio, and documents up to 100 MB
  6. Message history — persistent storage, searchable chat history
  7. Multi-device sync — messages available across phone, tablet, and desktop
  8. Push notifications — notify offline users of new messages

Non-Functional Requirements

  1. Low latency — sub-second message delivery between online users
  2. High availability — the service must tolerate server and datacenter failures
  3. Durability — no message loss, even if a recipient is offline for weeks
  4. Ordering — messages within a chat appear in the same order for all participants
  5. Security — end-to-end encryption; the server never sees plaintext

Scale Estimates

Metric                              Value
Daily Active Users (DAU)            50 million
Messages per user per day           ~40
Total messages per day              2 billion
Messages per second (avg)           ~23,000
Peak messages per second            ~70,000 (3× average)
Concurrent WebSocket connections    ~15 million (30% of DAU online at once)
Average message size                ~200 bytes (text)
Storage per day (text only)         ~400 GB

Connection math: Each WebSocket connection consumes ~10 KB of memory on the server. With 15 million concurrent connections, we need 15M × 10 KB = 150 GB of memory just for connection state. At 500K connections per server, that's 30 chat servers minimum — and we should plan for 2× headroom, so roughly 60 chat servers.
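The estimates above are simple arithmetic and worth sanity-checking. The following sketch recomputes them from the constants in the table and the connection-math paragraph:

```python
DAU = 50_000_000
MSGS_PER_USER_PER_DAY = 40
CONN_MEMORY_KB = 10
CONNS_PER_SERVER = 500_000

total_msgs_per_day = DAU * MSGS_PER_USER_PER_DAY             # 2 billion
avg_msgs_per_sec = total_msgs_per_day // 86_400              # ~23,000
peak_msgs_per_sec = avg_msgs_per_sec * 3                     # ~70,000
concurrent_conns = DAU * 30 // 100                           # 30% online → 15 million
conn_memory_gb = concurrent_conns * CONN_MEMORY_KB / 1_000_000   # 150 GB
min_servers = concurrent_conns // CONNS_PER_SERVER           # 30
servers_with_headroom = 2 * min_servers                      # 60, with 2× headroom
```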

Communication Protocol

The choice of communication protocol is the single most important architectural decision for a chat system. Let's evaluate the options:

Option 1: HTTP Polling

The client periodically sends HTTP requests asking "any new messages?" This is the simplest approach but is incredibly wasteful: most polls return empty responses, each one pays the full cost of an HTTP round trip (headers, TCP/TLS setup), and the polling interval puts a hard floor on delivery latency.

Verdict: Unacceptable at scale. Wastes bandwidth and server resources.

Option 2: Long Polling

The client opens an HTTP connection and the server holds it open until a new message arrives (or a timeout, typically 30 seconds). Better than polling, but still problematic: the connection must be torn down and re-established after every message or timeout, the server cannot push a second message until the client reconnects, and held-open connections still tie up server resources.

Verdict: Better, but still not ideal for real-time bidirectional communication.

Option 3: Server-Sent Events (SSE)

A unidirectional server-to-client stream over HTTP. Good for pushing updates, but: the client still needs a separate HTTP channel to send messages, and intermediaries (proxies, corporate firewalls) sometimes buffer or drop long-lived streams.

Verdict: Suitable for notifications, not for full-duplex chat.

Option 4: WebSocket ✓

WebSocket is the correct choice for a chat system. It provides a single persistent, full-duplex connection over one TCP socket: either side can send at any time, per-message framing overhead is just a few bytes (versus hundreds of bytes of HTTP headers per request), and one connection carries all of a user's chats.

The WebSocket Handshake

A WebSocket connection starts with an HTTP Upgrade request:

-- Client sends HTTP Upgrade request --
GET /chat HTTP/1.1
Host: chat.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Authorization: Bearer <jwt_token>

-- Server responds with 101 Switching Protocols --
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

-- Now both sides communicate via WebSocket frames --
-- Frame header: 2 bytes (opcode + length) + optional 4-byte mask --
After the handshake, the TCP connection is "upgraded" to a WebSocket connection. The HTTP server hands off the socket to the WebSocket handler. All subsequent communication uses the WebSocket protocol — lightweight binary frames, no HTTP headers.
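The Sec-WebSocket-Accept value in the response isn't arbitrary: per RFC 6455, the server appends a fixed GUID to the client's key, hashes with SHA-1, and base64-encodes the digest. A minimal sketch — the key/accept pair above is the RFC's own example, so we can verify the computation:

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def websocket_accept(sec_websocket_key: str) -> str:
    # SHA-1 over (client key + GUID), then base64 — proves the server
    # actually implements WebSocket rather than blindly echoing headers
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# The handshake key above produces exactly the accept value shown:
# websocket_accept("dGhlIHNhbXBsZSBub25jZQ==") → "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```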

Hybrid Approach

In practice, we use both protocols:

Operation                    Protocol                  Reason
Send/receive messages        WebSocket                 Real-time, bidirectional
Typing indicators            WebSocket                 Ephemeral, real-time
Read receipts                WebSocket                 Low-latency status update
User registration / login    HTTPS                     Stateless, request-response
Profile updates              HTTPS                     Infrequent, not real-time
Media upload                 HTTPS (+ S3 presigned)    Large files, multipart upload
Group management             HTTPS                     CRUD operations

High-Level Architecture

The system is composed of several specialized services, each handling a distinct concern:

Core Components

  1. API Gateway / Load Balancer — terminates TLS, authenticates users, routes WebSocket connections to chat servers using consistent hashing (by user_id) for sticky sessions
  2. Chat Servers (WebSocket Servers) — maintain persistent WebSocket connections with clients, route messages between users, handle message acknowledgments
  3. Presence Servers — track online/offline status using heartbeats, maintain a presence cache (Redis), publish status changes to subscribers
  4. Message Queue (Apache Kafka) — decouple message production from consumption, buffer messages during traffic spikes, ensure reliable delivery with at-least-once semantics
  5. Message Store (Apache Cassandra) — persistent storage for all chat messages, optimized for write-heavy workloads with time-series data patterns
  6. Media Storage (Amazon S3 + CDN) — store images, videos, audio, documents; generate presigned upload/download URLs
  7. Push Notification Service — send push notifications (APNs, FCM) to offline users, integrate with OS-level notification systems
  8. User Service — user profiles, contacts, block lists, settings (backed by MySQL/PostgreSQL)
  9. Group Service — group metadata, membership management, admin roles
  10. Session Service (Redis) — maps user_id → (chat_server_id, websocket_connection_id) so any chat server can look up where a recipient is connected
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client A  │     │   Client B  │     │   Client C  │
│  (Mobile)   │     │  (Desktop)  │     │  (Mobile)   │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │ WSS              │ WSS              │ WSS
       ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────┐
│              Load Balancer (L4/L7)                   │
│        (sticky sessions via user_id hash)            │
└───────┬──────────────┬──────────────┬───────────────┘
        ▼              ▼              ▼
  ┌───────────┐  ┌───────────┐  ┌───────────┐
  │Chat Server│  │Chat Server│  │Chat Server│
  │    #1     │  │    #2     │  │    #3     │
  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
        │              │              │
        ▼              ▼              ▼
  ┌─────────────────────────────────────────┐
  │         Session Store (Redis)            │
  │    user_id → {server_id, conn_id}        │
  └─────────────────┬───────────────────────┘
                    │
        ┌───────────┼───────────┐
        ▼           ▼           ▼
  ┌───────────┐ ┌────────┐ ┌────────────┐
  │  Message  │ │Presence│ │    Push    │
  │  Queue    │ │Service │ │Notification│
  │  (Kafka)  │ │(Redis) │ │  Service   │
  └─────┬─────┘ └────────┘ └────────────┘
        ▼
  ┌───────────┐  ┌───────────┐
  │  Message  │  │   Media   │
  │   Store   │  │  Storage  │
  │(Cassandra)│  │ (S3+CDN)  │
  └───────────┘  └───────────┘

1-on-1 Messaging Flow

This is the core of the system. Let's trace the complete lifecycle of a single message from User A to User B:

Step-by-Step Flow (User B Online)

  1. User A types a message and taps send. The client generates a client_message_id (UUID) for idempotency.
  2. Client A sends via WebSocket to the chat server it's connected to (Chat Server #1). The frame contains:
    {
      "type": "message",
      "client_msg_id": "550e8400-e29b-41d4-a716-446655440000",
      "from": "user_A",
      "to": "user_B",
      "chat_id": "chat_AB",
      "content": "Hey, are you free tonight?",
      "content_type": "text",
      "timestamp": 1714000000000
    }
  3. Chat Server #1 receives the message, assigns a server-side message_id (monotonically increasing per chat, e.g., Snowflake ID), and immediately sends an ACK back to User A (single ✓):
    {
      "type": "ack",
      "client_msg_id": "550e8400-...",
      "server_msg_id": "msg_174200001",
      "status": "sent",
      "timestamp": 1714000000050
    }
  4. Chat Server #1 persists the message to Kafka (topic: messages, partition key: chat_AB) for durable storage. A Kafka consumer writes it to Cassandra asynchronously.
  5. Chat Server #1 looks up User B's connection in the Session Store (Redis): GET session:user_B returns {"server": "chat_server_2", "conn": "ws_889"}
  6. Chat Server #1 forwards the message to Chat Server #2 via an internal service mesh (gRPC or direct TCP). Chat Server #2 finds the WebSocket connection for User B and delivers the message.
  7. User B's client receives the message and sends a delivery ACK back through Chat Server #2. This triggers a "delivered" status (double ✓✓) that propagates back to User A.
  8. User B opens the chat and reads the message. The client sends a read receipt. User A now sees a blue ✓✓.

Step-by-Step Flow (User B Offline)

  1. Steps 1–4 are identical: User A sends, Chat Server acknowledges and persists.
  2. Session Store lookup returns NULL — User B is not connected to any chat server.
  3. Chat Server publishes to the push notification service, which sends a push notification via APNs (iOS) or FCM (Android).
  4. The message remains in Cassandra, awaiting sync when User B comes online.
  5. User B opens the app → establishes a WebSocket connection → sends a sync request with the last known server_msg_id for each chat → Chat Server queries Cassandra for all messages with IDs greater than the last known → delivers them in bulk.
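The online and offline flows share the same routing decision: one session lookup, then either forward or push. A toy sketch with plain dicts standing in for Redis, the chat-server mesh, and the push queue (all parameter names here are illustrative, not real APIs):

```python
def route_message(msg, sessions, servers, offline_pushes):
    # sessions: user_id -> {"server": ..., "conn": ...}  (the Redis session store)
    # servers:  server_id -> list collecting (conn, msg) deliveries
    #           (stand-in for forwarding over the internal service mesh)
    # offline_pushes: collects user_ids needing a push notification
    session = sessions.get(msg["to"])
    if session is not None:
        # Online: forward to whichever chat server holds the WebSocket
        servers[session["server"]].append((session["conn"], msg))
        return "delivered"
    # Offline: message is already persisted in Kafka/Cassandra; just notify
    offline_pushes.append(msg["to"])
    return "queued_push"
```

The key property is that the sender's chat server never needs to know the recipient's server ahead of time — the session store is the single source of routing truth.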


Message Deduplication

Networks are unreliable. A client may retry a send if it doesn't receive an ACK. The server uses the client_message_id (UUID) for idempotency:

-- On the chat server (pseudocode):
def handle_message(msg):
    # Check if we already processed this client_msg_id
    existing_server_msg_id = redis.hget(f"dedup:{msg.chat_id}", msg.client_msg_id)
    if existing_server_msg_id is not None:
        # Already processed — resend the ACK but don't duplicate the message
        return send_ack(msg.client_msg_id, existing_server_msg_id)

    server_msg_id = snowflake.next_id()
    kafka.produce("messages", key=msg.chat_id, value=msg)
    # Store the mapping so a retry can be answered with the same server ID
    redis.hset(f"dedup:{msg.chat_id}", msg.client_msg_id, server_msg_id)
    redis.expire(f"dedup:{msg.chat_id}", 86400)  # 24h TTL

    return send_ack(msg.client_msg_id, server_msg_id)

Group Chat Design

Group messaging introduces a fan-out problem: one message must be delivered to potentially hundreds of recipients. There are two strategies:

Strategy 1: Fan-out on Write (Write-Heavy)

When a message is sent to a group, the server immediately copies it into every member's personal inbox (message queue). Each member's chat server then delivers from their inbox.

Strategy 2: Fan-out on Read (Read-Heavy)

The message is written once to the group's message store. When a member opens the group chat, they query the group's timeline.

Our Hybrid Approach

We use a hybrid fan-out — the message is stored once in the group's Cassandra table, but we also push a lightweight notification entry into each member's message queue (via Kafka) so their chat server knows to deliver in real-time:

-- Group message flow:
1. User A sends message to group_xyz
2. Chat Server writes message to Kafka topic "group_messages"
   (partition key: group_xyz)
3. Kafka consumer writes to Cassandra: messages_by_group table
4. For each member in group_xyz (fetched from Group Service):
   a. Look up member's session in Redis
   b. If online → forward message via their chat server's WebSocket
   c. If offline → enqueue push notification
5. Each receiving client sends delivery ACK per message
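Step 4 is the per-member delivery loop. A sketch with the same illustrative stand-ins as before (member list from the Group Service, sessions from Redis, a list per server collecting forwarded messages, and a push queue — all hypothetical names):

```python
def fanout_group_message(msg, members, sessions, servers, offline_pushes):
    # One stored copy of the message; a per-member delivery decision.
    for member in members:
        if member == msg["sender_id"]:
            continue  # don't echo the message back to the sender
        session = sessions.get(member)
        if session is not None:
            # Online: forward via the member's chat server
            servers[session["server"]].append((session["conn"], msg))
        else:
            # Offline: enqueue a push notification
            offline_pushes.append(member)
```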


Group Message Ordering

Within a group, all messages are produced to the same Kafka partition (partitioned by group_id), which guarantees a total order. The Kafka consumer assigns monotonically increasing sequence numbers as it writes to Cassandra. Clients use these sequence numbers to display messages in the correct order.
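The consumer-side sequence assignment can be this simple precisely because Kafka already serializes the partition. A sketch, assuming one single-threaded consumer owns each partition (in production the counter would be checkpointed alongside the consumer offset):

```python
class SequenceAssigner:
    """Assigns per-group sequence numbers inside the Kafka consumer.
    Safe only because Kafka totally orders each partition and a single
    consumer processes it — both assumptions of this sketch."""
    def __init__(self):
        self._next_seq = {}  # group_id -> last assigned sequence number

    def assign(self, group_id: str) -> int:
        seq = self._next_seq.get(group_id, 0) + 1
        self._next_seq[group_id] = seq
        return seq
```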

Hot partition problem: A very active group (e.g., a 500-member school group) can become a hot partition in Kafka. Mitigation: use a dedicated Kafka topic for "hot" groups, or sub-partition messages within a group by time buckets.

Online Presence System

The presence system answers: "Is User X online right now?" This seems simple but is surprisingly tricky at scale — especially when popular users have thousands of contacts who all want real-time status updates.

Heartbeat Mechanism

Each connected client sends a heartbeat to the presence server every 30 seconds. If no heartbeat is received for 90 seconds (3 missed heartbeats), the user is marked offline:

-- Heartbeat handling (Presence Server):
def handle_heartbeat(user_id):
    # Update last_active timestamp in Redis (persistent hash)
    redis.hset(f"presence:{user_id}", mapping={
        "status": "online",
        "last_active": current_timestamp(),
        "server_id": this_server_id
    })
    # Separate ephemeral key with a TTL — if no heartbeat within 90s,
    # it expires and triggers the offline transition. (Putting the TTL on
    # the presence hash itself would also erase last_active on expiry.)
    redis.set(f"heartbeat:{user_id}", 1, ex=90)

def on_key_expired(user_id):
    # Redis keyspace notification: heartbeat key expired → user went offline
    last_active = redis.hget(f"presence:{user_id}", "last_active")
    redis.hset(f"presence:{user_id}", "status", "offline")
    publish_status_change(user_id, "offline", last_active)

def on_websocket_close(user_id):
    # Immediate offline detection when connection drops cleanly
    redis.hset(f"presence:{user_id}", mapping={
        "status": "offline",
        "last_active": current_timestamp()
    })
    publish_status_change(user_id, "offline", current_timestamp())

Status Fan-out via Pub/Sub

When User A's status changes, all friends who have User A's chat open need to be notified. Naive approach: look up all of User A's contacts and push to each — this is O(friends) per status change and extremely expensive for popular users.

Better approach: subscription-based pub/sub:

  1. When User B opens a chat with User A, User B subscribes to User A's presence channel: SUBSCRIBE presence:user_A
  2. When User A's status changes, the presence server publishes: PUBLISH presence:user_A {status: "online"}
  3. Only actively interested users receive the update — no wasted fan-out
  4. When User B closes the chat or goes offline, the subscription is removed

For group chats, when User B opens a group, they subscribe to presence channels of all group members. The presence server uses Redis Pub/Sub for this.
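The subscription model can be sketched in memory to make the fan-out property concrete. In production this is Redis Pub/Sub with one channel per user (presence:<user_id>); the class below is an illustrative stand-in, not a real API:

```python
from collections import defaultdict

class PresenceHub:
    """In-memory sketch of subscription-based presence fan-out."""
    def __init__(self):
        self._subs = defaultdict(set)  # watched user -> set of watcher ids

    def subscribe(self, watcher, watched):
        # e.g., User B opens a chat with User A
        self._subs[watched].add(watcher)

    def unsubscribe(self, watcher, watched):
        # e.g., User B closes the chat or goes offline
        self._subs[watched].discard(watcher)

    def publish(self, watched, status):
        # Only currently-interested watchers are notified —
        # cost is O(active subscribers), not O(total contacts)
        return {w: (watched, status) for w in self._subs[watched]}
```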


Handling Edge Cases

  1. Flapping connections — a user on a spotty mobile network connects and disconnects repeatedly. Debounce status broadcasts: only publish "offline" after the status has been stable for a short window (e.g., 10 seconds), so contacts don't see rapid online/offline churn.
  2. Ungraceful disconnects — a phone loses signal without closing the socket. The heartbeat TTL (90 seconds) is the backstop: the user eventually transitions to offline even without a clean close.
  3. Stale "last seen" — on expiry-driven offline transitions, report the last heartbeat time as "last seen", not the moment the key expired.

Message Storage (Cassandra)

Why Cassandra for chat messages? It's a strong fit: the workload is write-heavy (Cassandra's LSM-tree storage makes writes cheap), messages are naturally time-series data per conversation, it scales linearly by adding nodes, and tunable consistency lets us trade latency for durability per query.

Schema Design

-- 1-on-1 messages: partitioned by chat_id, clustered by message_id (time-ordered)
CREATE TABLE messages_by_chat (
    chat_id       TEXT,          -- deterministic: sorted(user_A, user_B)
    message_id    BIGINT,        -- Snowflake ID (encodes timestamp + server + seq)
    sender_id     TEXT,
    content       BLOB,          -- encrypted message body
    content_type  TEXT,          -- 'text', 'image', 'video', 'audio', 'document'
    media_url     TEXT,          -- S3 URL for media messages
    thumbnail     BLOB,          -- base64 thumbnail for images/videos
    status        TEXT,          -- 'sent', 'delivered', 'read'
    created_at    TIMESTAMP,
    PRIMARY KEY (chat_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': 7}
  AND gc_grace_seconds = 864000
  AND default_time_to_live = 0;
-- Group messages: partitioned by group_id, clustered by message_id
CREATE TABLE messages_by_group (
    group_id      TEXT,
    message_id    BIGINT,        -- Snowflake ID
    sender_id     TEXT,
    content       BLOB,
    content_type  TEXT,
    media_url     TEXT,
    thumbnail     BLOB,
    created_at    TIMESTAMP,
    PRIMARY KEY (group_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': 7};
-- Per-user chat index: which chats does a user belong to, ordered by last activity
CREATE TABLE chats_by_user (
    user_id           TEXT,
    last_message_at   TIMESTAMP,
    chat_id           TEXT,
    chat_type         TEXT,       -- 'dm' or 'group'
    other_user_id     TEXT,       -- NULL for group chats
    group_name        TEXT,       -- NULL for DMs
    last_message_preview TEXT,    -- first 100 chars of last message
    unread_count      INT,
    PRIMARY KEY (user_id, last_message_at, chat_id)
) WITH CLUSTERING ORDER BY (last_message_at DESC, chat_id ASC);
-- Read receipts: track per-user read position in each chat
CREATE TABLE read_receipts (
    chat_id       TEXT,
    user_id       TEXT,
    last_read_msg_id  BIGINT,    -- the highest message_id this user has read
    read_at       TIMESTAMP,
    PRIMARY KEY (chat_id, user_id)
);
Why chat_id as partition key? All messages in a conversation live in the same Cassandra partition. This means fetching a page of chat history is a single-partition range query — extremely fast, O(log N) seek + sequential read. The trade-off: a very active chat could create a large partition. Mitigation: add a bucket column (e.g., monthly buckets) to split large chats across partitions.
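The "deterministic: sorted(user_A, user_B)" comment in the schema means both participants derive the same chat_id regardless of who messages first. A sketch (the "chat_" prefix format is an assumption for illustration):

```python
def dm_chat_id(user_a: str, user_b: str) -> str:
    # Sort the pair so (A, B) and (B, A) map to the same partition key
    first, second = sorted((user_a, user_b))
    return f"chat_{first}_{second}"
```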

Query Patterns

-- Fetch latest 50 messages in a chat (pagination)
SELECT * FROM messages_by_chat
WHERE chat_id = 'chat_AB'
ORDER BY message_id DESC
LIMIT 50;

-- Fetch messages after a sync point (for offline → online sync)
SELECT * FROM messages_by_chat
WHERE chat_id = 'chat_AB'
AND message_id > 174200001     -- last known message_id
ORDER BY message_id ASC;

-- Fetch user's chat list (inbox)
SELECT * FROM chats_by_user
WHERE user_id = 'user_A'
LIMIT 20;

-- Get unread count across all chats
SELECT chat_id, unread_count FROM chats_by_user
WHERE user_id = 'user_A' AND unread_count > 0;

Partition Sizing

Let's estimate partition sizes. At ~200 bytes of text plus row overhead, assume ~500 bytes per stored message. A typical 1-on-1 chat at 40 messages/day accumulates ~20 KB/day, or ~7 MB/year — far below the ~100 MB point where Cassandra partitions start to hurt. A hyperactive 500-member group at 5,000 messages/day accumulates ~2.5 MB/day, ~900 MB/year — exactly the case where the monthly bucket column mentioned above becomes necessary.

Message Ordering & Delivery Guarantees

Correct message ordering is critical for a chat application. Imagine seeing "Yes!" before "Do you want to grab dinner?" — the conversation becomes nonsensical.

Ordering Strategies

1. Client Timestamp (Unreliable)

Using the sender's device clock is tempting but fundamentally broken: device clocks drift and get changed by users or carriers, and a skewed (or deliberately manipulated) clock makes messages appear in the past or future relative to everyone else's.

2. Server Timestamp (Better)

The chat server assigns a timestamp when it receives the message. Problems: different chat servers have slightly different clocks even with NTP, two messages received in the same millisecond have no defined order, and ties must still be broken somehow.

3. Snowflake ID (Our Choice) ✓

We use Twitter Snowflake-style IDs: 64-bit integers that embed a timestamp and guarantee global uniqueness and roughly chronological ordering:

┌──────────────────────────────────────────────────────────────────┐
│                     64-bit Snowflake ID                          │
├──────────┬────────────┬──────────────┬──────────────────────────┤
│  1 bit   │  41 bits   │   10 bits    │       12 bits            │
│  (sign)  │ timestamp  │  machine ID  │   sequence number        │
│          │ (ms since  │  (1024       │   (4096 IDs per ms       │
│          │  epoch)    │   servers)   │    per server)           │
└──────────┴────────────┴──────────────┴──────────────────────────┘

-- 41 bits of timestamp: ~69 years from custom epoch
-- 10 bits of machine ID: supports 1024 chat servers
-- 12 bits of sequence: 4096 messages per millisecond per server
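A minimal generator matching the bit layout above. The custom epoch value is an arbitrary assumption (2020-01-01 UTC), and this sketch doesn't handle clocks moving backward:

```python
import threading
import time

CUSTOM_EPOCH_MS = 1_577_836_800_000  # assumption: 2020-01-01 UTC

class Snowflake:
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024   # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF   # 12-bit sequence
                if self.seq == 0:
                    # 4096 IDs issued this millisecond — spin to the next one
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            # 41 bits timestamp | 10 bits machine ID | 12 bits sequence
            return (now << 22) | (self.machine_id << 12) | self.seq
```

IDs from one generator are strictly increasing, and the timestamp in the high bits keeps IDs from different servers roughly chronological.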

Within a single chat, we add an extra guarantee: all messages for a chat flow through the same Kafka partition (partitioned by chat_id), and the Kafka consumer assigns a per-chat sequence number. This gives us a total order within each conversation.

Delivery Guarantees

Guarantee                             How We Achieve It
At-least-once delivery                Client retries on ACK timeout + server deduplicates via client_msg_id
No message loss                       Kafka persists messages to disk before ACKing (acks=all); Cassandra replicates to 3 nodes
Per-chat ordering                     Single Kafka partition per chat + per-chat sequence number in Cassandra
Exactly-once semantics (effective)    Client dedup on server_msg_id + server dedup on client_msg_id

Handling Out-of-Order Delivery

Even with all guarantees, messages can arrive out of order at the client (e.g., network jitter, reconnection). The client maintains a local buffer:

// Client-side ordering logic (display() and scheduleGapFill() are
// assumed to be implemented elsewhere in the client):
class MessageBuffer {
    constructor(chatId, lastKnownSeq) {
        this.chatId = chatId;
        this.expectedSeq = lastKnownSeq + 1;
        this.buffer = new Map(); // seq → message
    }
    }

    onMessageReceived(msg) {
        if (msg.seq === this.expectedSeq) {
            // In order — display immediately
            this.display(msg);
            this.expectedSeq++;
            // Flush any buffered messages that are now in order
            while (this.buffer.has(this.expectedSeq)) {
                this.display(this.buffer.get(this.expectedSeq));
                this.buffer.delete(this.expectedSeq);
                this.expectedSeq++;
            }
        } else if (msg.seq > this.expectedSeq) {
            // Out of order — buffer it, wait for missing messages
            this.buffer.set(msg.seq, msg);
            // If gap persists for > 2 seconds, request missing messages
            this.scheduleGapFill(this.expectedSeq, msg.seq - 1);
        }
        // msg.seq < expectedSeq → duplicate, ignore
    }
}

End-to-End Encryption

WhatsApp uses the Signal Protocol (developed by Open Whisper Systems) for end-to-end encryption. The server never sees plaintext messages — it only relays encrypted blobs.

Key Concepts

  1. Identity Key Pair — a long-term Curve25519 key pair generated on first install. The public key is uploaded to the server as the user's identity.
  2. Signed Pre-Key — a medium-term key pair, signed by the identity key. Rotated periodically (e.g., weekly).
  3. One-Time Pre-Keys — a batch of ephemeral key pairs uploaded to the server. Each is used only once for initial key exchange, then deleted.
  4. Session Key (Ratchet) — a symmetric key derived from the Diffie-Hellman exchange. Uses the Double Ratchet Algorithm to generate a new key for every single message — providing forward secrecy.

Initial Key Exchange (X3DH)

-- User A wants to message User B for the first time:

1. User A fetches User B's "key bundle" from the server:
   - Identity Key (IKb)
   - Signed Pre-Key (SPKb) + signature
   - One-Time Pre-Key (OPKb)

2. User A verifies SPKb's signature using IKb

3. User A performs X3DH (Extended Triple Diffie-Hellman):
   DH1 = DH(IKa, SPKb)    -- A's identity × B's signed pre-key
   DH2 = DH(EKa, IKb)     -- A's ephemeral × B's identity
   DH3 = DH(EKa, SPKb)    -- A's ephemeral × B's signed pre-key
   DH4 = DH(EKa, OPKb)    -- A's ephemeral × B's one-time pre-key

   Master Secret = KDF(DH1 || DH2 || DH3 || DH4)

4. Derive initial chain keys from Master Secret
5. Encrypt first message with derived key
6. Send: encrypted message + IKa (public) + EKa (public) + used OPKb ID

-- User B receives and performs the same DH calculations to derive
-- the same Master Secret, then decrypts the message.

Double Ratchet Algorithm

After the initial exchange, every message uses a new encryption key derived via two "ratchets":

What the server sees: Only encrypted blobs, sender/recipient IDs, timestamps, and message sizes. The server cannot read message content, even under a court order. This is a fundamental design choice — and a frequent interview discussion point.

Group Encryption

For group chats, WhatsApp uses the Sender Keys protocol:

  1. Each group member generates a Sender Key for the group
  2. The Sender Key is distributed to all group members via pairwise encrypted channels (using the 1-on-1 encryption above)
  3. Messages to the group are encrypted once with the sender's Sender Key (symmetric encryption)
  4. All members can decrypt using the sender's Sender Key — O(1) encryption per message instead of O(N)
  5. When a member leaves, all Sender Keys are regenerated and redistributed

Media Handling

Sending a photo, video, or document follows a different flow than text messages:

Upload Flow

  1. Client encrypts the media locally using a random AES-256 key
  2. Client requests a presigned S3 upload URL from the API server (via HTTPS):
    POST /api/v1/media/upload-url
    {
      "content_type": "image/jpeg",
      "file_size": 2456789,
      "chat_id": "chat_AB"
    }
    
    Response:
    {
      "upload_url": "https://s3.amazonaws.com/chat-media/...",
      "media_id": "media_8a7f3c",
      "expires_in": 3600
    }
  3. Client uploads encrypted media directly to S3 (bypasses chat servers — no unnecessary load)
  4. Client generates a thumbnail (e.g., 100×100 JPEG for images, first-frame for videos)
  5. Client sends a message via WebSocket containing the media_id, encryption key (encrypted with the chat's session key), thumbnail, and metadata:
    {
      "type": "message",
      "content_type": "image",
      "media_id": "media_8a7f3c",
      "media_url": "https://cdn.example.com/media/media_8a7f3c",
      "encryption_key": "<base64_aes_key_encrypted_with_session_key>",
      "thumbnail": "<base64_blurred_thumbnail>",
      "file_size": 2456789,
      "dimensions": {"width": 1920, "height": 1080},
      "caption": "Look at this sunset!"
    }
  6. Recipient receives the message, displays the thumbnail immediately, downloads the full media from S3/CDN in the background, decrypts with the AES key
Why not send media over WebSocket? WebSocket connections are precious (each server holds 500K connections). Sending a 50 MB video over a WebSocket would block that connection for seconds, delay other messages, and risk timeout. S3 presigned URLs offload the heavy lifting to dedicated storage infrastructure with built-in CDN, multipart upload, and retry support.

Media Processing Pipeline

After upload, a background pipeline processes the media. Because blobs are end-to-end encrypted, the server can't transcode or inspect content — thumbnail generation and compression happen on the client before encryption — so server-side work is limited to replication across regions, CDN distribution, and storage lifecycle policies (e.g., expiring unclaimed media).

Read Receipts & Delivery Status

The three-tick system is a core UX feature of chat apps:

Status       Visual                   Trigger
Sent         ✓ (single gray tick)     Server ACKs the message (persisted to Kafka)
Delivered    ✓✓ (double gray tick)    Recipient's device receives the message and sends delivery ACK
Read         ✓✓ (double blue tick)    Recipient opens the chat (scrolls message into viewport)

Implementation

-- Delivery receipt (sent by recipient's device automatically):
{
  "type": "receipt",
  "receipt_type": "delivered",
  "chat_id": "chat_AB",
  "message_ids": [174200001, 174200002, 174200003],
  "user_id": "user_B",
  "timestamp": 1714000005000
}

-- Read receipt (sent when user opens the chat):
{
  "type": "receipt",
  "receipt_type": "read",
  "chat_id": "chat_AB",
  "up_to_msg_id": 174200003,    -- "I've read everything up to this ID"
  "user_id": "user_B",
  "timestamp": 1714000010000
}

Note the optimization: read receipts use up_to_msg_id instead of listing every message ID. This means "I've read all messages up to and including this ID" — one receipt covers any number of messages.
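The watermark semantics reduce to a single comparison per message. A sketch (the message-dict shape is illustrative):

```python
def apply_read_receipt(messages, up_to_msg_id):
    # One receipt marks every message at or below the watermark as read,
    # so a single receipt can cover any number of messages
    for m in messages:
        if m["id"] <= up_to_msg_id and m["status"] != "read":
            m["status"] = "read"
    return sum(1 for m in messages if m["status"] == "read")
```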

Group Read Receipts

In groups, tracking read status is more complex. Each member's read position is stored independently:

-- Query: "Who has read message X in group_xyz?"
SELECT user_id, read_at FROM read_receipts
WHERE chat_id = 'group_xyz'
AND last_read_msg_id >= 174200050;

-- This is an expensive query for large groups, so:
-- 1. Cache frequently-accessed read receipt data in Redis
-- 2. Only compute detailed "seen by" lists on user request (tap on message)
-- 3. Show a simple count ("Read by 45 of 100") by default

Multi-Device Synchronization

Users expect to access their messages on multiple devices — phone, desktop app, and web. This creates a sync challenge: how do you keep all devices in sync without duplicating every message N times?

Sync Protocol

  1. Each device maintains a sync cursor — the last_server_msg_id it has received for each chat
  2. On connect/reconnect, the device sends its sync cursors to the server:
    {
      "type": "sync_request",
      "device_id": "phone_001",
      "cursors": {
        "chat_AB": 174200001,
        "chat_AC": 174100050,
        "group_xyz": 174150020
      }
    }
  3. The server computes deltas for each chat (messages with server_msg_id > cursor) and sends them in batches
  4. Real-time messages are delivered to all connected devices simultaneously — the Session Store maps user_id → [{server, conn}, {server, conn}, ...]
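Step 3's delta computation is a per-chat filter on the sync cursor. A sketch where chat_store stands in for the "messages after sync point" Cassandra query shown earlier:

```python
def compute_sync_deltas(cursors, chat_store):
    # cursors:    chat_id -> last server_msg_id the device has seen
    # chat_store: chat_id -> ascending list of (msg_id, payload) pairs
    #             (illustrative stand-in for the Cassandra range query)
    deltas = {}
    for chat_id, cursor in cursors.items():
        newer = [m for m in chat_store.get(chat_id, []) if m[0] > cursor]
        if newer:
            deltas[chat_id] = newer  # batched and sent to the device
    return deltas
```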

Primary vs. Companion Devices

WhatsApp's original model uses a primary device (phone) with companion devices (desktop, web) that mirror through the phone. The newer multi-device architecture (WhatsApp Multi-Device) makes each device a first-class peer: every device has its own identity key and its own encrypted sessions, senders encrypt to each of the recipient's devices individually, and companions keep working even when the phone is offline.

Push Notifications

When a recipient is offline, we must deliver a push notification. The push notification service:

  1. Receives events from chat servers (via Kafka topic push_notifications)
  2. Looks up the user's device tokens (stored in a device registry)
  3. Formats the notification payload (sender name, message preview — if not E2E encrypted, or just "New message from X")
  4. Sends via APNs (Apple Push Notification service) for iOS or FCM (Firebase Cloud Messaging) for Android
  5. Handles token invalidation (device unregistered), rate limiting (don't spam), and notification collapsing (group multiple messages into one notification)
-- Push notification payload (FCM example):
{
  "to": "<device_fcm_token>",
  "notification": {
    "title": "User A",
    "body": "New message"      // Can't show content due to E2E encryption
  },
  "data": {
    "chat_id": "chat_AB",
    "sender_id": "user_A",
    "msg_count": 3,
    "type": "new_message"
  },
  "android": {
    "priority": "high"         // Wake device from Doze mode
  }
}
E2E encryption vs. notification preview: Since the server can't decrypt messages, push notifications can only show "New message from User A" — not the actual content. To show content in notifications, some apps (like Signal) include the encrypted message payload in the push notification and decrypt it on the device before displaying.

Scalability & Fault Tolerance

Chat Server Scaling

Chat servers are stateful — they hold live connections — so scale horizontally: add servers behind the load balancer and let consistent hashing by user_id spread new connections. Because the Session Store tells any server where any user is connected, servers never need to know about each other's connections directly.

Kafka Scaling

Partition the message topics by chat_id (or group_id) so per-chat ordering is preserved while throughput scales with partition count. Add brokers and partitions as volume grows, and watch for hot partitions from very active groups (see the mitigation discussed above).

Cassandra Scaling

Cassandra scales linearly by adding nodes; data redistributes automatically via consistent hashing. With a replication factor of 3 and ~400 GB/day of new messages, capacity planning is driven mostly by storage growth rather than query load.

Handling Chat Server Failures

Scenario: Chat Server #2 crashes while holding 500K connections

1. All 500K WebSocket connections drop immediately
2. Clients detect disconnect (TCP keepalive or missing heartbeat)
3. Clients reconnect with exponential backoff (1s, 2s, 4s, 8s...)
4. Load balancer routes to healthy servers (Chat Server #1, #3, ...)
5. New chat server registers the connection in Session Store (Redis)
6. Clients send sync requests to fetch missed messages
7. No messages are lost — they're all in Kafka/Cassandra, and sync replays them

Recovery time: ~5-10 seconds for reconnection + sync
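Step 3's backoff deserves one refinement: with 500K clients reconnecting at once, pure exponential delays still synchronize into waves, so add jitter. A sketch:

```python
import random

def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 32.0) -> float:
    # Exponential backoff (1s, 2s, 4s, 8s, ...) with full jitter, capped,
    # so half a million clients don't reconnect in lockstep
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```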

Interview Cheat Sheet

When asked "Design WhatsApp / a chat system" in an interview, here's the framework:

Topic           Key Points
Protocol        WebSocket for real-time, HTTP for non-real-time. Explain the handshake.
Architecture    Chat servers (WS) + Session Store (Redis) + Message Queue (Kafka) + Message Store (Cassandra) + Push Notifications
1-on-1 flow     Sender → Chat Server → Session lookup → forward to recipient's server → deliver via WS. Offline: push + store.
Group chat      Hybrid fan-out: store once, notify each member. Mention small-group vs. large-group trade-offs.
Presence        Heartbeat-based (every 30s), Redis cache, pub/sub for updates. Debounce for flapping.
Storage         Cassandra: partition by chat_id, cluster by message_id (Snowflake). TimeWindow compaction.
Ordering        Snowflake IDs + single Kafka partition per chat + per-chat sequence numbers.
Encryption      Signal Protocol (X3DH + Double Ratchet). Mention forward secrecy. Server sees only ciphertext.
Media           S3 presigned URL upload, send thumbnail + URL over WS. Don't send large files over WS.
Delivery        At-least-once via client retry + server dedup. Three-tick system for receipts.

Common Follow-Up Questions