Case Study: Discord at Scale
Discord has grown from a niche gaming voice-chat app to a universal communication platform serving 200 million+ monthly active users who spend over 4 billion minutes in conversation every day. At peak, Discord handles tens of millions of concurrent users across millions of active guilds (servers), delivering messages with sub-100ms latency.
What makes Discord's infrastructure fascinating isn't just the scale — it's the technology choices. Elixir on the BEAM VM for real-time messaging. A database migration from MongoDB to Cassandra to ScyllaDB. Rust for performance-critical hot paths. WebRTC with custom SFU infrastructure for voice and video. Every decision was driven by specific engineering constraints that we'll examine in detail.
- 200M+ MAU — monthly active users across desktop, mobile, web
- 4B+ minutes/day — of voice, video, and text conversation
- 19M+ active servers — guilds ranging from 2 members to 1M+
- ~150M+ messages/day at peak — all persisted and searchable
- Millions of concurrent voice connections — real-time audio/video
- Trillions of stored messages — petabytes of data in the message store
Elixir & the BEAM VM: Built for Real-Time
Discord's real-time messaging layer is built on Elixir, a functional language that runs on the BEAM virtual machine — the same VM that powers Erlang and has been battle-tested in telecom systems (Ericsson's phone switches) for 30+ years. This wasn't a trendy choice; it was an engineering one driven by very specific properties of the BEAM.
Why the BEAM VM?
The BEAM provides four critical properties that map perfectly to Discord's requirements:
1. Lightweight Processes: BEAM processes are not OS threads. They're incredibly lightweight — each process uses only ~2KB of memory and is scheduled by the BEAM's own preemptive scheduler. Discord spawns millions of processes per node — one per connected WebSocket session, one per guild, one per voice connection.
# Each connected user gets their own process
# A single Elixir node can handle 1M+ concurrent connections
defmodule Discord.Gateway.Session do
  use GenServer
  require Logger

  # ~2KB per process — 1 million sessions = ~2GB RAM
  def start_link(user_id, socket) do
    GenServer.start_link(__MODULE__, %{
      user_id: user_id,
      socket: socket,
      guilds: [],          # guilds this user is in
      presence: :online,
      sequence: 0,         # event sequence number for resuming
      last_heartbeat: System.monotonic_time(:millisecond),
      heartbeat_ref: nil
    })
  end

  def init(state) do
    # Schedule heartbeat check every 41.25 seconds
    ref = Process.send_after(self(), :check_heartbeat, 41_250)
    {:ok, %{state | heartbeat_ref: ref}}
  end

  # Handle incoming heartbeat from client
  def handle_info(:heartbeat, state) do
    send_to_socket(state.socket, %{op: 11}) # HEARTBEAT_ACK
    {:noreply, %{state | last_heartbeat: System.monotonic_time(:millisecond)}}
  end

  # If no heartbeat received, close the zombie connection
  def handle_info(:check_heartbeat, state) do
    if System.monotonic_time(:millisecond) - state.last_heartbeat > 45_000 do
      Logger.warning("Zombie session for user #{state.user_id}")
      {:stop, :heartbeat_timeout, state}
    else
      ref = Process.send_after(self(), :check_heartbeat, 41_250)
      {:noreply, %{state | heartbeat_ref: ref}}
    end
  end
end
2. Preemptive Scheduling & Reduction Counting: The BEAM scheduler doesn't use cooperative multitasking. It tracks reductions (roughly, function calls) and preempts any process that exceeds its reduction budget (~4000 reductions). This means a single expensive operation cannot starve other processes. For Discord, this guarantees that one user sending a massive message doesn't block delivery to other users on the same node.
# BEAM preemptive scheduling ensures fairness
#
# Traditional event-loop (Node.js-like):
# User A sends 10MB message → parser blocks the loop
# → All other users on this thread are blocked
#
# BEAM:
# User A sends 10MB message → parser gets 4000 reductions
# → Preempted → Scheduler runs User B, C, D processes
# → Comes back to User A → Another 4000 reductions
# → No one is starved
# The scheduler runs on each CPU core:
# 4-core machine = 4 schedulers, each with its own run queue
# Processes are load-balanced across schedulers
# Work-stealing: idle scheduler steals from busy scheduler's queue
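The fairness guarantee sketched above can be simulated with a toy round-robin scheduler. The 4000-reduction budget is the BEAM's; the rest (function names, process names, treating reductions as a simple counter) is purely illustrative:

```python
from collections import deque

BUDGET = 4000  # reductions per scheduling slice (BEAM's approximate budget)

def run(processes):
    """processes: dict of name -> total reductions needed.
    Returns the order in which processes complete."""
    queue = deque(processes.items())
    order = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= BUDGET  # run one slice
        if remaining <= 0:
            order.append(name)               # process finished
        else:
            queue.append((name, remaining))  # preempted and requeued
    return order

# A 20K-reduction "big parse" is interleaved with, not ahead of,
# two small tasks — the small tasks finish first:
print(run({"big_parse": 20_000, "user_b": 1_000, "user_c": 1_500}))
# ['user_b', 'user_c', 'big_parse']
```

In a cooperative event loop, `big_parse` would run to completion before `user_b` ever got the CPU; the budget forces the interleaving.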
3. Fault Isolation via Process Isolation: Each BEAM process has its own heap and garbage collector. If one process crashes, it does not affect any other process. Discord uses supervision trees to automatically restart crashed processes. If a user's session crashes, only that user is affected — and they're immediately reconnected.
# Supervision tree: automatic crash recovery
defmodule Discord.Gateway.Supervisor do
  use Supervisor

  def start_link(opts) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def init(_opts) do
    children = [
      # If the SessionRegistry crashes, restart it
      {Discord.Gateway.SessionRegistry, []},

      # DynamicSupervisor: one child per connected session
      # :one_for_one = if one session crashes, only that session restarts
      {DynamicSupervisor,
       name: Discord.Gateway.SessionSupervisor,
       strategy: :one_for_one,
       max_restarts: 10_000,  # allow up to 10K restarts...
       max_seconds: 1},       # ...per second before the supervisor itself restarts

      # Guild process supervisor
      {DynamicSupervisor,
       name: Discord.Guild.Supervisor,
       strategy: :one_for_one}
    ]

    # :rest_for_one = if SessionRegistry crashes,
    # restart it AND everything started after it
    Supervisor.init(children, strategy: :rest_for_one)
  end
end
# Impact of a crash:
# Session process crashes → Supervisor restarts it → ~50ms
# User sees a brief disconnect, auto-reconnects via RESUME
# No other users affected
# Guild process crashes → All members get re-subscribed → ~200ms
# No messages lost (messages are persisted separately)
4. Hot Code Upgrades: The BEAM supports loading new code while the system is running, without dropping connections. Discord can deploy new gateway code to a node while millions of users are connected — their WebSocket connections remain intact throughout the upgrade.
# Hot code upgrade: deploy without dropping connections
#
# Traditional deployment:
# 1. Drain connections from node (users disconnect)
# 2. Deploy new code
# 3. Restart node
# 4. Users reconnect (thundering herd!)
#
# BEAM hot code upgrade:
# 1. Compile new .beam files
# 2. Load new module versions into VM
# 3. Old processes continue on old code
# 4. Next message dispatch uses new code
# 5. Old code is garbage collected when no process references it
#
# Two versions of every module can coexist in the VM simultaneously
# Zero downtime, zero dropped connections
Erlang Distribution Protocol
BEAM nodes communicate via the Erlang Distribution Protocol — a built-in mechanism for sending messages between processes on different physical machines. Discord uses this for cross-node guild communication:
# Cross-node communication is transparent
# Sending a message to a process on another machine looks identical
# to sending to a local process
# On Node A:
guild_pid = :global.whereis_name({:guild, guild_id})
GenServer.cast(guild_pid, {:new_message, message})
# guild_pid might be on Node A, Node B, or Node C
# The syntax is IDENTICAL — BEAM handles serialization,
# network transport, and delivery transparently
# This is why Elixir/BEAM is perfect for distributed systems:
# Location transparency is built into the VM
How does this compare with other runtimes?
- Node.js — single-threaded event loop. One slow handler blocks everything. No process isolation. Requires clustering libraries for multi-core.
- Go — goroutines are lightweight but share memory. One goroutine panicking can crash the process. No hot code upgrades. No built-in distribution.
- Java — threads are expensive (~1MB stack each). GC pauses affect all threads. Virtual threads (Loom) improve this but lack BEAM's isolation.
- Elixir/BEAM — lightweight isolated processes, preemptive scheduling, per-process GC, built-in distribution, hot code upgrades. Purpose-built for exactly this workload.
Guild Sharding Architecture
A guild (what users call a "server") is the fundamental unit of state in Discord. Each guild contains channels, members, roles, permissions, and active voice states. Discord's sharding strategy is built around guilds as the atomic unit.
The Guild Process
Every guild is a single Elixir GenServer process — a long-lived, stateful process that holds the guild's state in memory and processes all events sequentially. This eliminates the need for distributed locks or database-backed coordination for guild operations.
defmodule Discord.Guild.Server do
  use GenServer

  defstruct [
    :id,
    :name,
    :owner_id,
    channels: %{},       # channel_id => %Channel{}
    members: %{},        # user_id => %Member{roles, nickname, ...}
    roles: %{},          # role_id => %Role{permissions, position, ...}
    voice_states: %{},   # user_id => %VoiceState{channel_id, muted, ...}
    presences: %{},      # user_id => :online | :idle | :dnd | :offline
    member_count: 0,
    large?: false        # true if member_count > 250
  ]

  # All operations on a guild are serialized through this process
  # No locks needed! GenServer mailbox = built-in message queue
  def handle_cast({:send_message, channel_id, author_id, content}, state) do
    # 1. Permission check (in-memory, no DB call)
    with :ok <- check_permissions(state, author_id, channel_id, :send_messages),
         :ok <- check_rate_limit(author_id) do
      message = %Message{
        id: Snowflake.generate(),
        channel_id: channel_id,
        author_id: author_id,
        content: content,
        timestamp: DateTime.utc_now()
      }

      # 2. Persist to message store (async, don't block the guild process)
      Task.start(fn -> MessageStore.persist(message) end)

      # 3. Fan out to all connected sessions in this channel
      fan_out_message(state, channel_id, message)
      {:noreply, state}
    else
      {:error, :missing_permissions} -> {:noreply, state}
      {:error, :rate_limited} -> {:noreply, state}
    end
  end

  defp fan_out_message(state, channel_id, message) do
    # Get all members who have access to this channel
    channel = state.channels[channel_id]

    state.members
    |> Enum.filter(fn {_user_id, member} ->
      has_channel_access?(state, member, channel)
    end)
    |> Enum.each(fn {user_id, _member} ->
      # Send to user's gateway session (might be on another node)
      case SessionRegistry.lookup(user_id) do
        {:ok, session_pids} ->
          # User might have multiple sessions (desktop + mobile)
          Enum.each(session_pids, fn pid ->
            send(pid, {:dispatch, :MESSAGE_CREATE, message})
          end)

        :not_found ->
          :ok # User is offline, no session to send to
      end
    end)
  end
end
Shard-to-Node Mapping
Discord assigns guilds to Elixir nodes using a consistent hash ring. The guild ID determines which node "owns" that guild. When a node goes down, its guilds are redistributed to remaining nodes.
# Guild-to-node assignment via consistent hashing
#
# ┌──────────────────────────────────────────────┐
# │ Hash Ring (0 to 2^32) │
# │ │
# │ Node-A Node-B Node-C │
# │ │ │ │ │
# │ ────●───────────────●──────────────●──── │
# │ ↑ ↑ ↑ │
# │ Guilds with Guilds with Guilds with │
# │ hash 0..X hash X..Y hash Y..2^32│
# └──────────────────────────────────────────────┘
#
# Each node has multiple virtual nodes (vnodes) for even distribution
# Adding a node: only ~1/N guilds need to migrate
# Removing a node: its guilds distribute evenly to remaining nodes
defmodule Discord.Guild.Router do
  @ring_size 2048 # number of virtual nodes

  def get_node(guild_id) do
    hash = :erlang.phash2(guild_id, @ring_size)

    # Look up which physical node owns this vnode
    case :ets.lookup(:guild_ring, hash) do
      [{^hash, node}] -> node
      [] -> fallback_node()
    end
  end

  # When a user connects, they subscribe to all their guilds
  def subscribe_to_guilds(session_pid, guild_ids) do
    Enum.each(guild_ids, fn guild_id ->
      node = get_node(guild_id)

      # Tell the guild process on that node about this session
      :rpc.cast(node, Discord.Guild.Server, :add_session, [guild_id, session_pid])
    end)
  end
end
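The "adding a node only migrates ~1/N of guilds" property is easy to demonstrate with a toy hash ring. Everything here (class name `GuildRing`, MD5 as the hash, 64 vnodes) is an illustrative assumption, not Discord's implementation:

```python
import bisect
import hashlib

VNODES_PER_NODE = 64  # virtual nodes smooth out the distribution

def _hash(key: str) -> int:
    # Any stable, well-mixed hash works; MD5 is used here for illustration
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class GuildRing:
    def __init__(self, nodes):
        # Place VNODES_PER_NODE points on the ring for each physical node
        self.ring = sorted(
            (_hash(f"{node}:{v}"), node)
            for node in nodes
            for v in range(VNODES_PER_NODE)
        )
        self.points = [p for p, _ in self.ring]

    def get_node(self, guild_id: int) -> str:
        # The first vnode clockwise from the guild's hash owns the guild
        i = bisect.bisect(self.points, _hash(str(guild_id))) % len(self.ring)
        return self.ring[i][1]

# Adding a fourth node remaps only about 1/4 of the guilds:
before = GuildRing(["node-a", "node-b", "node-c"])
after = GuildRing(["node-a", "node-b", "node-c", "node-d"])
moved = sum(before.get_node(g) != after.get_node(g) for g in range(10_000))
print(f"{moved / 10_000:.0%} of guilds moved")  # roughly 1/4
```

With naive modulo hashing (`guild_id % num_nodes`), adding a node would remap roughly (N-1)/N of all guilds instead.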
Client-Side Gateway Sharding
For bots or large applications, Discord also supports client-side gateway sharding. A bot in 100,000 guilds doesn't connect once — it connects with multiple shards, each receiving events for a subset of guilds:
# Client-side shard assignment
# shard_id = (guild_id >> 22) % num_shards
#
# Bot with 100K guilds, 16 shards:
# Shard 0: receives events for guilds where (id >> 22) % 16 == 0
# Shard 1: receives events for guilds where (id >> 22) % 16 == 1
# ...
# Shard 15: receives events for guilds where (id >> 22) % 16 == 15
#
# Each shard is a separate WebSocket connection
# Max guilds per shard: 2,500 (Discord enforced limit)
# Required shards = ceil(guild_count / 2500)
# IDENTIFY payload specifies the shard:
{
  "op": 2,              // IDENTIFY opcode
  "d": {
    "token": "Bot MTk...",
    "intents": 33281,   // bitfield of subscribed events
    "shard": [0, 16],   // [shard_id, total_shards]
    "properties": {
      "os": "linux",
      "browser": "discord.py",
      "device": "discord.py"
    }
  }
}
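The shard-assignment formula and the 2,500-guild limit above translate into a few lines of arithmetic (function names are ours; the formula is Discord's documented one):

```python
import math

def shard_id(guild_id: int, num_shards: int) -> int:
    # (guild_id >> 22) strips the worker/sequence bits of the Snowflake,
    # leaving the timestamp portion, which is well distributed
    return (guild_id >> 22) % num_shards

def required_shards(guild_count: int, max_per_shard: int = 2500) -> int:
    # Discord enforces a maximum of 2,500 guilds per shard
    return math.ceil(guild_count / max_per_shard)

print(required_shards(100_000))  # 40 shards minimum for a 100K-guild bot
print(shard_id(81384788765712384, 16))
```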
Message Delivery Path
A message's path, from the moment a user presses Enter to its delivery across all connected clients in the channel, exercises each layer described above: guild-level process sharding, fanout, and WebSocket delivery.
Message Storage Evolution
Discord's message storage is one of the most instructive case studies in database evolution at scale. The system went through three major phases, each driven by hitting concrete scaling walls.
Phase 1: MongoDB (2015–2016)
Discord launched with MongoDB — a practical choice for a startup that needed to iterate quickly. Messages were stored as documents with flexible schemas. At low scale, this worked fine.
// MongoDB message document (early Discord)
{
"_id": ObjectId("..."),
"channel_id": "123456789",
"author_id": "987654321",
"content": "Hello world!",
"timestamp": ISODate("2016-01-15T10:30:00Z"),
"attachments": [],
"embeds": [],
"mentions": ["111222333"],
"reactions": {
"👍": ["user1", "user2"],
"❤️": ["user3"]
}
}
// Problem: as messages grew to hundreds of millions, MongoDB struggled.
// - Coarse-grained write locks in the MMAPv1 storage engine
// - Memory-mapped storage: once data + indexes outgrew RAM,
//   random reads went to disk and latency became unpredictable
// - Sharding added operational complexity without solving read latency
// - No built-in time-series optimization for an append-heavy workload
Phase 2: Cassandra (2016–2022)
Discord migrated to Apache Cassandra — a wide-column, masterless NoSQL database designed for high write throughput and linear horizontal scaling. Messages were modeled as time-series data with the channel as the partition key.
-- Cassandra message table schema
CREATE TABLE messages (
channel_id bigint,
message_id bigint, -- Snowflake ID (encodes timestamp)
author_id bigint,
content text,
attachments frozen<list<frozen<attachment>>>,
embeds frozen<list<frozen<embed>>>,
mentions frozen<set<bigint>>,
reactions frozen<map<text, frozen<set<bigint>>>>,
edited_at timestamp,
-- (Discord's production schema also adds a time-based "bucket" to the
-- partition key to bound partition size; omitted here for clarity)
PRIMARY KEY ((channel_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
AND compaction = {'class': 'TimeWindowCompactionStrategy',
'compaction_window_size': 1,
'compaction_window_unit': 'DAYS'}
AND gc_grace_seconds = 864000;
-- Query: Get latest 50 messages in a channel (extremely fast)
SELECT * FROM messages
WHERE channel_id = 123456789
ORDER BY message_id DESC
LIMIT 50;
-- This query hits a SINGLE partition — no scatter-gather
-- Cassandra returns results in ~1-5ms for small partitions
Cassandra worked well for years but eventually hit critical issues at Discord's scale:
- GC Pauses: Cassandra runs on the JVM. As heap pressure increased, GC pauses (particularly G1GC mixed collections) would spike to 500ms–2s, causing p99 tail latency spikes that were visible to users as "stuck" messages.
- Hot Partitions: Popular channels (announcements in million-member guilds) created massive partitions. Cassandra has no per-partition load balancing — all reads/writes for a partition go to the same replicas.
- Compaction Storms: Large SSTables triggered compaction that temporarily doubled disk I/O and caused read latency spikes. Leveled compaction helped but increased write amplification 10x.
- Read Amplification: Reading from multiple SSTables per query. Bloom filters helped but weren't sufficient for Discord's read patterns (random access to old messages via search/jump).
- Operational Burden: JVM tuning, heap sizing, GC algorithm selection, compaction strategy tuning — the operational surface area was enormous. A team of engineers spent significant time just keeping Cassandra healthy.
# Cassandra GC pause impact on Discord
#
# Normal operation:
# p50 read latency: 2ms
# p99 read latency: 15ms
#
# During GC pause:
# p50 read latency: 2ms (unaffected nodes)
# p99 read latency: 800ms+ ← visible to users!
# p999 read latency: 2000ms+ ← messages appear "stuck"
#
# The problem: speculative retries (read from replica if primary is slow)
# helped p99 but increased cluster load, making GC pauses more frequent.
# A vicious cycle.
#
# Timeline of a GC-induced incident:
# T+0s: Node 7 enters G1 mixed GC collection
# T+0.2s: All reads to partitions on Node 7 stall
# T+0.3s: Speculative retries hit replica nodes (Node 3, Node 11)
# T+0.5s: Node 3 now under increased load, its GC pressure rises
# T+0.8s: Node 7 exits GC, but Node 3 enters GC
# T+1.2s: Cascade continues across the cluster
# T+5s: Cluster stabilizes, but users experienced 5s of degradation
Phase 3: ScyllaDB (2022–Present)
Discord migrated from Cassandra to ScyllaDB — a Cassandra-compatible database rewritten from scratch in C++ with a fundamentally different architecture. The migration was transparent to application code because ScyllaDB speaks the CQL (Cassandra Query Language) protocol.
ScyllaDB's Shard-Per-Core Architecture:
# ScyllaDB vs Cassandra: Architectural Differences
#
# CASSANDRA (JVM-based)
# ┌──────────────────────────────────────────┐
# │ Shared JVM heap (e.g. 32GB)              │
# │                                          │
# │ Thread 1   Thread 2   Thread 3   Thread 4│
# │     ↕          ↕          ↕          ↕   │
# │ ┌──────────────────────────────────────┐ │
# │ │   Shared memory / shared SSTables    │ │
# │ └──────────────────────────────────────┘ │
# │                                          │
# │ ⚠ GC pauses freeze ALL threads           │
# │ ⚠ Lock contention between threads        │
# │ ⚠ NUMA-unaware memory allocation         │
# └──────────────────────────────────────────┘
#
# SCYLLADB (C++ / Seastar framework)
# ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
# │ Core 0  │ │ Core 1  │ │ Core 2  │ │ Core 3  │
# │         │ │         │ │         │ │         │
# │ own     │ │ own     │ │ own     │ │ own     │
# │ memory  │ │ memory  │ │ memory  │ │ memory  │
# │ own     │ │ own     │ │ own     │ │ own     │
# │ SSTs    │ │ SSTs    │ │ SSTs    │ │ SSTs    │
# │ own net │ │ own net │ │ own net │ │ own net │
# │ queue   │ │ queue   │ │ queue   │ │ queue   │
# └─────────┘ └─────────┘ └─────────┘ └─────────┘
#
# ✓ No GC — deterministic memory management
# ✓ No locks — each core is independent
# ✓ NUMA-aware — memory pinned to local NUMA node
# ✓ Each core = an independent shard of the data
Key advantages that solved Discord's problems:
| Problem (Cassandra) | Solution (ScyllaDB) |
|---|---|
| JVM GC pauses (500ms–2s) | No GC — C++ with manual memory management via Seastar allocator. Zero pause time. |
| Thread contention & locks | Shard-per-core: each CPU core owns its data slice. No shared mutable state, no locks. |
| NUMA-unaware allocation | Memory is pinned to the local NUMA node of each core. No cross-NUMA traffic. |
| Compaction storms | Per-core compaction runs independently and is I/O-scheduled to not impact reads. |
| Operational tuning burden | Self-tuning: ScyllaDB auto-configures based on hardware. No JVM heap/GC tuning. |
| 177 Cassandra nodes | 72 ScyllaDB nodes — same throughput, better latency, less than half the hardware. |
# Discord's migration results (publicly shared):
#
# BEFORE (Cassandra):
# - 177 nodes
# - p99 read latency: 40-125ms (with GC spikes to 1s+)
# - p99 write latency: 5-70ms
# - Frequent GC-related incidents
# - Dedicated team for Cassandra operations
#
# AFTER (ScyllaDB):
# - 72 nodes (59% fewer machines)
# - p99 read latency: 15ms (CONSISTENT — no spikes)
# - p99 write latency: 5ms (CONSISTENT)
# - Zero GC-related incidents (no GC exists)
# - Reduced operational burden significantly
#
# The migration was transparent to application code:
# ScyllaDB speaks CQL — same queries, same drivers, same schema
# Discord ran both clusters in parallel during migration,
# dual-writing and comparing results for months before cutover
Message Data Model Details
Discord uses Snowflake IDs — 64-bit IDs that encode the timestamp of creation. This means message IDs are naturally ordered by time, and you can extract the creation time from any message ID without a database lookup:
# Discord Snowflake ID structure (64 bits):
#
# ┌──────────────────────────────────────────────────────┐
# │ Timestamp (ms since Discord Epoch) │ Worker │ Seq │
# │ 42 bits │ 10 bits│12bits│
# └──────────────────────────────────────────────────────┘
#
# Discord Epoch: 2015-01-01T00:00:00.000Z (1420070400000)
#
# Example: message_id = 1234567890123456789
# timestamp_ms = (1234567890123456789 >> 22) + 1420070400000
# = Unix timestamp of when the message was created
#
# Why this matters for storage:
# - Messages are ALREADY sorted by time (no secondary index needed)
# - Partition key = channel_id, clustering key = message_id (DESC)
# - "Get latest 50 messages" = single partition scan, pre-sorted
# - "Jump to message" = single partition seek by message_id
# - "Messages before timestamp" = convert timestamp to Snowflake,
# then seek to that position in the partition
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00.000Z

def snowflake_to_timestamp(snowflake_id):
    timestamp_ms = (snowflake_id >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)

def timestamp_to_snowflake(dt):
    timestamp_ms = int(dt.timestamp() * 1000) - DISCORD_EPOCH_MS
    return timestamp_ms << 22  # minimum possible ID at this timestamp
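For completeness, a toy generator in the same shape. The bit layout (42-bit timestamp, 10-bit worker, 12-bit sequence) is the one described above; the class itself, including how worker IDs are assigned, is an illustrative assumption:

```python
import time

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00.000Z

class SnowflakeGenerator:
    """Toy 64-bit Snowflake generator. Illustrative only:
    42-bit timestamp | 10-bit worker | 12-bit per-millisecond sequence."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024  # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0

    def generate(self) -> int:
        now_ms = int(time.time() * 1000) - DISCORD_EPOCH_MS
        if now_ms == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF  # wrap the 12-bit sequence
        else:
            self.seq = 0
            self.last_ms = now_ms
        return (now_ms << 22) | (self.worker_id << 12) | self.seq

gen = SnowflakeGenerator(worker_id=7)
a, b = gen.generate(), gen.generate()
assert a < b  # IDs generated later are always numerically larger
```

Because the timestamp occupies the high bits, sorting by ID is sorting by creation time, which is exactly the property the clustering key relies on.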
Cassandra → ScyllaDB Migration
The latency improvement follows directly from the architecture: Cassandra's JVM GC pauses produced periodic latency spikes, and those spikes disappeared entirely under ScyllaDB's shard-per-core C++ design.
Voice & Video: WebRTC with SFUs
Discord's voice and video infrastructure uses WebRTC with Selective Forwarding Units (SFUs) rather than peer-to-peer mesh or MCU (Multipoint Control Unit) architectures.
Why SFU?
# Voice/Video Architecture Options:
#
# 1. PEER-TO-PEER MESH (e.g., small WebRTC calls)
# Each participant sends their stream to every other participant
# Connections = N × (N-1) / 2
# 5 users = 10 connections, 10 users = 45 connections
# ⚠ Doesn't scale beyond ~5 users (bandwidth explosion)
#
# User A ←──→ User B
# ↕ ╲ ╱ ↕
# User C ←──→ User D
#
# 2. MCU (Multipoint Control Unit)
# Server receives all streams, mixes them into one, sends back
# ✓ Low client bandwidth (receives 1 stream)
# ⚠ Enormous server CPU (decode + mix + encode per participant)
# ⚠ Fixed layout, no client-side flexibility
#
# 3. SFU (Selective Forwarding Unit) — Discord's choice
# Server receives each participant's stream
# FORWARDS (not mixes) selected streams to each participant
# ✓ Low client upload (send 1 stream to server)
# ✓ Server just forwards packets (no decode/encode)
# ✓ Client chooses which streams to display
# ✓ Scales to hundreds of participants
#
# User A ──→ ┌─────┐ ──→ User B (receives A, C, D)
# User B ──→ │ SFU │ ──→ User A (receives B, C, D)
# User C ──→ │ │ ──→ User C (receives A, B, D)
# User D ──→ └─────┘ ──→ User D (receives A, B, C)
#
# Server CPU: O(N) forwarding vs O(N²) mixing
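The connection arithmetic from the comparison above, as a quick sketch (function names are ours):

```python
def mesh_connections(n: int) -> int:
    # P2P mesh: every pair of participants needs a direct connection
    return n * (n - 1) // 2

def sfu_connections(n: int) -> int:
    # SFU: each participant holds a single connection to the server
    return n

for n in (5, 10, 25):
    print(f"{n} users: mesh={mesh_connections(n)}, sfu={sfu_connections(n)}")
# 5 users: mesh=10, sfu=5
# 25 users: mesh=300, sfu=25
```

The per-client cost diverges even faster than the totals suggest: in a mesh, each client must encode and upload its stream to n-1 peers, while with an SFU it uploads exactly once.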
Voice Server Regions
Discord operates voice servers in multiple geographic regions. When a user joins a voice channel, they're connected to the voice server closest to the guild's configured region (or automatic region selection based on participant locations):
# Voice connection flow:
#
# 1. User clicks "Join Voice Channel"
# 2. Client sends VOICE_STATE_UPDATE via gateway WebSocket
# 3. Guild process selects optimal voice server region
# 4. Gateway sends VOICE_SERVER_UPDATE to client:
# {
# "token": "voice_session_token",
# "guild_id": "123",
# "endpoint": "us-east-1.voice.discord.gg:443"
# }
# 5. Client establishes WebSocket to voice server
# 6. UDP hole-punching for media transport
# 7. Client sends/receives Opus-encoded audio via RTP over UDP
#
# Audio codec: Opus (optimized for voice, 6-510 kbps)
# Video codec: H.264 / VP8 / VP9 / AV1
# Transport: SRTP (Secure RTP) over UDP
# Signaling: WebSocket to voice server
#
# Simulcast: client sends multiple quality levels
# (e.g., 720p, 360p, 180p) — SFU selects which to forward
# based on each receiver's available bandwidth and UI layout
- Opus codec: Discord uses Opus at 64kbps for voice channels. Opus handles packet loss gracefully through Forward Error Correction (FEC) and Packet Loss Concealment (PLC).
- Jitter buffer: Adaptive jitter buffer absorbs network timing variations. Target buffer: 40–120ms depending on connection quality.
- Voice Activity Detection (VAD): Client-side VAD suppresses packets when the user isn't speaking. Reduces server bandwidth by ~60% in typical conversations.
- Priority Speaker: Stage Channels use priority tagging in RTP headers. SFU prioritizes forwarding these streams and drops non-priority streams under congestion.
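Simulcast layer selection, mentioned above, is essentially a pure function: for each receiver, the SFU forwards the highest layer that fits that receiver's bandwidth estimate and UI layout. The layer names, bitrates, and 10% headroom factor below are illustrative assumptions, not Discord's actual values:

```python
# Illustrative simulcast layers: (name, approximate bitrate in kbps)
LAYERS = [("720p", 2500), ("360p", 800), ("180p", 250)]

def select_layer(available_kbps: float, tile_is_large: bool) -> str:
    # A small UI tile never needs the top layer, regardless of bandwidth
    candidates = LAYERS if tile_is_large else LAYERS[1:]
    for name, kbps in candidates:
        if available_kbps >= kbps * 1.1:  # keep 10% headroom
            return name
    return candidates[-1][0]  # fall back to the lowest layer

print(select_layer(5000, tile_is_large=True))   # "720p"
print(select_layer(600, tile_is_large=True))    # "180p"
print(select_layer(5000, tile_is_large=False))  # "360p"
```

Because the sender uploads all layers once, this per-receiver choice costs the SFU no transcoding, only packet forwarding.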
Rust for Performance-Critical Services
While Elixir handles the real-time messaging layer, Discord uses Rust for services where raw throughput and predictable latency are critical. The two most prominent examples are the Read States service and Member List indexing.
Read States Service
Every user has "read states" — tracking which messages they've read in every channel they have access to. This is one of the highest-traffic services at Discord because it's updated on every message view:
// Read States: Rust service tracking per-user-per-channel read position
//
// Data model:
//   Key: (user_id, channel_id)
//   Value: {
//     last_read_message_id: Snowflake,
//     mention_count: u32,
//     last_pin_timestamp: Option<Timestamp>,
//   }
//
// Scale: 200M users × average ~100 channels each = ~20 BILLION read states
// Update frequency: every time a user opens a channel or scrolls
//
// Why Rust?
// - The Go version had GC pauses causing latency spikes (sound familiar?)
// - Millions of updates/second with p99 < 1ms requirement
// - Memory efficiency: compact representation, no GC overhead
use dashmap::DashMap;
use std::sync::Arc;

pub struct ReadStateService {
    // Sharded concurrent hashmap — no global locks
    states: Arc<DashMap<(UserId, ChannelId), ReadState>>,
    // Background persistence to database
    persist_queue: crossbeam::channel::Sender<ReadStateUpdate>,
}

#[derive(Clone, Copy)]
pub struct ReadState {
    pub last_read_id: u64,        // Snowflake ID
    pub mention_count: u32,
    pub last_pin_ts: Option<u64>, // pin timestamp in ms, if any
}
impl ReadStateService {
    pub fn ack_message(&self, user_id: UserId, channel_id: ChannelId,
                       message_id: u64) {
        // Lock-free update via DashMap shard
        self.states
            .entry((user_id, channel_id))
            .and_modify(|state| {
                if message_id > state.last_read_id {
                    state.last_read_id = message_id;
                    state.mention_count = 0; // reading clears mentions
                }
            })
            .or_insert(ReadState {
                last_read_id: message_id,
                mention_count: 0,
                last_pin_ts: None,
            });

        // Async persist (batched, coalesced)
        let _ = self.persist_queue.try_send(ReadStateUpdate {
            user_id, channel_id, message_id,
        });
    }

    pub fn get_unread_count(&self, user_id: UserId,
                            channel_id: ChannelId,
                            latest_message_id: u64) -> u32 {
        match self.states.get(&(user_id, channel_id)) {
            Some(state) => {
                if latest_message_id > state.last_read_id {
                    // Channel has unread messages
                    // Exact count requires a DB query; return mention count
                    state.mention_count
                } else {
                    0
                }
            }
            None => 0, // Never visited this channel
        }
    }
}
Member List Indexing
The member list sidebar in Discord shows users grouped by role and sorted by status. For large guilds, this is a complex data structure that must be updated in real-time as members come online/offline or change roles:
// Member list: sorted, grouped, real-time updated
//
// Display order:
// ┌────────────────────┐
// │ ★ ADMIN — 3 │ ← role group header
// │ Alice (online) │
// │ Bob (idle) │
// │ Charlie (dnd) │
// │ ⚔ MODERATOR — 5 │
// │ Dave (online) │
// │ Eve (online) │
// │ ... │
// │ 👤 ONLINE — 1,247 │ ← @everyone who are online
// │ ... │
// │ 💤 OFFLINE — 8,432 │ ← offline members
// │ ... │
// └────────────────────┘
//
// Requirements:
// - Sorted by role hierarchy, then by status, then alphabetically
// - Updated in real-time (presence changes, role changes)
// - For a 100K member guild, we can't send the full list
// - Solution: send only the visible "window" + updates
// Rust B-tree index for efficient range queries and updates
use std::collections::{BTreeMap, HashMap};

pub struct MemberListIndex {
    // Composite key for natural sort order:
    // (role_position, online_status, username_lowercase)
    tree: BTreeMap<MemberListKey, MemberId>,
    // Reverse index for O(1) lookups by member
    member_to_key: HashMap<MemberId, MemberListKey>,
    // Subscribers: which clients are viewing which range of rows
    subscribers: HashMap<SessionId, std::ops::Range<usize>>,
}

#[derive(Clone, Ord, PartialOrd, Eq, PartialEq)]
pub struct MemberListKey {
    role_position: u16,     // lower = higher priority
    status_order: u8,       // 0=online, 1=idle, 2=dnd, 3=offline
    username_lower: String,
}

impl MemberListIndex {
    // When a member's presence changes:
    pub fn update_presence(&mut self, member_id: MemberId,
                           new_status: Status) {
        if let Some(old_key) = self.member_to_key.remove(&member_id) {
            self.tree.remove(&old_key);

            let new_key = MemberListKey {
                status_order: status_to_order(new_status),
                ..old_key.clone()
            };
            self.tree.insert(new_key.clone(), member_id);
            self.member_to_key.insert(member_id, new_key.clone());

            // Notify subscribers whose view range is affected
            self.notify_affected_subscribers(&old_key, &new_key);
        }
    }
}
Lazy Guilds: Scaling to Million-Member Servers
When Discord guilds grew beyond 250,000 members, sending the full member list and presence data on connection became untenable. A guild with 1 million members would require sending ~100MB+ of member/presence data to every user who connected — causing multi-second load times and massive bandwidth costs.
Discord solved this with Lazy Guilds (also called "lazy-load member lists"):
# Lazy Guild protocol:
#
# BEFORE lazy guilds (small guild, < 250 members):
# 1. User connects to gateway
# 2. READY event contains FULL member list + presences for all guilds
# 3. Client has complete data — can render member list immediately
#
# AFTER lazy guilds (large guild, > 250 members):
# 1. User connects to gateway
# 2. READY event contains guild metadata BUT:
# - member_count: 500000 (just the count)
# - members: [] (empty!)
# - presences: [] (empty!)
# 3. Client opens the guild → sends LAZY_REQUEST:
# {
# "op": 14, // LAZY_REQUEST
# "d": {
# "guild_id": "123456",
# "channels": {
# "789012": [[0, 99]] // Request member list rows 0-99
# }
# }
# }
# 4. Server responds with GUILD_MEMBER_LIST_UPDATE:
# - Sends only the requested range of the sorted member list
# - Includes group headers (role names + counts)
# - Includes member objects for visible rows only
# 5. As user scrolls, client requests more ranges
# 6. As presences change, server sends incremental SYNC updates
# only for members in the client's subscribed ranges
#
# Result:
# - Initial load: ~5KB instead of ~100MB
# - Only transfers data for ~100 visible members at a time
# - Incremental updates only for subscribed ranges
# - Million-member guilds load as fast as 50-member guilds
# The threshold:
# guild.member_count <= 250 → "small" guild → full member list
# guild.member_count > 250 → "large" guild → lazy loading
# This is why Discord calls them "large" guilds in the API
- Subscription model: Client explicitly subscribes to channel member list ranges. Server only sends updates for subscribed ranges — dramatically reduces per-user event traffic.
- Incremental syncs: Member list updates are sent as diffs (INSERT at index 5, UPDATE at index 12, DELETE at index 30) rather than full list re-sends.
- Group headers: Role groups include counts but not all members. "Online — 12,345" is a single group header object, not 12,345 member objects.
- Offline pruning: For guilds with >1000 members, offline members are completely excluded from the member list index. They're only loaded if the user explicitly scrolls to the offline section.
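The incremental-sync model above can be sketched as a small client-side handler. This is an illustrative sketch, not Discord's actual client code; the op names (`SYNC`, `INSERT`, `UPDATE`, `DELETE`) and payload shapes are assumptions based on the description above.

```python
def apply_ops(member_list, ops):
    """Apply a GUILD_MEMBER_LIST_UPDATE-style diff to a local member list."""
    for op in ops:
        kind = op["op"]
        if kind == "SYNC":
            # Replace an entire subscribed range [start, end] with fresh items.
            start, end = op["range"]
            member_list[start:end + 1] = op["items"]
        elif kind == "INSERT":
            member_list.insert(op["index"], op["item"])
        elif kind == "UPDATE":
            member_list[op["index"]] = op["item"]
        elif kind == "DELETE":
            del member_list[op["index"]]
    return member_list
```

Because the server only emits ops for ranges the client has subscribed to, the client's local list stays consistent with the visible window while never holding more than a few hundred member objects.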
Data Pipeline: Kafka & Spark
Discord's analytics and data pipeline processes billions of events daily using Apache Kafka as the central event bus and Apache Spark for batch and stream processing:
# Discord Data Pipeline Architecture:
#
# ┌──────────┐ ┌──────────┐ ┌──────────┐
# │ Gateway │ │ API │ │ Voice │
# │ Events │ │ Requests │ │ Events │
# └────┬─────┘ └────┬─────┘ └────┬─────┘
# │ │ │
# └──────────────┼──────────────┘
# ▼
# ┌────────────────┐
# │ Apache Kafka │
# │ │
# │ Topics: │
# │ - messages │
# │ - presences │
# │ - voice │
# │ - guild_events│
# │ - api_requests│
# └───────┬────────┘
# │
# ┌─────────┼──────────┐
# ▼ ▼ ▼
# ┌──────────┐ ┌────────┐ ┌──────────┐
# │ Spark │ │ Real- │ │ Data │
# │ Batch │ │ time │ │ Warehouse│
# │ Jobs │ │ Stream │ │ (BigQuery│
# │ │ │ (Flink)│ │ / S3) │
# └──────────┘ └────────┘ └──────────┘
# │ │ │
# ▼ ▼ ▼
# ┌──────────┐  ┌────────┐  ┌───────────┐
# │ ML Model │  │ Alerts │  │ Dashboards│
# │ Training │  │ & Fraud│  │ & Reports │
# └──────────┘  └────────┘  └───────────┘
# Key Kafka topics and their throughput:
# - discord.messages: ~150M+ messages/day, partitioned by channel_id
# - discord.presence: ~billions of updates/day (online/offline/idle)
# - discord.voice: voice session start/stop/quality metrics
# - discord.guild_events: member join/leave, role changes
# - discord.api: all API request logs for abuse detection
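Partitioning `discord.messages` by `channel_id` means all events for one channel land on the same partition, so Kafka preserves per-channel ordering. A minimal sketch of that routing property — Kafka's default partitioner actually uses murmur2 on the message key, and the partition count here is illustrative, not Discord's real topic config:

```python
import hashlib

NUM_PARTITIONS = 64  # illustrative; real topic partition counts are not public

def partition_for(channel_id, num_partitions=NUM_PARTITIONS):
    """Stable hash of the message key (channel_id) to a partition index.

    md5 is a stand-in for Kafka's murmur2 key partitioner; what matters
    is that the mapping is deterministic, so one channel -> one partition.
    """
    digest = hashlib.md5(str(channel_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Every consumer reading that partition then sees a channel's messages in send order, which is what the search indexer and abuse-detection jobs rely on.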
Search: Elasticsearch
Discord uses Elasticsearch for full-text message search across opted-in servers. This is a massive indexing challenge given trillions of historical messages:
# Elasticsearch Message Index
#
# Index: discord-messages-YYYY-MM
# (monthly indices for lifecycle management)
#
# Mapping:
{
"mappings": {
"properties": {
"id": { "type": "long" },
"channel_id": { "type": "long" },
"guild_id": { "type": "long" },
"author_id": { "type": "long" },
"content": {
"type": "text",
"analyzer": "discord_analyzer",
"fields": {
"exact": { "type": "keyword" } // for exact-match filters
}
},
"timestamp": { "type": "date" },
"has_embed": { "type": "boolean" },
"has_file": { "type": "boolean" },
"mention_ids": { "type": "long" },
"pinned": { "type": "boolean" }
}
}
}
# Search query: "from:Alice has:image python tutorial"
# Translates to (has:image approximated with the has_file flag here):
{
"query": {
"bool": {
"must": [
{ "match": { "content": "python tutorial" } }
],
"filter": [
{ "term": { "guild_id": 123456789 } },
{ "term": { "author_id": 111222333 } },
{ "term": { "has_file": true } }
]
}
},
"sort": [{ "timestamp": "desc" }],
"size": 25
}
# Performance considerations:
# - Indices are sharded by guild_id for query locality
# - Time-based indices allow efficient deletion of old data
# - Opt-in only: not all guilds have search enabled
# - Index lag: ~30 seconds from message send to searchable
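The translation from Discord's search syntax to the bool query above is mechanical: prefixed tokens become filters, everything else becomes the full-text `match`. A hedged sketch, assuming the filter prefixes shown in the example; the username-to-ID lookup is stubbed out as an assumption:

```python
def build_query(raw, guild_id, resolve_author):
    """Turn 'from:... has:... free text' into an Elasticsearch bool query."""
    must_terms = []
    filters = [{"term": {"guild_id": guild_id}}]  # always scoped to one guild
    free_text = []
    for token in raw.split():
        if token.startswith("from:"):
            # resolve_author is a hypothetical name -> author_id lookup.
            filters.append({"term": {"author_id": resolve_author(token[5:])}})
        elif token.startswith("has:"):
            # Approximation: map both has:file and has:image to has_file.
            filters.append({"term": {"has_file": True}})
        else:
            free_text.append(token)
    if free_text:
        must_terms.append({"match": {"content": " ".join(free_text)}})
    return {
        "query": {"bool": {"must": must_terms, "filter": filters}},
        "sort": [{"timestamp": "desc"}],
        "size": 25,
    }
```

Keeping the structured predicates in `filter` rather than `must` matters for performance: filters are cacheable and skip scoring, so only the free-text portion pays for relevance ranking.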
Distributed Rate Limiting
Discord's API handles millions of requests per second from users, bots, and third-party applications. Rate limiting is critical for preventing abuse and ensuring fair resource allocation:
# Discord Rate Limiting Strategy
#
# Multiple layers of rate limiting:
#
# 1. PER-ROUTE rate limits:
# POST /channels/{id}/messages → 5 requests per 5 seconds per channel
# PATCH /guilds/{id} → 2 requests per 10 seconds
# GET /users/@me → 5 requests per 5 seconds
#
# 2. GLOBAL rate limit:
# 50 requests per second across ALL routes (per bot/user)
#
# 3. Resource-specific limits:
# Reactions: 1 per 250ms per channel
# DMs: 5 per second (anti-spam)
# Guild creation: 10 per day
#
# Rate limit headers in every response:
# X-RateLimit-Limit: 5 (max requests in window)
# X-RateLimit-Remaining: 3 (requests left)
# X-RateLimit-Reset: 1640000000.5 (Unix timestamp of reset)
# X-RateLimit-Bucket: abc123 (shared bucket identifier)
#
# When exceeded:
# HTTP 429 Too Many Requests
# {
# "message": "You are being rate limited.",
# "retry_after": 3.57, (seconds to wait)
# "global": false (false = per-route, true = global)
# }
# Implementation: distributed token bucket using Redis
#
# EVALSHA <script_sha> 1 rate:user:123:POST:/channels/456/messages
# 5 -- max tokens
# 5000 -- refill interval (ms)
# 1 -- tokens to consume
#
# Redis Lua script (atomic):
local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])
local interval_ms = tonumber(ARGV[2])
local consume = tonumber(ARGV[3])
local now = redis.call('TIME')
local now_ms = now[1] * 1000 + now[2] / 1000
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or max_tokens
local last_refill = tonumber(bucket[2]) or now_ms
-- Refill tokens based on elapsed time
local elapsed = now_ms - last_refill
local refill = math.floor(elapsed / interval_ms) * max_tokens
tokens = math.min(max_tokens, tokens + refill)
if tokens >= consume then
tokens = tokens - consume
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now_ms)
redis.call('PEXPIRE', key, interval_ms * 2)
return {1, tokens, 0} -- allowed, remaining, retry_after
else
local retry_after = interval_ms - elapsed
return {0, 0, retry_after} -- denied, 0 remaining, retry ms
end
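To reason about the refill math without a Redis instance, here is a single-process Python analogue of the Lua script above — a sketch, not Discord's implementation, and deliberately without the cross-node atomicity that Redis provides:

```python
import time

class TokenBucket:
    """In-process analogue of the Redis Lua token bucket."""

    def __init__(self, max_tokens, interval_ms):
        self.max_tokens = max_tokens
        self.interval_ms = interval_ms
        self.tokens = max_tokens
        self.last_refill = time.monotonic() * 1000

    def try_consume(self, n=1, now_ms=None):
        """Return (allowed, remaining, retry_after_ms)."""
        now_ms = time.monotonic() * 1000 if now_ms is None else now_ms
        elapsed = now_ms - self.last_refill
        # Refill max_tokens per *full* elapsed interval, as the Lua script does.
        refill = int(elapsed // self.interval_ms) * self.max_tokens
        if refill:
            self.tokens = min(self.max_tokens, self.tokens + refill)
            self.last_refill = now_ms
        if self.tokens >= n:
            self.tokens -= n
            return True, self.tokens, 0.0
        return False, self.tokens, self.interval_ms - elapsed
```

The Lua version exists precisely because this read-modify-write cycle must be atomic when many API nodes share one bucket; in a single process, ordinary sequential execution gives the same guarantee.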
Full Architecture Overview
Let's trace a complete message through Discord's entire infrastructure stack:
# Complete message lifecycle: User sends "Hello!" in #general
#
# 1. CLIENT → API GATEWAY
#    HTTPS POST /channels/789/messages with body:
#    {"content":"Hello!", "nonce":"abc"}
#    (Messages are sent via the REST API over TLS 1.3; the gateway
#    WebSocket carries the resulting events back to clients)
#    Terminates at: Cloudflare → Discord API Gateway (Nginx)
#
# 2. API GATEWAY → RATE LIMITER
# Check: POST /channels/789/messages rate limit
# Redis EVALSHA → token bucket check → PASS
# ~0.5ms
#
# 3. RATE LIMITER → API SERVICE
# Validate authentication (API token verification)
# Validate permissions (user can send in this channel?)
# Validate content (message length, attachment limits)
# ~2ms
#
# 4. API SERVICE → GUILD PROCESS (Elixir)
# Route to the Elixir node that owns guild 456
# guild_pid = Router.get_node(guild_id)
# GenServer.cast(guild_pid, {:new_message, ...})
# ~1ms (Erlang distribution protocol)
#
# 5. GUILD PROCESS → MESSAGE STORE (ScyllaDB)
# INSERT INTO messages (channel_id, message_id, ...)
# Async write, don't block the guild process
# ~2-5ms write latency
#
# 6. GUILD PROCESS → FANOUT
# For each member with access to #general:
# For each of that member's connected sessions:
# Send MESSAGE_CREATE event via WebSocket
# Connected members in channel: ~500 (of 10K guild members)
# Active sessions: ~800 (some have desktop + mobile)
# Fanout time: ~5ms for 800 WebSocket sends
#
# 7. GUILD PROCESS → KAFKA (async)
# Publish message event to discord.messages topic
# For analytics, search indexing, abuse detection
# ~1ms (fire and forget with acks=1)
#
# 8. KAFKA → ELASTICSEARCH (async, ~30s lag)
# Consumer reads from discord.messages
# Index into discord-messages-2026-04 index
# Message becomes searchable
#
# 9. READ STATES UPDATE (Rust service)
# For the author: update read state for this channel
# For other members: increment unread/mention counts
# ~0.5ms per update
#
# Total time from Send to Delivery: ~15-40ms
# (User perceives it as "instant")
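Step 6, the fanout, is the heart of the pipeline: one inbound message becomes hundreds of outbound WebSocket pushes. A toy sketch of that loop — the session registry and `send` callback are illustrative stand-ins for what the Elixir guild process actually does:

```python
def fanout(event, members_in_channel, sessions_by_member, send):
    """Push one event to every connected session of every member with access.

    A member with desktop + mobile connected gets the event twice,
    once per session, matching the ~800 sessions vs ~500 members above.
    """
    delivered = 0
    for member_id in members_in_channel:
        for session in sessions_by_member.get(member_id, ()):
            send(session, event)  # one WebSocket push per session
            delivered += 1
    return delivered
```

Because the guild process owns this loop and handles messages one at a time, no locking is needed: fanout for one message always completes before the next message's fanout begins.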
Key Takeaways
- Choose the right runtime for the job: Elixir/BEAM's lightweight processes, fault isolation, and built-in distribution make it ideal for real-time messaging. Few other runtimes provide all four properties simultaneously.
- Guild-level sharding makes the system tractable — each guild is a single process with serialized state. No distributed locks, no race conditions, no coordination overhead for guild-local operations.
- Database evolution is normal: MongoDB → Cassandra → ScyllaDB wasn't failure — it was engineering. Each migration was driven by hitting real scaling limits. Design for migration from day one (abstraction layers, CQL compatibility).
- GC is a real production concern at scale. Both the Cassandra→ScyllaDB and Go→Rust migrations were driven by GC-induced tail latency. When p99 matters, GC-free runtimes win.
- Lazy loading is essential for systems with massive cardinality differences (guilds ranging from 2 to 1M+ members). One-size-fits-all data transfer strategies break at the extremes.
- SFUs over MCUs: For voice/video at scale, SFU architecture provides the best balance of server cost (forward, don't mix) and client flexibility (choose layout client-side).
- Use the right language per service: Elixir for real-time messaging (concurrency), Rust for hot-path services (latency), Python for ML pipelines (ecosystem). Polyglot architectures let you optimize per-constraint.
- Distributed rate limiting is table stakes for any API at scale. Redis-backed token buckets with Lua scripts provide atomic, low-latency rate limiting across a distributed fleet.