High Level Design Series · Real-World Designs · Part 64 of 70

Design: Email System

Problem Statement

Email is one of the oldest and most mission-critical internet services — over 4 billion people use email, and more than 300 billion emails are sent every day. Despite the rise of instant messaging, email remains the backbone of business communication, account verification, marketing, and official correspondence.

Design a scalable email system similar to Gmail, Outlook, or Yahoo Mail that can serve 1 billion users, handle both sending and receiving at massive throughput, store petabytes of messages and attachments, provide near-instant search, and filter spam with high accuracy.

Why email is uniquely challenging: Unlike most modern systems where you control both client and server, email is a federated protocol. Your system must interoperate with every other mail server on the internet using open standards (SMTP, IMAP, POP3) defined decades ago. You can't simply redesign the protocol — you must work within its constraints while building modern features on top.

Requirements

Functional Requirements

Non-Functional Requirements

Back-of-the-Envelope Estimates

Metric | Estimate
--- | ---
Total users | 1 billion
Daily active users | 100 million
Emails sent/day | 1 billion (10 per active user)
Emails received/day | 4 billion (40 per active user)
Average email size (with metadata) | ~50 KB (body + headers)
Emails with attachments | ~20% → 800 million/day
Average attachment size | ~500 KB
Daily email storage | ~200 TB (bodies) + ~400 TB (attachments)
Peak inbound QPS | ~150K emails/second
Total stored data (multi-year) | ~2 exabytes
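
The storage and throughput rows follow from the user counts and sizes by simple arithmetic. The sketch below reproduces them; the 3x peak factor is an assumption (the table does not state one), and decimal units are used (1 TB = 10^12 bytes).

```python
# Back-of-the-envelope check of the estimates table.
DAILY_ACTIVE_USERS = 100_000_000
RECEIVED_PER_USER = 40
AVG_EMAIL_BYTES = 50_000           # ~50 KB body + headers
ATTACHMENT_PERCENT = 20            # ~20% of received emails carry an attachment
AVG_ATTACHMENT_BYTES = 500_000     # ~500 KB
PEAK_FACTOR = 3                    # assumption: peak traffic is ~3x the daily average

received_per_day = DAILY_ACTIVE_USERS * RECEIVED_PER_USER
body_tb_per_day = received_per_day * AVG_EMAIL_BYTES / 1e12
attachments_per_day = received_per_day * ATTACHMENT_PERCENT // 100
attachment_tb_per_day = attachments_per_day * AVG_ATTACHMENT_BYTES / 1e12

avg_inbound_qps = received_per_day / 86_400
peak_inbound_qps = avg_inbound_qps * PEAK_FACTOR

# How many years of ingest (ignoring dedup, deletion, replication) reach ~2 EB?
bytes_per_year = (body_tb_per_day + attachment_tb_per_day) * 1e12 * 365
years_to_two_exabytes = 2e18 / bytes_per_year

print(f"received/day:      {received_per_day:,}")
print(f"bodies/day:        {body_tb_per_day:.0f} TB")
print(f"attachments/day:   {attachment_tb_per_day:.0f} TB")
print(f"avg / peak QPS:    {avg_inbound_qps:,.0f} / {peak_inbound_qps:,.0f}")
print(f"years to ~2 EB:    {years_to_two_exabytes:.1f}")
```

With a 3x factor the peak comes out near 139K emails/second, which the table rounds up to ~150K.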

Email Protocols — Deep Dive

Email relies on three core protocols. Understanding them is essential because our system must implement or interface with all three.

SMTP — Simple Mail Transfer Protocol

SMTP (RFC 5321) is the protocol used to send emails between mail servers and from clients to servers. It operates on TCP port 25 (server-to-server), port 587 (client submission with authentication), or port 465 (implicit TLS).

SMTP Conversation — Step by Step

An SMTP transaction is a text-based conversation between a client (sender) and server (receiver). Here's the actual protocol exchange:

S: 220 mx.recipient.com ESMTP Postfix
C: EHLO mail.sender.com
S: 250-mx.recipient.com Hello mail.sender.com
S: 250-SIZE 35882577
S: 250-8BITMIME
S: 250-STARTTLS
S: 250-AUTH LOGIN PLAIN
S: 250 OK
C: STARTTLS
S: 220 2.0.0 Ready to start TLS
  (TLS handshake occurs)
C: EHLO mail.sender.com
S: 250-mx.recipient.com Hello mail.sender.com
S: 250 OK
C: MAIL FROM:<alice@sender.com> SIZE=1024
S: 250 2.1.0 Ok
C: RCPT TO:<bob@recipient.com>
S: 250 2.1.5 Ok
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: From: Alice <alice@sender.com>
C: To: Bob <bob@recipient.com>
C: Subject: Hello Bob!
C: Date: Tue, 14 Apr 2026 10:30:00 -0400
C: Message-ID: <abc123@sender.com>
C: MIME-Version: 1.0
C: Content-Type: text/plain; charset="UTF-8"
C:
C: Hey Bob, how's it going?
C: .
S: 250 2.0.0 Ok: queued as 4F3E2D1
C: QUIT
S: 221 2.0.0 Bye
Key SMTP concepts: MAIL FROM specifies the "envelope sender" (used for bounces), which can differ from the From: header visible to users. This distinction is what spammers exploit, and what SPF/DKIM/DMARC were designed to address.
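
To make the envelope/header distinction concrete, here is a minimal sketch using Python's standard smtplib and email packages. The bounce address bounces@sender.com and the submit helper are illustrative assumptions; the point is only that the envelope sender is passed to the SMTP layer separately from the From: header inside the message.

```python
import smtplib
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Build the message body and headers that follow the DATA command."""
    msg = EmailMessage()
    msg["From"] = "Alice <alice@sender.com>"   # header From: what the user sees
    msg["To"] = "Bob <bob@recipient.com>"
    msg["Subject"] = "Hello Bob!"
    msg.set_content("Hey Bob, how's it going?\n")
    return msg


def submit(msg: EmailMessage, envelope_from: str, envelope_to: list[str],
           host: str, port: int = 587) -> None:
    """Submit via SMTP with STARTTLS (hypothetical helper, needs a real server).

    envelope_from becomes MAIL FROM and envelope_to becomes RCPT TO;
    neither has to match the headers inside msg.
    """
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.send_message(msg, from_addr=envelope_from, to_addrs=envelope_to)


msg = build_message()
# Bounces would go to the envelope sender, not to the header From: address:
# submit(msg, "bounces@sender.com", ["bob@recipient.com"], host="mail.sender.com")
```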

SMTP Delivery Chain

An email doesn't go directly from sender to recipient. The delivery path is:

  1. MUA (Mail User Agent) — the user's email client (Gmail web, Outlook, Thunderbird)
  2. MSA (Mail Submission Agent) — accepts the email from the MUA, authenticates the sender, adds headers
  3. MTA (Mail Transfer Agent) — routes the email, performs DNS MX lookup, and relays to the recipient's MTA
  4. MDA (Mail Delivery Agent) — delivers the email into the recipient's mailbox
  5. MUA — the recipient's client fetches the email via IMAP/POP3

DNS MX Lookup

When the sender's MTA needs to deliver to bob@recipient.com, it queries DNS for the MX (Mail Exchanger) record of recipient.com:

$ dig MX recipient.com
recipient.com.  300  IN  MX  10  mx1.recipient.com.
recipient.com.  300  IN  MX  20  mx2.recipient.com.
recipient.com.  300  IN  MX  30  mx3.recipient.com.

The MTA tries the record with the lowest preference number first (mx1, preference 10), which is the highest priority, and falls back to mx2 and then mx3 if it is unreachable. This gives email delivery built-in redundancy.
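
The ordering rule can be sketched in a few lines. The function name and the (preference, host) tuple shape are assumptions for illustration; a real MTA would get the records from a DNS resolver. Hosts that share a preference value are shuffled, which is how RFC 5321 asks senders to spread load across equal-priority exchangers.

```python
import random
from itertools import groupby


def order_mx_hosts(records, rng=random):
    """Return MX hosts in the order a sending MTA would try them.

    records: iterable of (preference, hostname) tuples.
    Lower preference is tried first; ties are shuffled.
    """
    ordered = []
    by_preference = sorted(records, key=lambda r: r[0])
    for _, group in groupby(by_preference, key=lambda r: r[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


records = [
    (20, "mx2.recipient.com."),
    (10, "mx1.recipient.com."),
    (30, "mx3.recipient.com."),
]
print(order_mx_hosts(records))
```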

IMAP — Internet Message Access Protocol

IMAP (RFC 3501, port 993 for TLS) is used by clients to read and manage emails stored on the server. Key characteristics:

IMAP Session Example

C: A001 LOGIN bob@recipient.com secretpassword
S: A001 OK LOGIN completed
C: A002 SELECT INBOX
S: * 172 EXISTS
S: * 1 RECENT
S: * OK [UNSEEN 171]
S: A002 OK [READ-WRITE] SELECT completed
C: A003 FETCH 170:172 (FLAGS ENVELOPE BODYSTRUCTURE)
S: * 170 FETCH (FLAGS (\Seen) ENVELOPE ("Tue, 14 Apr 2026 10:30:00 -0400" "Hello Bob!" ...) ...)
S: * 171 FETCH (FLAGS () ENVELOPE ...)
S: * 172 FETCH (FLAGS (\Recent) ENVELOPE ...)
S: A003 OK FETCH completed
C: A004 FETCH 172 (BODY[TEXT])
S: * 172 FETCH (BODY[TEXT] {24}
S: Hey Bob, how's it going?)
S: A004 OK FETCH completed
C: A005 STORE 172 +FLAGS (\Seen)
S: A005 OK STORE completed

POP3 — Post Office Protocol v3

POP3 (RFC 1939, port 995 for TLS) is simpler than IMAP but more limited:

Feature | SMTP | IMAP | POP3
--- | --- | --- | ---
Purpose | Send / relay | Read & manage | Download
Port (TLS) | 465 / 587 | 993 | 995
Server storage | Queues, then relays | Persistent | Temporary
Multi-device | N/A | Yes | No
Push support | N/A | IDLE command | Polling only

High-Level Architecture

Our email system has two fundamental data paths — the outbound path (sending) and the inbound path (receiving) — plus a read path for clients fetching their mailbox.

Outbound (Sending) Flow

  1. User composes email in the web/mobile client
  2. API server validates the request (auth, rate limiting, attachment size, recipient format)
  3. Email is persisted to the Sent folder in the message store
  4. Attachments are uploaded to object storage (S3) and replaced with references
  5. Email is placed on the outbound queue
  6. SMTP outbound workers dequeue the message, perform DNS MX lookup for the recipient domain
  7. Worker opens an SMTP connection to the recipient's mail server and delivers the email
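  8. On success, the email is marked as delivered; on a temporary failure, it's retried with exponential backoff until a give-up deadline, after which a bounce is returned to the sender. RFC 5321 recommends retrying for at least 4–5 days; a shorter window such as 72 hours is an operator choice, not an RFC requirement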
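
The retry step can be sketched as a capped exponential schedule. All defaults below (5-minute first retry, doubling, 4-hour cap, 5-day give-up window) are assumptions chosen for illustration; the give-up default follows RFC 5321's guidance of at least 4–5 days and can be shortened by passing a different give_up_s.

```python
import random


def retry_delays(base_s=300, factor=2.0, cap_s=4 * 3600, give_up_s=5 * 24 * 3600):
    """Return the waits (in seconds) between delivery attempts.

    Each wait grows by `factor` up to `cap_s`; the schedule stops once the
    next wait would push total elapsed time past `give_up_s`.
    """
    delays = []
    elapsed = 0.0
    delay = float(base_s)
    while elapsed + delay <= give_up_s:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, cap_s)
    return delays


def with_jitter(delay_s, rng=random):
    """Randomize a wait to between 50% and 100% of its nominal value,
    so that many queued messages do not retry at the same instant."""
    return rng.uniform(delay_s / 2, delay_s)


schedule = retry_delays()
print(f"{len(schedule)} retries, first waits: {schedule[:4]}, total: {sum(schedule) / 3600:.1f} h")
```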

Inbound (Receiving) Flow

  1. External mail server connects to our SMTP inbound servers (discovered via our MX records)
  2. Connection-level checks — rate limiting, IP reputation, TLS negotiation
  3. Envelope validation — verify the recipient exists in our system
  4. Email is accepted and placed on the inbound processing queue
  5. Spam & security pipeline:
    • SPF check — verify sender IP is authorized for the domain
    • DKIM check — verify the digital signature on the email headers
    • DMARC check — evaluate the domain's published DMARC policy
    • Virus scanning — check attachments for malware
    • ML spam classifier — score content for spam probability
  6. Based on the spam score, the email is delivered to Inbox, Spam, or rejected
  7. Email metadata and body are written to the message store
  8. Attachments are saved to object storage
  9. The email is indexed for full-text search
  10. A push notification is sent to the user's connected devices

Architecture Diagram

                        ┌───────────────────────────────────────────────┐
                        │              Web / Mobile Clients              │
                        └──────────┬────────────────────┬───────────────┘
                                   │ HTTPS              │ WebSocket
                                   ▼                    ▼
                        ┌─────────────────┐   ┌──────────────────┐
                        │   API Servers   │   │  Push/Notification│
                        │  (REST/gRPC)    │   │     Service       │
                        └───┬──────┬──────┘   └──────────────────┘
                            │      │
                   ┌────────┘      └────────┐
                   ▼                        ▼
         ┌──────────────────┐    ┌──────────────────┐
         │ Outbound Queue   │    │   Message Store   │◄────── Read Path
         │  (Kafka/SQS)     │    │  (Metadata + Body)│
         └────────┬─────────┘    └──────┬───────────┘
                  ▼                      │
         ┌──────────────────┐            │    ┌──────────────────┐
         │  SMTP Outbound   │            ├───►│ Attachment Store  │
         │    Workers        │            │    │    (S3/GCS)      │
         └────────┬─────────┘            │    └──────────────────┘
                  │                      │
           DNS MX lookup                 │    ┌──────────────────┐
                  │                      └───►│  Search Index    │
                  ▼                           │ (Elasticsearch)  │
         ┌──────────────────┐                 └──────────────────┘
         │ Recipient's Mail │
         │    Server         │
         └──────────────────┘

  ── INBOUND ──

         ┌──────────────────┐
         │ External Sender's│
         │   Mail Server    │
         └────────┬─────────┘
                  │ SMTP
                  ▼
         ┌──────────────────┐    ┌──────────────────┐
         │  SMTP Inbound    │───►│  Inbound Queue   │
         │    Servers        │    │  (Kafka/SQS)     │
         └──────────────────┘    └────────┬─────────┘
                                          ▼
                                 ┌──────────────────┐
                                 │  Spam & Security  │
                                 │    Pipeline       │
                                 │ SPF│DKIM│DMARC│ML │
                                 └────────┬─────────┘
                                          ▼
                                 ┌──────────────────┐
                                 │  Message Store    │───► Search Index
                                 │  + Attachment     │───► Push Notification
                                 │    Store          │
                                 └──────────────────┘

Email Authentication — SPF, DKIM, DMARC

The original SMTP protocol has no built-in authentication. Anyone can connect to a mail server and claim to be sending from any address. Three complementary standards were developed to close this gap.

SPF — Sender Policy Framework

SPF (RFC 7208) allows a domain owner to publish a DNS TXT record listing which IP addresses are authorized to send email on behalf of that domain.

How SPF Works

  1. Sender at alice@sender.com sends an email via SMTP from IP 203.0.113.50
  2. Recipient's server extracts the domain from the SMTP MAIL FROM envelope command: sender.com
  3. Recipient's server queries DNS for the SPF record of sender.com:
    sender.com.  IN  TXT  "v=spf1 ip4:203.0.113.0/24 include:_spf.google.com -all"
  4. The server checks if the sending IP (203.0.113.50) matches any authorized mechanism:
    • ip4:203.0.113.0/24 — matches! The IP is in the authorized range
  5. Result: SPF PASS. If the IP didn't match, the -all suffix means hard fail (reject).
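
The IP-matching step above can be sketched with Python's ipaddress module. This is a deliberately partial evaluator: it handles only ip4, ip6, and all, and skips include, a, and mx because those need DNS lookups. The function name and result strings are assumptions for illustration, not a full RFC 7208 implementation.

```python
import ipaddress

# SPF qualifier prefix -> result when the mechanism matches.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf_ip(record: str, sender_ip: str) -> str:
    """Evaluate the ip4/ip6/all mechanisms of an SPF record, left to right."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"                      # default qualifier is pass
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == network.version and ip in network:
                return RESULTS[qualifier]
        elif term == "all":
            return RESULTS[qualifier]
        # include:, a, mx and others require DNS and are skipped in this sketch.
    return "neutral"


record = "v=spf1 ip4:203.0.113.0/24 include:_spf.google.com -all"
print(check_spf_ip(record, "203.0.113.50"))   # sending IP from the walkthrough
print(check_spf_ip(record, "198.51.100.7"))   # an unlisted IP
```

Because include is skipped here, an IP that is only authorized through an included record would be reported as fail; a real checker must resolve includes before reaching -all.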

SPF Record Syntax

v=spf1                        # Version
ip4:203.0.113.0/24            # Allow this IPv4 range
ip6:2001:db8::/32             # Allow this IPv6 range
include:_spf.google.com      # Include Google's SPF record (for Gmail sending)
include:sendgrid.net          # Include SendGrid's IPs
a                             # Allow the domain's A record IP
mx                            # Allow the domain's MX record IPs
-all                          # FAIL everything else (hard fail)
~all                          # SOFTFAIL (accept but mark suspicious)
?all                          # NEUTRAL (no assertion)
SPF limitation: SPF checks the envelope sender (MAIL FROM), not the From: header that users see. An attacker can still forge the visible From: header while passing SPF. That's why we need DKIM and DMARC.

DKIM — DomainKeys Identified Mail

DKIM (RFC 6376) uses public-key cryptography to digitally sign email headers and body, proving the message hasn't been tampered with and was sent by an authorized server.

How DKIM Works

  1. Key setup: The domain owner generates a public/private key pair. The public key is published as a DNS TXT record:
    selector1._domainkey.sender.com.  IN  TXT  "v=DKIM1; k=rsa; p=MIGfMA0GCSq..."
  2. Signing: The sending mail server hashes the body (stored in bh=) and the selected headers (From, To, Subject, Date, Message-ID), then signs the header hash with its private key. The signature is added as a DKIM-Signature header:
    DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
      d=sender.com; s=selector1;
      h=from:to:subject:date:message-id;
      bh=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=;
      b=dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSb... (signature)
  3. Verification: The recipient's server extracts d=sender.com and s=selector1 from the DKIM header, fetches the public key from DNS (selector1._domainkey.sender.com), and verifies the signature.
  4. If verification passes → DKIM PASS. This proves the email genuinely originated from sender.com's authorized infrastructure and wasn't modified in transit.
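
The bh= field is the part of DKIM that can be shown without any key material. The sketch below implements the "relaxed" body canonicalization from RFC 6376 and the SHA-256 body hash; the function names are assumptions, and the RSA signing of the header hash (the b= field) is left out because it needs a cryptography library. As a sanity check, the bh= value in the example header above is the hash RFC 6376 lists for an empty body.

```python
import base64
import hashlib
import re


def relaxed_body(body: bytes) -> bytes:
    """Apply DKIM 'relaxed' body canonicalization (RFC 6376, section 3.4.4)."""
    lines = body.split(b"\r\n")
    # Collapse runs of spaces/tabs to one space and drop trailing whitespace.
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ") for line in lines]
    # Ignore all empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    if not lines:
        return b""
    return b"\r\n".join(lines) + b"\r\n"


def body_hash(body: bytes) -> str:
    """Compute the base64 SHA-256 hash that goes into the bh= tag."""
    digest = hashlib.sha256(relaxed_body(body)).digest()
    return base64.b64encode(digest).decode("ascii")


print(body_hash(b""))
print(body_hash(b"Hey Bob, how's it going?\r\n"))
```

Any change to the body after signing, even one character, yields a different bh= and the verifier reports a failure.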

DMARC — Domain-based Message Authentication, Reporting & Conformance

DMARC (RFC 7489) ties SPF and DKIM together and tells receiving servers what to do when authentication fails. It also addresses the SPF limitation by requiring alignment between the domain in the visible From: header and the domains checked by SPF/DKIM.

DMARC DNS Record

_dmarc.sender.com.  IN  TXT  "v=DMARC1; p=reject; rua=mailto:dmarc-reports@sender.com;
  ruf=mailto:dmarc-forensics@sender.com; adkim=s; aspf=r; pct=100"

DMARC Policy Options

Policy (p=) | Action | Use Case
--- | --- | ---
none | Monitor only, deliver normally | Initial deployment, gathering data
quarantine | Send to spam/junk folder | Transitional enforcement
reject | Reject the email outright | Full enforcement (recommended)

DMARC Alignment Check

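DMARC requires that the domain in the From: header aligns with at least one of: the domain authenticated by SPF (the MAIL FROM domain) or the d= domain of a valid DKIM signature.

Alignment can be strict (adkim=s for DKIM, aspf=s for SPF), which requires an exact domain match, or relaxed (adkim=r, aspf=r), which only requires the same organizational domain (subdomains allowed).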

DMARC Evaluation:
  From: header   → alice@sender.com         (domain: sender.com)
  SPF domain     → bounces.sender.com       (MAIL FROM domain)
  DKIM d= domain → sender.com

  SPF alignment (relaxed): bounces.sender.com → org domain sender.com == sender.com ✓ PASS
  DKIM alignment (strict): sender.com == sender.com ✓ PASS
  → At least one passes → DMARC PASS
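
The evaluation above can be written as a small function. One loud assumption: org_domain below just keeps the last two labels, which is wrong for suffixes like co.uk; a real implementation would consult the Public Suffix List. All function names are illustrative.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.
    ASSUMPTION: real DMARC uses the Public Suffix List instead."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(from_domain: str, auth_domain: str, mode: str) -> bool:
    """mode 's' = strict (exact match), 'r' = relaxed (same org domain)."""
    a = from_domain.lower().rstrip(".")
    b = auth_domain.lower().rstrip(".")
    if mode == "s":
        return a == b
    return org_domain(a) == org_domain(b)


def dmarc_pass(from_domain, spf_pass, spf_domain, aspf,
               dkim_pass, dkim_domain, adkim) -> bool:
    """DMARC passes if SPF or DKIM both passes and aligns with From:."""
    spf_ok = spf_pass and aligned(from_domain, spf_domain, aspf)
    dkim_ok = dkim_pass and aligned(from_domain, dkim_domain, adkim)
    return spf_ok or dkim_ok


# The example from the text: aspf=r, adkim=s.
print(dmarc_pass("sender.com", True, "bounces.sender.com", "r",
                 True, "sender.com", "s"))
```

Under strict SPF alignment (aspf=s) the bounces.sender.com envelope domain would no longer align, and the message would pass only through its DKIM signature.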

Spam Filtering — Deep Dive

Approximately 45% of all email is spam. Our system must block spam while minimizing false positives (legitimate email classified as spam) — a false positive is far worse than a false negative in email.

Multi-Layer Spam Defense

Layer 1: Connection-Level Filtering

Layer 2: Protocol-Level Authentication

Layer 3: Content Analysis (Rule-Based)

Layer 4: Machine Learning Classification

Layer 5: User-Level Signals

The spam score: Each layer contributes a score. The final classification typically uses a weighted combination: score = w₁·SPF + w₂·DKIM + w₃·DMARC + w₄·IP_rep + w₅·content_ML + w₆·URL_risk + .... If the score exceeds the threshold (e.g., 5.0 on a SpamAssassin-like scale), the email goes to spam. Borderline emails (3.0–5.0) might be delivered with warnings.
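
A minimal sketch of the weighted combination follows. The weights and signal names are hypothetical placeholders, not tuned values; each signal is assumed to be normalized to the range 0 to 1, where 1 means "looks like spam". The thresholds are the 5.0 and 3.0 values from the paragraph above.

```python
# Hypothetical weights; a production system would learn or tune these.
WEIGHTS = {
    "spf_fail": 1.0,
    "dkim_fail": 1.0,
    "dmarc_fail": 2.0,
    "ip_reputation": 2.0,
    "content_ml": 3.0,
    "url_risk": 2.0,
}


def spam_score(signals: dict) -> float:
    """Weighted sum of per-layer signals; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def classify(score: float, spam_threshold: float = 5.0, warn_threshold: float = 3.0) -> str:
    """Map a score to a delivery decision."""
    if score > spam_threshold:
        return "spam"
    if score >= warn_threshold:
        return "inbox_with_warning"
    return "inbox"


signals = {"spf_fail": 1.0, "dkim_fail": 1.0, "dmarc_fail": 1.0, "content_ml": 1.0}
score = spam_score(signals)
print(score, classify(score))
```

Because a false positive costs more than a false negative, the spam threshold would normally be set conservatively and the borderline band used for warnings rather than filing.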

Storage Architecture

Email storage is the most challenging component. We're storing petabytes of data that must be durable, quickly retrievable, and organized per user.

Data Model — What We Store

Each email is decomposed into three tiers:

Tier 1: Email Metadata (Hot Storage)

Stored in a distributed relational or wide-column database (e.g., Bigtable, Cassandra, or sharded MySQL). This is what's loaded when the user opens their inbox.

emails_metadata {
  email_id:       UUID (primary key)
  user_id:        UUID (partition key — all queries are scoped to a user)
  folder:         ENUM (inbox, sent, drafts, spam, trash, archive)
  labels:         SET<STRING> (user-defined labels, e.g., "work", "receipts")
  from_address:   STRING
  from_name:      STRING
  to_addresses:   LIST<STRING>
  cc_addresses:   LIST<STRING>
  subject:        STRING
  snippet:        STRING (first ~100 chars of body for preview)
  thread_id:      UUID (for conversation grouping)
  message_id:     STRING (RFC 5322 Message-ID header)
  in_reply_to:    STRING (Message-ID of parent email)
  references:     LIST<STRING> (full thread chain of Message-IDs)
  has_attachments: BOOLEAN
  attachment_count: INT
  total_size:     INT (bytes, including attachments)
  is_read:        BOOLEAN
  is_starred:     BOOLEAN
  is_draft:       BOOLEAN
  spam_score:     FLOAT
  received_at:    TIMESTAMP (sort key — DESC for inbox ordering)
  created_at:     TIMESTAMP
}

Tier 2: Email Body (Warm Storage)

The full email body (HTML + plaintext) is stored separately, fetched only when the user opens a specific email. Stored in a blob store or wide-column DB.

emails_body {
  email_id:        UUID (primary key, matches metadata)
  body_plain:      TEXT (plaintext version)
  body_html:       TEXT (HTML version)
  raw_headers:     TEXT (full RFC 5322 headers for debugging)
  raw_mime:        BLOB (original MIME message for compliance/legal)
}

Tier 3: Attachments (Cold/Object Storage)

Attachments are stored in object storage (S3, GCS) with content-addressable storage for deduplication.

attachments {
  attachment_id:   UUID
  email_id:        UUID (foreign key)
  user_id:         UUID
  filename:        STRING
  content_type:    STRING (MIME type, e.g., "application/pdf")
  size_bytes:      INT
  content_hash:    SHA-256 (for deduplication — same file shared across emails)
  storage_path:    STRING (S3 key: "attachments/{hash_prefix}/{content_hash}")
  uploaded_at:     TIMESTAMP
}
Deduplication saves massive storage: When the same file (a company logo, a shared document) is sent to 1000 recipients, we store it once in object storage (keyed by SHA-256 hash) and create 1000 metadata references. At our scale, this can save 30–40% of attachment storage.
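
Content-addressable storage can be sketched with an in-memory stand-in for the object store. The class and the two-character hash_prefix are assumptions for illustration; the key format mirrors the storage_path shown in the schema.

```python
import hashlib


class AttachmentStore:
    """In-memory stand-in for S3/GCS, keyed by content hash."""

    def __init__(self):
        self.blobs = {}        # storage_path -> bytes (stored once per unique file)
        self.references = []   # one metadata row per (email, attachment)

    def put(self, email_id: str, filename: str, data: bytes) -> str:
        content_hash = hashlib.sha256(data).hexdigest()
        # ASSUMPTION: hash_prefix is the first two hex characters.
        path = f"attachments/{content_hash[:2]}/{content_hash}"
        if path not in self.blobs:
            self.blobs[path] = data          # first upload of this content
        self.references.append({
            "email_id": email_id,
            "filename": filename,
            "size_bytes": len(data),
            "content_hash": content_hash,
            "storage_path": path,
        })
        return path


store = AttachmentStore()
logo = b"\x89PNG...company logo bytes..."
for i in range(1000):
    store.put(f"email-{i}", "logo.png", logo)
print(len(store.blobs), "blob,", len(store.references), "references")
```

One consequence of sharing blobs is that deletion needs reference counting or garbage collection: a blob can only be removed once no metadata row points to it.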

Storage Tiering Strategy

Tier | Data | Storage Tech | Access Pattern | Latency
--- | --- | --- | --- | ---
Hot | Metadata + recent snippets | Bigtable / Cassandra + cache | Every inbox load | <10ms
Warm | Email bodies | Bigtable / blob store | When email is opened | <50ms
Cold | Attachments | S3 / GCS | When attachment downloaded | <200ms
Archive | Emails >2 years old | S3 Glacier / cold storage | Rarely (compliance, legal) | Minutes–hours

Why Not a Traditional File System?

Early email servers (Maildir, mbox format) stored emails as files on disk. This fails at scale because:

Folders, Labels & Threading

Folders vs. Labels

Traditional email uses folders (an email can be in exactly one folder). Gmail pioneered labels (an email can have multiple labels). Our system supports both models:

email_labels {
  email_id:   UUID
  user_id:    UUID
  label:      STRING   -- "inbox", "sent", "spam", "trash", "work", "important", etc.
  PRIMARY KEY (user_id, label, email_id)
}

-- An email in the inbox with labels "work" and "important":
-- Row 1: (email_id=abc, user_id=bob, label="inbox")
-- Row 2: (email_id=abc, user_id=bob, label="work")
-- Row 3: (email_id=abc, user_id=bob, label="important")

-- "Move to trash" = remove "inbox" label, add "trash" label
-- "Archive" = remove "inbox" label (email still accessible via search/labels)

System folders (Inbox, Sent, Drafts, Spam, Trash) are implemented as mandatory labels with special behavior (e.g., messages in Trash and Spam are auto-deleted after 30 days).

Email Threading

Conversation threading groups related emails together. The mechanism is based on RFC 5322 headers:

-- Original email from Alice:
Message-ID: <msg001@sender.com>
Subject: Project Update

-- Bob's reply:
Message-ID: <msg002@recipient.com>
In-Reply-To: <msg001@sender.com>
References: <msg001@sender.com>
Subject: Re: Project Update

-- Alice's reply to Bob's reply:
Message-ID: <msg003@sender.com>
In-Reply-To: <msg002@recipient.com>
References: <msg001@sender.com> <msg002@recipient.com>
Subject: Re: Project Update

Thread ID Assignment Algorithm

  1. When a new email arrives, check if In-Reply-To or any References header matches an existing message_id in our database
  2. If a match is found → assign the same thread_id as the matched email
  3. If no match is found but the Subject (after stripping "Re:", "Fwd:") matches an email from the same participants within a recent time window → tentatively group into the same thread
  4. If no match at all → create a new thread_id
-- Thread query: get all emails in a conversation, sorted chronologically
SELECT email_id, from_address, subject, snippet, received_at
FROM   emails_metadata
WHERE  user_id = :user_id
AND    thread_id = :thread_id
ORDER BY received_at ASC;
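
The thread ID assignment algorithm above can be sketched with two in-memory lookups standing in for database indexes. The class name, the 7-day window, and the subject-prefix regex are assumptions for illustration; one index instance represents a single user's mailbox.

```python
import re
import uuid
from datetime import datetime, timedelta

# Strips any leading chain of "Re:", "Fw:", "Fwd:" (case-insensitive).
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    return _PREFIX.sub("", subject).strip().lower()


class ThreadIndex:
    def __init__(self, window=timedelta(days=7)):
        self.window = window
        self.by_message_id = {}   # message_id -> thread_id
        self.by_subject = {}      # (subject, participants) -> (thread_id, last_seen)

    def assign(self, message_id, in_reply_to, references, subject,
               participants, received_at):
        thread_id = None
        # Steps 1-2: header-based match on In-Reply-To / References.
        for ref in [in_reply_to, *references]:
            if ref in self.by_message_id:
                thread_id = self.by_message_id[ref]
                break
        # Step 3: fall back to subject + participants within the time window.
        key = (normalize_subject(subject), frozenset(p.lower() for p in participants))
        if thread_id is None:
            hit = self.by_subject.get(key)
            if hit is not None and received_at - hit[1] <= self.window:
                thread_id = hit[0]
        # Step 4: otherwise start a new thread.
        if thread_id is None:
            thread_id = str(uuid.uuid4())
        self.by_message_id[message_id] = thread_id
        self.by_subject[key] = (thread_id, received_at)
        return thread_id
```

The subject fallback is heuristic and can merge unrelated conversations that reuse a generic subject, which is why the algorithm treats it as a tentative grouping.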

Email Search

Email search must be fast (<1 second), comprehensive (body, subject, from, to, attachments), and support complex queries ("from:alice has:attachment after:2026/01/01 budget report").

Search Architecture

We use Elasticsearch (or another inverted-index engine, such as Apache Solr or a custom-built one) as the search backend:

                    ┌──────────────┐
  New email ───────►│  Indexing     │──────► Elasticsearch Cluster
  (from inbound     │  Pipeline     │        (sharded by user_id)
   or sent path)    └──────────────┘
                                             ┌──────────────┐
  Search query ──► API Server ──────────────►│ Elasticsearch│
  "from:alice                                │   Query      │
   budget report"                            └──────┬───────┘
                                                    │
                                             Return email_ids
                                                    │
                                                    ▼
                                             Fetch metadata from
                                             Message Store

Search Document Schema

{
  "email_id": "abc-123",
  "user_id": "user-456",
  "from": "alice@sender.com",
  "from_name": "Alice Johnson",
  "to": ["bob@recipient.com"],
  "cc": ["carol@example.com"],
  "subject": "Q1 Budget Report",
  "body_text": "Hi Bob, please find the Q1 budget report attached...",
  "attachment_names": ["Q1_Budget_2026.xlsx"],
  "labels": ["inbox", "work", "important"],
  "is_read": false,
  "is_starred": true,
  "has_attachment": true,
  "received_at": "2026-04-14T10:30:00Z",
  "thread_id": "thread-789"
}

Query DSL Examples

// User types: from:alice has:attachment after:2026/01/01 budget
// Translated to an Elasticsearch request body. Exact-match clauses go in
// "filter" (not scored, cacheable); full-text clauses go in "must" (scored).
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "user_id": "user-456" } },
        { "term": { "has_attachment": true } },
        { "range": { "received_at": { "gte": "2026-01-01" } } }
      ],
      "must": [
        { "match": { "from": "alice" } },
        { "multi_match": {
            "query": "budget",
            "fields": ["subject^3", "body_text", "attachment_names^2"]
        }}
      ]
    }
  }
}
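
The translation step from the user's search box to a query body can be sketched as a small parser. The function name and the set of supported operators (from:, has:attachment, after:) are assumptions; placing exact-match clauses in the bool query's filter context is also a choice made here, since filters are not scored. The user_id filter is always added server-side so one user can never search another's mail.

```python
def build_search_query(user_id: str, text: str) -> dict:
    """Translate 'from:alice has:attachment after:2026/01/01 budget' style
    input into an Elasticsearch bool query body (as a Python dict)."""
    filters = [{"term": {"user_id": user_id}}]
    must = []
    free_text = []
    for token in text.split():
        operator, sep, value = token.partition(":")
        if sep and operator == "from" and value:
            must.append({"match": {"from": value}})
        elif token == "has:attachment":
            filters.append({"term": {"has_attachment": True}})
        elif sep and operator == "after" and value:
            # Accept 2026/01/01 and normalize to ISO 8601 (2026-01-01).
            filters.append({"range": {"received_at": {"gte": value.replace("/", "-")}}})
        else:
            free_text.append(token)
    if free_text:
        must.append({
            "multi_match": {
                "query": " ".join(free_text),
                "fields": ["subject^3", "body_text", "attachment_names^2"],
            }
        })
    return {"query": {"bool": {"filter": filters, "must": must}}}


print(build_search_query("user-456", "from:alice has:attachment after:2026/01/01 budget report"))
```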

Search Indexing Strategy

Push Notifications

Users expect near-instant notification when new email arrives. We support three mechanisms:

WebSocket (Web Clients)

The web client maintains a persistent WebSocket connection to our notification service. When a new email is delivered to a user's mailbox:

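  1. The inbound pipeline publishes an event to a Kafka topic (e.g., new_email) using user_id as the message key, so all events for a given user land on the same partition in order (a separate topic per user would not scale to a billion users)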
  2. The notification service consumes the event
  3. It looks up all active WebSocket connections for that user_id
  4. It pushes a lightweight notification payload:
    {
      "type": "new_email",
      "email_id": "abc-123",
      "from": "alice@sender.com",
      "subject": "Q1 Budget Report",
      "snippet": "Hi Bob, please find the Q1...",
      "received_at": "2026-04-14T10:30:00Z"
    }
  5. The client updates the inbox counter and optionally shows a desktop notification

IMAP IDLE (Desktop Clients)

Traditional IMAP clients (Thunderbird, Apple Mail) use the IDLE command to keep a long-lived connection. When new mail arrives, the server pushes an EXISTS notification:

C: A010 IDLE
S: + idling
  ... (minutes pass) ...
S: * 173 EXISTS
S: * 1 RECENT
C: DONE
S: A010 OK IDLE terminated
C: A011 FETCH 173 (FLAGS ENVELOPE BODYSTRUCTURE)

Mobile Push (APNs / FCM)

For mobile devices not actively connected, we use Apple Push Notification Service (APNs) and Firebase Cloud Messaging (FCM). The notification service sends a push payload containing the sender and subject, triggering the phone's notification system.

Scaling & Partitioning

Inbox Partitioning — Shard by user_id

The single most important design decision: all data for a user lives on the same partition. This means every inbox query, folder listing, and search is a single-partition query — no scatter-gather needed.

partition_key = hash(user_id) % num_partitions

-- All these queries hit a SINGLE partition:
SELECT * FROM emails_metadata WHERE user_id = :uid AND folder = 'inbox' ORDER BY received_at DESC LIMIT 50;
SELECT * FROM emails_metadata WHERE user_id = :uid AND thread_id = :tid;
SELECT * FROM email_labels WHERE user_id = :uid AND label = 'work';
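
One caveat with the modulo formula is that changing num_partitions remaps most users. A consistent-hash ring, which the scaling table in this section also mentions for the message store, limits movement to roughly the new node's share. The sketch below is a minimal ring with virtual nodes; the class name, the choice of SHA-256, and 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node owns `vnodes` points on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, user_id: str) -> str:
        # First ring point clockwise from the user's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(user_id)) % len(self._ring)
        return self._ring[idx][1]


users = [f"user-{i}" for i in range(5000)]
before = HashRing(["n1", "n2", "n3", "n4"])
after = HashRing(["n1", "n2", "n3", "n4", "n5"])
moved = [u for u in users if before.node_for(u) != after.node_for(u)]
print(f"{len(moved) / len(users):.0%} of users moved after adding a fifth node")
```

Every user that moves lands on the new node, so rebalancing is a one-way copy rather than a full reshuffle.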

Why user_id Partitioning Works

SMTP Server Scaling

Component | Scaling Strategy | Instance Count (est.)
--- | --- | ---
SMTP Inbound | Horizontal behind L4 LB; MX records point to multiple IPs | ~5,000 servers
SMTP Outbound | Worker pool consuming from outbound queue | ~3,000 workers
API Servers | Stateless, horizontal behind L7 LB | ~10,000 servers
Message Store | Sharded by user_id (consistent hashing) | ~50,000 nodes
Search (Elasticsearch) | Sharded by user_id, replicated for reads | ~20,000 nodes
Attachment Store (S3) | Managed object storage, unlimited scale | N/A (managed)

Outbound SMTP: IP Reputation Management

When sending at scale, IP reputation is critical. If any of our sending IPs get blacklisted, millions of emails will bounce. Strategies:

Handling Email Bounces

Bounces fall into two categories:

Reliability & Durability

Zero Email Loss Guarantee

Losing an email is unacceptable. Our durability strategy:

  1. Persist-then-acknowledge — the SMTP inbound server does NOT return 250 OK until the email is durably written to the inbound queue (Kafka with acks=all, replication factor 3)
  2. Queue-to-store atomicity — the processing pipeline writes the email to the message store before acknowledging the queue offset. If the processor crashes, the message is redelivered (at-least-once semantics)
  3. Idempotent writes — each delivery is keyed by the recipient's user_id plus the Message-ID header, so duplicate deliveries (from retries) are detected and dropped. Message-ID alone is not enough, because the same message legitimately arrives once per recipient
  4. Cross-region replication — the message store replicates synchronously to at least 2 data centers. A regional outage doesn't lose data
  5. Attachment durability — S3 is designed for 99.999999999% (11 nines) durability
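
Steps 2 and 3 work as a pair: writing before committing the offset gives at-least-once delivery, and an idempotent write turns the resulting duplicates into no-ops. The sketch below simulates this with a list as the queue and a dict as the store. The names and the (user_id, Message-ID) dedup key are assumptions for illustration; the key includes user_id because one message is delivered once per recipient.

```python
class MessageStore:
    """Dict-backed stand-in for the message store with idempotent writes."""

    def __init__(self):
        self.rows = {}

    def write(self, user_id: str, message_id: str, raw: bytes) -> bool:
        key = (user_id, message_id)
        if key in self.rows:
            return False          # duplicate from a redelivery: ignore
        self.rows[key] = raw
        return True


def consume(queue, store, committed_offset, crash_after_write=False):
    """Process records from committed_offset; return the new committed offset.

    The offset only advances AFTER the store write succeeds, so a crash in
    between causes redelivery rather than loss.
    """
    offset = committed_offset
    while offset < len(queue):
        record = queue[offset]
        store.write(record["user_id"], record["message_id"], record["raw"])
        if crash_after_write:
            return offset         # simulated crash: offset was never committed
        offset += 1
    return offset


queue = [{"user_id": "bob", "message_id": "<abc123@sender.com>", "raw": b"..."}]
store = MessageStore()
offset = consume(queue, store, 0, crash_after_write=True)   # crash before commit
offset = consume(queue, store, offset)                      # restart: redelivered
print(offset, len(store.rows))
```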

Handling Server Failures

Scenario: SMTP inbound server crashes mid-delivery

1. External sender's MTA connects to our MX server (mx1)
2. mx1 accepts the email, writes to Kafka → 250 OK sent
3. mx1 crashes
4. The email is safe in Kafka (replicated to other brokers)
5. Another processing worker picks up the message and delivers to mailbox
6. If mx1 crashed BEFORE writing to Kafka:
   → The sender's MTA never received 250 OK
   → Per SMTP protocol, the sender's MTA will RETRY (to mx2 or mx3)
   → No email is lost

Advanced Features

Email Scheduling

Users can schedule emails to be sent at a future time. Implementation: save as a draft with a scheduled_send_at timestamp. A scheduled-sender cron service polls for emails where scheduled_send_at <= now() and moves them to the outbound queue.

Undo Send

Gmail's "Undo Send" works by simply delaying the actual send by 5–30 seconds. The email sits in a short-lived buffer. If the user clicks "Undo," it's removed from the buffer and placed back in drafts. If the timer expires, it's moved to the outbound queue.
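
Scheduled send and undo send reduce to the same mechanism: hold a message until a timestamp and allow cancellation before then. The sketch below uses a min-heap so the poller only inspects messages that are due; the class and method names are hypothetical, and a real service would persist this state rather than keep it in memory.

```python
import heapq
from datetime import datetime, timedelta, timezone


class DelayedSendBuffer:
    def __init__(self):
        self._heap = []            # (send_at, email_id), earliest first
        self._cancelled = set()

    def schedule(self, email_id: str, send_at: datetime) -> None:
        heapq.heappush(self._heap, (send_at, email_id))

    def cancel(self, email_id: str) -> None:
        """Undo: mark the email so it is skipped when its time comes."""
        self._cancelled.add(email_id)

    def release_due(self, now: datetime) -> list:
        """Pop every email whose send time has passed; return the ones
        that should move to the outbound queue."""
        released = []
        while self._heap and self._heap[0][0] <= now:
            _, email_id = heapq.heappop(self._heap)
            if email_id in self._cancelled:
                self._cancelled.discard(email_id)   # back to drafts instead
            else:
                released.append(email_id)
        return released


now = datetime(2026, 4, 14, 14, 30, tzinfo=timezone.utc)
buffer = DelayedSendBuffer()
buffer.schedule("undo-window", now + timedelta(seconds=10))   # undo send delay
buffer.schedule("scheduled", now + timedelta(days=1))         # scheduled send
buffer.schedule("regretted", now + timedelta(seconds=10))
buffer.cancel("regretted")                                    # user clicked Undo
print(buffer.release_due(now + timedelta(seconds=30)))
```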

Email Forwarding & Aliases

Users can configure forwarding rules. When an email arrives for a forwarding address, the inbound pipeline re-enqueues it to the outbound queue with the forwarding destination. Aliases are handled at the routing layer — multiple addresses map to the same user_id.

Auto-Categorization (Tabs)

Gmail-style categories (Primary, Social, Promotions, Updates) are implemented as ML classifiers that run during the inbound pipeline, assigning category labels based on:

Summary

Component | Technology / Approach | Key Design Decision
--- | --- | ---
Sending | SMTP outbound workers + DNS MX | IP reputation pools, exponential retry
Receiving | SMTP inbound + Kafka queue | Persist-then-acknowledge for zero loss
Authentication | SPF + DKIM + DMARC | All three required for strong anti-spoofing
Spam filtering | 5-layer pipeline (IP → auth → rules → ML → user signals) | Weighted score aggregation
Metadata storage | Bigtable / Cassandra, sharded by user_id | Single-partition inbox queries
Body storage | Separate blob store | Lazy-load on email open
Attachments | S3 with content-addressable dedup | SHA-256 deduplication
Search | Elasticsearch, sharded by user_id | Near-real-time indexing
Threading | In-Reply-To + References headers | RFC 5322 standard threading
Notifications | WebSocket + IMAP IDLE + APNs/FCM | Real-time push for all clients