High Level Design Series · Foundations · Part 9 of 70

CDN & Edge Computing

What Is a CDN?

A Content Delivery Network (CDN) is a geographically distributed system of servers that delivers web content to users from the server nearest to them. The core insight is simple: the speed of light is fast, but the internet is slow. A packet travelling from New York to Singapore (~15,000 km) needs at least 50 ms one way at the vacuum speed of light; in optical fiber, where signals travel at roughly two-thirds of that speed, the floor is closer to 75 ms, and real-world latency is higher still due to routing hops, congestion, and protocol overhead.

Consider a user in Tokyo requesting an image from an origin server in Virginia: every request crosses the Pacific and back, adding roughly 150-200 ms of round-trip time before the origin even begins processing it.

CDNs solve three fundamental problems:

  1. Latency: Serve content from geographically close servers, reducing round-trip time from hundreds of milliseconds to single digits.
  2. Bandwidth: Distribute load across thousands of edge servers instead of concentrating it on a single origin. A viral video that gets 10M views in an hour would crush most origin servers — but CDN edge servers absorb the traffic.
  3. Reliability: If one PoP goes down, DNS automatically routes users to the next closest PoP. The origin can go offline temporarily and cached content continues to be served.

The numbers that matter: Amazon found that every 100 ms of latency costs 1% in sales. Google found that 53% of mobile users abandon sites that take longer than 3 seconds to load. CDNs are not optional for global-scale applications — they are fundamental infrastructure.

Modern CDNs serve a staggering volume of traffic. Cloudflare handles over 57 million HTTP requests per second on average. Akamai delivers an estimated 15-30% of all web traffic globally. Netflix's Open Connect CDN serves over 100 Tbps during peak hours.

How CDNs Work

The CDN request lifecycle involves several key components working together:

Core Architecture

Request Flow: Step by Step

  1. DNS Resolution: User requests cdn.example.com. The authoritative DNS for this domain returns a CNAME pointing to the CDN's domain (e.g., d1234.cloudfront.net). The CDN's DNS then uses Anycast routing or geo-DNS to resolve to the IP of the nearest PoP.
  2. TCP/TLS Connection: The user's browser establishes a TCP connection to the edge server. Since the PoP is nearby, the TCP handshake completes in ~5 ms instead of ~150 ms. TLS session resumption further reduces overhead on repeat connections.
  3. Cache Lookup: The edge server hashes the request URL + relevant headers (e.g., Accept-Encoding, Accept-Language) and checks its local cache. This is typically an in-memory lookup (microseconds) backed by SSD storage.
  4. Cache HIT: Content is found and fresh → served immediately. Response time: 1-10 ms.
  5. Cache MISS: Content not cached or stale → edge server fetches from origin (or a mid-tier cache). The content is then stored locally for future requests.
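
The hit/miss decision in steps 3-5 can be sketched in a few lines. This is an illustrative stand-in, not a real CDN implementation: the in-memory Map and the two hard-coded Vary headers are simplifications.

```javascript
// Minimal sketch of an edge cache's hit/miss decision.
// An entry is fresh while its age is below the TTL it was stored with.
const cache = new Map(); // key → { body, storedAt, ttlSeconds }

function cacheKey(url, headers) {
  // Real CDNs hash the URL plus the headers named by Vary; we just join them.
  return [url, headers['accept-encoding'] || '', headers['accept-language'] || ''].join('|');
}

function lookup(url, headers, nowMs) {
  const entry = cache.get(cacheKey(url, headers));
  if (!entry) return { status: 'MISS' };
  const ageSeconds = (nowMs - entry.storedAt) / 1000;
  if (ageSeconds <= entry.ttlSeconds) return { status: 'HIT', body: entry.body };
  return { status: 'STALE' }; // must revalidate with origin or refetch
}

function store(url, headers, body, ttlSeconds, nowMs) {
  cache.set(cacheKey(url, headers), { body, storedAt: nowMs, ttlSeconds });
}
```

Note how a request with a different Accept-Encoding produces a different key and therefore a MISS — this is exactly the cache-fragmentation risk the Vary section below discusses.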

Anycast Routing

Most modern CDNs use BGP Anycast — the same IP address is announced from multiple PoPs worldwide. When a user connects to that IP, BGP routing naturally directs the connection to the nearest PoP (in terms of network hops, not necessarily geographic distance). This is elegant because routing follows real network topology rather than guessed geography, failover is automatic (a PoP that withdraws its BGP announcement simply stops receiving traffic), and attack traffic is naturally dispersed across many PoPs instead of concentrating on one.

# Simplified DNS resolution flow
$ dig cdn.example.com

;; ANSWER SECTION:
cdn.example.com.    300  IN  CNAME  d1234.cloudfront.net.
d1234.cloudfront.net. 60 IN  A      13.224.64.12    ← Anycast IP
d1234.cloudfront.net. 60 IN  A      13.224.64.44    ← Anycast IP

# The returned IPs are routed to the nearest PoP via Anycast
# A user in London gets routed to London PoP
# A user in Tokyo gets routed to Tokyo PoP
# Same IPs, different physical destinations
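
Application code never sees BGP, but the geo-DNS fallback some CDNs use amounts to "answer with the PoP that has the lowest measured RTT to the client". The PoP list and latency figures below are invented for illustration:

```javascript
// Geo-DNS style PoP selection: return the PoP with the lowest RTT
// for the client's region. Real systems use BGP anycast or live
// latency measurements; this table is purely illustrative.
const popLatenciesMs = {
  london:   { 'eu-west': 5,   'ap-east': 210, 'us-east': 75 },
  tokyo:    { 'eu-west': 230, 'ap-east': 8,   'us-east': 160 },
  virginia: { 'eu-west': 80,  'ap-east': 170, 'us-east': 4 },
};

function nearestPop(clientRegion) {
  let best = null;
  for (const [pop, latencies] of Object.entries(popLatenciesMs)) {
    const rtt = latencies[clientRegion];
    if (rtt !== undefined && (best === null || rtt < best.rtt)) best = { pop, rtt };
  }
  return best; // e.g. { pop: 'tokyo', rtt: 8 } for an ap-east client
}
```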


Push vs Pull CDN

CDNs use two fundamental strategies for populating their edge caches:

Pull CDN (Lazy / On-Demand)

The edge server fetches content from the origin only when a user requests it. This is by far the most common approach.

# Pull CDN: Origin server sends cache headers
HTTP/1.1 200 OK
Cache-Control: public, max-age=86400, s-maxage=604800
Content-Type: image/jpeg
ETag: "a1b2c3d4"
Content-Length: 245832

# max-age=86400     → Browser caches for 1 day
# s-maxage=604800   → CDN caches for 7 days
# ETag              → Enables conditional revalidation
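
The precedence between max-age and s-maxage can be sketched as a tiny parser. This is a simplification of the RFC 9111 rules, handling only the directives shown above:

```javascript
// Pick the TTL a shared cache (CDN) should use from a Cache-Control value.
// s-maxage wins for shared caches; max-age is the fallback.
function sharedCacheTtl(cacheControl) {
  const directives = {};
  for (const part of cacheControl.split(',')) {
    const [name, value] = part.trim().split('=');
    directives[name.toLowerCase()] = value !== undefined ? Number(value) : true;
  }
  if (directives['no-store'] || directives['private']) return 0; // CDN may not cache
  if ('s-maxage' in directives) return directives['s-maxage'];
  if ('max-age' in directives) return directives['max-age'];
  return 0; // no explicit TTL; real caches may apply heuristic freshness
}
```

For the header in the example above, the browser uses max-age (1 day) while the CDN uses s-maxage (7 days).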

Push CDN (Proactive / Pre-populated)

You explicitly upload content to the CDN before any user requests it. Think of it as a distributed file system you populate ahead of time.

# Push CDN: Upload via AWS CLI to S3 (CloudFront origin)
aws s3 sync ./dist s3://my-cdn-bucket/ \
  --cache-control "public, max-age=31536000, immutable" \
  --delete

# Or via Cloudflare API
curl -X PUT "https://api.cloudflare.com/client/v4/accounts/{id}/storage/kv/namespaces/{ns}/values/{key}" \
  -H "Authorization: Bearer {token}" \
  --data-binary @./asset.js

Comparison

| Aspect | Pull CDN | Push CDN |
| --- | --- | --- |
| Cache population | Automatic on first request | Manual upload required |
| First request latency | Higher (cache MISS → origin fetch) | Low (already cached) |
| Storage efficiency | Only caches requested content | All content stored (may waste space) |
| Origin dependency | Origin must be available for MISSes | Origin can be offline after push |
| Best for | Large catalogs, dynamic content, APIs | Static assets, known file sets, SPAs |
| Update complexity | Automatic via TTL/revalidation | Requires explicit push + invalidation |
| Real-world examples | Cloudflare, Fastly, most CDNs by default | S3 + CloudFront, Netlify, Vercel |

Hybrid approach: Most production systems use both. Static assets (JS, CSS, images) are pushed during deployment with long cache TTLs and content-hashed filenames (app.a1b2c3.js). Dynamic content (API responses, HTML pages) uses pull CDN with shorter TTLs.

Cache Hierarchy

Modern CDNs use a multi-tier cache hierarchy to maximize cache hit rates while minimizing origin load:

Three-Tier Architecture

  1. L1 — Edge Cache (PoP): Closest to the user. Each of the 200-300 PoPs has its own cache. Small, fast, high turnover. Cache hit ratio: 80-90% for popular content.
  2. L2 — Shield / Mid-Tier Cache: A designated "super PoP" that aggregates requests from multiple edge PoPs in a region. If an edge cache misses, it queries the shield before going to origin. This dramatically reduces origin load: instead of 200 PoPs hitting origin, only 5-10 shield nodes do. Cache hit ratio at shield: 95-99%.
  3. L3 — Origin Server: The authoritative source. Only receives requests that miss both edge and shield caches — typically 1-5% of total traffic.

# Request flow through cache hierarchy:
User → Edge PoP (Tokyo)
  ├── HIT  → Serve (5ms)
  └── MISS → Shield (Singapore)
              ├── HIT  → Serve + cache at Tokyo edge (25ms)
              └── MISS → Origin (Virginia)
                          └── Serve + cache at Shield + cache at Edge (180ms)

# Without shield: 200 PoPs each send cache-miss requests to origin
# With shield: Only 5-10 shield nodes send requests to origin
# Origin load reduction: ~95%
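
The flow above can be sketched as a chain of tiers where each miss falls through to the next tier and the result is cached on the way back up. The tier names and the origin stub are illustrative:

```javascript
// Tiered cache lookup sketch: edge → shield → origin.
// A tier that misses asks its upstream, then caches the answer locally.
function makeTier(name, upstream) {
  const store = new Map();
  return {
    name,
    get(key) {
      if (store.has(key)) return { value: store.get(key), servedBy: name };
      const result = upstream(key);  // fall through to the next tier
      store.set(key, result.value);  // populate this tier for future requests
      return result;
    },
  };
}

let originFetches = 0;
const origin = (key) => { originFetches++; return { value: `content:${key}`, servedBy: 'origin' }; };
const shield = makeTier('shield', origin);
const edge = makeTier('edge', (key) => shield.get(key));
```

A second edge PoP behind the same shield never touches the origin for content the shield already holds — that is the ~95% origin-load reduction in miniature.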

Cache-Control Headers

HTTP cache headers are the primary mechanism for controlling CDN caching behavior:

| Header / Directive | Meaning | Example |
| --- | --- | --- |
| max-age=N | Browser can cache for N seconds | max-age=3600 (1 hour) |
| s-maxage=N | CDN/shared cache TTL (overrides max-age) | s-maxage=86400 (1 day) |
| no-cache | Must revalidate with origin before serving | Forces ETag/Last-Modified check |
| no-store | Never cache (sensitive data) | Banking, personal data |
| stale-while-revalidate=N | Serve stale content while fetching a fresh copy in the background | stale-while-revalidate=60 |
| stale-if-error=N | Serve stale if origin is down | stale-if-error=86400 |
| immutable | Content will never change (skip revalidation) | Versioned assets: app.a3f2.js |
| private | Only browser may cache (not CDN) | User-specific responses |
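
The freshness directives in the table combine into a single serving decision. This sketch is a simplified model — it compares only the response's age against max-age, stale-while-revalidate, and stale-if-error, ignoring revalidation mechanics:

```javascript
// Decide how a CDN should serve a cached response of a given age (seconds).
function servingDecision(ageSeconds, { maxAge, staleWhileRevalidate = 0, staleIfError = 0, originUp = true }) {
  if (ageSeconds <= maxAge) return 'serve-fresh';
  if (ageSeconds <= maxAge + staleWhileRevalidate) return 'serve-stale-revalidate-in-background';
  if (!originUp && ageSeconds <= maxAge + staleIfError) return 'serve-stale-origin-down';
  return 'fetch-from-origin';
}
```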

Conditional Requests & Revalidation

# Step 1: First request — origin includes ETag
HTTP/1.1 200 OK
Cache-Control: public, max-age=300, s-maxage=3600
ETag: "v42-a1b2c3"
Last-Modified: Wed, 15 Apr 2026 10:30:00 GMT
Content-Type: application/json

{"products": [...]}

# Step 2: After TTL expires, CDN revalidates
GET /api/products HTTP/1.1
If-None-Match: "v42-a1b2c3"
If-Modified-Since: Wed, 15 Apr 2026 10:30:00 GMT

# Step 3a: Content unchanged → 304 (no body transferred!)
HTTP/1.1 304 Not Modified
ETag: "v42-a1b2c3"
Cache-Control: public, max-age=300, s-maxage=3600

# Step 3b: Content changed → 200 with new body
HTTP/1.1 200 OK
ETag: "v43-d4e5f6"
Cache-Control: public, max-age=300, s-maxage=3600

{"products": [... updated data ...]}
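
On the origin side, step 3a/3b reduces to comparing validators. A minimal handler sketch, assuming header names are lowercased as Node's HTTP stack does:

```javascript
// Origin-side revalidation sketch: reply 304 when the client's ETag still
// matches, so the CDN keeps serving its cached body without re-downloading it.
function handleConditionalGet(requestHeaders, currentEtag, body) {
  if (requestHeaders['if-none-match'] === currentEtag) {
    // Validator matches: no body transferred, the cached copy stays valid.
    return { status: 304, headers: { ETag: currentEtag } };
  }
  // First request, or content changed: full 200 with the new validator.
  return { status: 200, headers: { ETag: currentEtag }, body };
}
```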

The Vary Header

The Vary header tells the CDN to cache separate versions based on request header values:

# Cache different versions based on encoding and language
Vary: Accept-Encoding, Accept-Language

# This means the CDN stores separate cached copies for:
# Accept-Encoding: gzip + Accept-Language: en-US → Version A
# Accept-Encoding: br   + Accept-Language: en-US → Version B
# Accept-Encoding: gzip + Accept-Language: ja    → Version C

# WARNING: Vary on too many headers = cache fragmentation = low hit ratio
# NEVER: Vary: Cookie  (each user gets their own cached copy = no caching)
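
Building the cache key from Vary can be sketched as below. Joining header values with literal separators is a simplification of the hashing a real CDN performs, but it makes the fragmentation risk visible — every distinct value combination is a separate cache entry:

```javascript
// Build a cache key from the URL plus only the headers listed in Vary.
// Varying on a high-cardinality header (e.g. Cookie) explodes the key space.
function varyCacheKey(url, varyHeader, requestHeaders) {
  const names = varyHeader.split(',').map((h) => h.trim().toLowerCase()).filter(Boolean);
  const parts = names.map((name) => `${name}=${requestHeaders[name] || ''}`);
  return `${url}?${parts.join('&')}`;
}
```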

CDN Invalidation

Phil Karlton famously said, "There are only two hard things in Computer Science: cache invalidation and naming things." CDN invalidation at global scale is especially hard because you're invalidating caches across hundreds of PoPs worldwide simultaneously.

Invalidation Strategies

1. TTL-Based Expiration (Passive)

Content naturally expires after its TTL. Simple but imprecise — content remains stale until the TTL expires.

# Short TTL for frequently changing content
Cache-Control: public, s-maxage=60    # CDN caches for 1 minute

# Long TTL for stable content
Cache-Control: public, s-maxage=604800  # CDN caches for 7 days

2. Purge (Hard Invalidation)

Immediately remove content from all edge caches. The CDN sends purge commands to every PoP.

# Cloudflare: Purge specific URLs
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{"files": ["https://example.com/api/products", "https://example.com/images/hero.jpg"]}'

# Fastly: Instant purge via surrogate key
curl -X POST "https://api.fastly.com/service/{id}/purge/{surrogate-key}" \
  -H "Fastly-Key: {token}"

# AWS CloudFront: Create invalidation
aws cloudfront create-invalidation \
  --distribution-id E1234 \
  --paths "/api/products" "/images/*"

3. Soft Purge (Mark as Stale)

Instead of removing content, mark it as stale. The next request triggers a background revalidation while serving the stale content. This avoids the "thundering herd" problem where a hard purge causes all users to simultaneously hit origin.

# Fastly soft purge — marks content stale, serves stale while revalidating
curl -X POST "https://api.fastly.com/service/{id}/purge/{key}" \
  -H "Fastly-Key: {token}" \
  -H "Fastly-Soft-Purge: 1"

4. Versioned URLs (Best Practice)

The most reliable invalidation strategy: never invalidate — just change the URL. Content-hash filenames ensure that new deployments automatically bypass old cached versions.

# Build tool generates content-hashed filenames:
app.a1b2c3d4.js    ← Old version (cached for 1 year)
app.e5f6a7b8.js    ← New version (different URL = no cache conflict)

# index.html (short TTL) references the new filename:
<script src="/assets/app.e5f6a7b8.js"></script>

# Cache headers for versioned assets:
Cache-Control: public, max-age=31536000, immutable
# ↑ Cache for 1 year; content will never change at this URL

Best practice hierarchy: (1) Use versioned URLs for static assets — no invalidation needed. (2) Use stale-while-revalidate for API responses — users get instant responses while background refresh happens. (3) Use soft purge for urgent updates — avoids thundering herd. (4) Use hard purge as last resort — for security-critical content removal.

Propagation Delay

Even "instant" purges take time to propagate across all PoPs: from tens of milliseconds on purge-optimized networks such as Fastly and Cloudflare to roughly a minute on CloudFront. Until a purge reaches every PoP, some users may still be served the old content.

Edge Computing

Edge computing extends CDNs beyond static content caching — it runs custom application logic at the edge, closer to users. Instead of every request travelling to a central origin, computation happens at the nearest PoP.

Edge Compute Platforms

| Platform | Runtime | Cold Start | Max Execution | Locations |
| --- | --- | --- | --- | --- |
| Cloudflare Workers | V8 Isolates (JS/WASM) | 0 ms (no cold start) | 30s CPU / 30s wall | 310+ |
| Lambda@Edge | Node.js, Python | 50-200 ms | 5-30s | 220+ (CloudFront PoPs) |
| CloudFront Functions | JavaScript (lightweight) | <1 ms | 1ms CPU | 220+ |
| Deno Deploy | V8 Isolates (JS/TS/WASM) | 0 ms | 50ms CPU / 2min wall | 35+ |
| Vercel Edge Functions | V8 Isolates (JS/TS) | 0 ms | 30s | ~30+ |

Use Cases

A/B Testing at the Edge

// Cloudflare Worker: A/B testing without origin involvement
export default {
  async fetch(request) {
    const cookie = request.headers.get('Cookie') || '';
    let variant = cookie.match(/ab_variant=(\w+)/)?.[1];

    if (!variant) {
      variant = Math.random() < 0.5 ? 'control' : 'experiment';
    }

    const url = new URL(request.url);
    url.pathname = `/${variant}${url.pathname}`;

    const response = await fetch(url.toString());
    const newResponse = new Response(response.body, response);
    newResponse.headers.set('Set-Cookie',
      `ab_variant=${variant}; Path=/; Max-Age=86400`);
    return newResponse;
  }
};

Geolocation-Based Routing

// Route users to region-specific content
export default {
  async fetch(request) {
    const country = request.headers.get('CF-IPCountry');
    const lang = { 'JP': 'ja', 'DE': 'de', 'FR': 'fr', 'BR': 'pt' }[country] || 'en';

    // Redirect to localized version
    if (lang !== 'en') {
      return Response.redirect(`https://example.com/${lang}${new URL(request.url).pathname}`, 302);
    }
    return fetch(request);
  }
};

Authentication & Token Validation

// Validate JWT at the edge — reject unauthorized requests before they hit origin
export default {
  async fetch(request) {
    const token = request.headers.get('Authorization')?.replace('Bearer ', '');
    if (!token) return new Response('Unauthorized', { status: 401 });

    try {
      // verifyJWT is an app-provided stand-in (e.g. built on the 'jose' library);
      // JWT_SECRET would come from the Worker's environment bindings.
      const payload = await verifyJWT(token, JWT_SECRET);
      const newHeaders = new Headers(request.headers);
      newHeaders.set('X-User-Id', payload.sub);
      return fetch(new Request(request, { headers: newHeaders }));
    } catch (e) {
      return new Response('Invalid token', { status: 403 });
    }
  }
};

Image Optimization

// Dynamically resize and convert images at the edge
export default {
  async fetch(request) {
    const url = new URL(request.url);
    const width = parseInt(url.searchParams.get('w') || '800');
    const format = request.headers.get('Accept')?.includes('webp') ? 'webp' : 'jpeg';

    return fetch(request, {
      cf: {
        image: {
          width: Math.min(width, 2000),
          format: format,
          quality: 85,
          fit: 'cover'
        }
      }
    });
  }
};
When to use edge computing: Use it for request/response transformations that don't need persistent state — routing, auth checks, A/B testing, header manipulation, redirects. Don't use it for heavy computation or database writes — those still belong at the origin.

CDN for Dynamic Content

CDNs are no longer just for static files. Modern CDNs accelerate dynamic content through several techniques:

Edge Side Includes (ESI)

ESI lets you cache page fragments independently. A product page might be 90% static (layout, images, description) and 10% dynamic (price, stock, personalized recommendations). ESI caches the static parts and fetches only the dynamic fragments from origin.

<!-- Cached page template (TTL: 1 hour) -->
<html>
<body>
  <header>...static nav...</header>
  <main>
    <h1>Product XYZ</h1>
    <img src="/img/xyz.jpg" />

    <!-- Dynamic fragment: fetched from origin on each request -->
    <esi:include src="/api/price?sku=xyz" ttl="60" />

    <!-- Personalized fragment -->
    <esi:include src="/api/recommendations?user=${user_id}" />
  </main>
</body>
</html>

API Response Caching

Short-TTL caching of API responses at the edge can handle massive read loads:

# API endpoint with aggressive edge caching
# Even 5 seconds of caching absorbs massive traffic spikes

GET /api/products/trending HTTP/1.1

HTTP/1.1 200 OK
Cache-Control: public, s-maxage=5, stale-while-revalidate=30
Surrogate-Key: products trending
Vary: Accept-Encoding

# At 10,000 requests/second:
# Without CDN:  10,000 req/s hit origin
# With 5s TTL:  ~1 req/5s hits origin (0.002% of traffic)
# Origin load reduction: 99.998%
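
The arithmetic above generalizes to a one-liner, under the simplifying assumption of a single cache key served by a single PoP with request coalescing (concurrent misses collapsed into one origin fetch):

```javascript
// Origin load under edge caching: at most one request per TTL window
// reaches the origin for a given cache key.
function originLoad(requestsPerSecond, ttlSeconds) {
  const originRps = 1 / ttlSeconds; // one refill per TTL window
  const reduction = 1 - originRps / requestsPerSecond;
  return { originRps, reductionPercent: reduction * 100 };
}
```

With many PoPs and no shield, multiply originRps by the PoP count — which is exactly why the shield tier matters.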

Dynamic Site Acceleration (DSA)

Even for truly uncacheable dynamic requests, CDNs improve performance through TLS termination at the nearby edge, persistent connection pooling between PoPs and the origin, and optimized routing over the CDN's private backbone rather than the public internet.

WebSocket & Real-Time Content

Modern CDNs support WebSocket proxying and real-time protocols:

# Cloudflare automatically proxies WebSocket connections
# User ↔ Edge PoP ↔ Origin (persistent WebSocket connection)

# Benefits:
# - TLS termination at the edge (faster handshake)
# - DDoS protection for WebSocket connections
# - Geographic load balancing across multiple origin servers
# - Connection pooling between PoP and origin

# Limitations:
# - Each WebSocket connection still routes to origin (not cached)
# - Edge can inspect/route but not cache real-time data
# - Added hop may increase latency by 1-5 ms


CDN Providers

The CDN landscape is diverse, with providers optimized for different use cases:

| Provider | PoPs | Edge Compute | Purge Speed | Best For |
| --- | --- | --- | --- | --- |
| Cloudflare | 310+ | Workers (V8) | <30 ms | All-in-one (CDN + WAF + DDoS + edge compute). Generous free tier. |
| Akamai | 4,100+ | EdgeWorkers | 5-7s | Enterprise, largest network. Media streaming at scale. |
| AWS CloudFront | 400+ | Lambda@Edge / CF Functions | ~60s | AWS-native apps. Deep integration with S3, ALB, API Gateway. |
| Fastly | 90+ | Compute@Edge (WASM) | ~150 ms | Real-time purging, VCL configurability. API-heavy workloads. |
| Google Cloud CDN | 180+ | Cloud Run (regional) | ~seconds | GCP-native apps. Tight integration with Cloud Load Balancing. |

Choosing a CDN provider: For most startups and mid-size companies, Cloudflare is the default choice — generous free tier, excellent performance, and built-in edge compute. For AWS-heavy architectures, CloudFront is the natural fit. For enterprises needing the largest global network, Akamai remains the gold standard. For real-time purging and developer experience, Fastly excels.

CDN in System Design Interviews

CDN is one of the most frequently mentioned components in system design interviews. Here's how to incorporate it effectively:

When to Mention CDN

Bring up a CDN whenever the design involves a global user base, heavy static assets (images, video, JS bundles), read-heavy traffic, or tight latency budgets. Say explicitly what you would cache (static assets, public API responses) and what bypasses the CDN (personalized or write traffic).

How to Draw CDN in Architecture Diagrams

                    ┌─────────────┐
   Users ──────────▶│     CDN     │
   (Global)         │ (Edge PoPs) │
                    └──────┬──────┘
                           │ Cache MISS only
                    ┌──────▼──────┐
                    │Load Balancer│
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ App Srv 1│ │ App Srv 2│ │ App Srv 3│
        └──────────┘ └──────────┘ └──────────┘

Place CDN as the FIRST layer between users and your infrastructure.
Show that only cache MISSes reach your backend.

Common Interview Follow-Up Questions

| Question | Key Points to Cover |
| --- | --- |
| "How do you handle cache invalidation?" | Versioned URLs for assets, TTL + stale-while-revalidate for APIs, surrogate keys for targeted purging. |
| "What if the CDN serves stale data?" | Acceptable for most read traffic (eventual consistency). Critical data (payments, auth) bypasses CDN. |
| "How does CDN handle personalized content?" | ESI for partial caching, edge compute for personalization logic, Vary header for limited variants. |
| "CDN for write-heavy workloads?" | CDN primarily helps reads. For writes, consider edge compute for validation, then forward to origin. |
| "What about CDN costs?" | CDN bandwidth is cheaper than origin bandwidth. Typical: $0.01-0.08/GB at CDN vs $0.09-0.12/GB at cloud origin. |
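
The cost question turns into a quick back-of-envelope calculation. The per-GB prices and the 95% hit ratio below are illustrative figures in the ranges quoted above, not vendor quotes:

```javascript
// Back-of-envelope monthly egress cost with a CDN in front:
// all bytes leave via the CDN, and only cache misses also pay origin egress.
function monthlyEgressCost(totalGB, hitRatio, cdnPerGB, originPerGB) {
  const viaCdn = totalGB * cdnPerGB;
  const viaOrigin = totalGB * (1 - hitRatio) * originPerGB;
  return viaCdn + viaOrigin;
}

// Example: 100 TB/month, 95% hit ratio, $0.02/GB CDN, $0.10/GB origin
// → $2,000 CDN + $500 origin = $2,500, vs $10,000 serving everything from origin.
```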

Sample Answer: "Design a News Feed"

# CDN-relevant portion of a news feed design:

1. Static assets (JS, CSS, images, video thumbnails)
   → Push CDN with immutable versioned URLs
   → Cache-Control: public, max-age=31536000, immutable

2. Feed API responses (per-user feed)
   → Cannot cache personalized feeds at CDN
   → BUT: Cache trending/popular stories at edge (s-maxage=30)
   → Cache user profile pictures at edge (s-maxage=3600)
   → Use edge compute for A/B testing feed algorithms

3. Real-time notifications
   → WebSocket through CDN for TLS termination + DDoS protection
   → No caching, but reduced handshake latency

4. Architecture:
   Users → CDN → [static assets served from edge, 95% of bandwidth]
   Users → CDN → API Gateway → Feed Service [only feed requests reach origin]