Caching Strategies
What is Caching?
Caching is the technique of storing copies of frequently accessed data in a fast, temporary storage layer so that future requests for that data can be served more quickly than fetching it from the original, slower source. The cache sits closer to the consumer — in memory instead of on disk, on the same machine instead of across a network — and trades storage space for speed.
Every caching system revolves around three fundamental concepts:
- Cache Hit: The requested data is found in the cache. The response is served directly from the cache — fast, cheap, no downstream load.
- Cache Miss: The data is not in the cache. The system must fetch it from the original data source (database, API, disk), which is slower and more expensive.
- Hit Ratio: The percentage of requests that result in cache hits. A hit ratio of 95% means 95 out of 100 requests are served from cache. This is the single most important metric for cache effectiveness.
Hit Ratio = Cache Hits / (Cache Hits + Cache Misses)
Example:
Total requests = 1,000,000
Cache hits = 950,000
Cache misses = 50,000
Hit ratio = 95%
Effective latency = (0.95 × 1ms) + (0.05 × 50ms)
= 0.95ms + 2.5ms
= 3.45ms (vs 50ms without cache → ~14x faster)
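The same arithmetic as a small helper (a minimal sketch; the 1 ms and 50 ms figures are the illustrative latencies used above):
def effective_latency(hit_ratio, cache_ms=1.0, db_ms=50.0):
    """Average latency per request for a given cache hit ratio."""
    return hit_ratio * cache_ms + (1 - hit_ratio) * db_ms

print(effective_latency(0.95))  # 3.45 ms, as in the example above
print(effective_latency(0.70))  # 15.7 ms: a lower hit ratio erases most of the benefit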
The latency difference is dramatic. A typical in-memory cache like Redis serves reads in sub-millisecond time, while a database query might take 5–100ms depending on indexes, joins, and load. At scale, this difference is the difference between a responsive application and an overwhelmed one.
Where is Caching Applied?
Caching exists at every layer of a modern system stack: the browser's HTTP cache, CDN edge caches, application caches (Redis, Memcached), database buffer pools and query caches, the OS page cache, and CPU caches.
At the system design level, we primarily focus on the application cache layer (Redis, Memcached) and the CDN / HTTP cache layer. These are the layers you explicitly design and control. CPU and OS caches are managed by hardware and operating systems, but understanding them helps you write cache-friendly code.
Caching Strategies
How data flows between your application, the cache, and the database defines the caching strategy. Each strategy makes different trade-offs between consistency, performance, and complexity. Understanding these trade-offs is essential for system design interviews.
Cache-Aside (Lazy Loading)
Cache-aside is the most widely used caching pattern. The application code is responsible for all cache interactions — the cache is a passive store, and the application orchestrates reads and writes.
Read path:
- Application receives a read request.
- Application checks the cache for the data.
- If cache hit: return data from cache.
- If cache miss: query the database, write the result to the cache, then return the data.
Write path:
- Application writes data directly to the database.
- Application invalidates (deletes) the corresponding cache entry.
# Cache-Aside pattern in Python (with Redis)
import redis
import json
r = redis.Redis(host='localhost', port=6379, db=0)
def get_user(user_id):
# Step 1: Check cache
cache_key = f"user:{user_id}"
cached = r.get(cache_key)
if cached:
# Cache HIT
return json.loads(cached)
# Cache MISS: fetch from database
user = db.query("SELECT * FROM users WHERE id = %s", user_id)
if user:
# Populate cache with TTL of 5 minutes
r.setex(cache_key, 300, json.dumps(user))
return user
def update_user(user_id, data):
# Write to database
db.execute("UPDATE users SET ... WHERE id = %s", user_id, data)
# Invalidate cache (delete, don't update)
r.delete(f"user:{user_id}")
Pros:
- Only requested data is cached (no wasted memory on unused data)
- Cache failures are non-fatal — the app falls back to the database
- Works with any database; the cache is decoupled from the data source
- Simple to implement and reason about
Cons:
- First request for each key always results in a cache miss (cold start penalty)
- Data can become stale if the database is updated by another service that doesn't invalidate the cache
- Application code is responsible for all cache logic — more code to maintain
Read-Through
In read-through, the cache itself is responsible for loading data from the database on a cache miss. The application always reads from the cache; it never directly queries the database for cached data.
- Application reads from cache.
- If cache hit: return data.
- If cache miss: the cache library/provider automatically loads data from the database, stores it, and returns it.
// Read-Through with a cache provider (conceptual Java)
@Cacheable(value = "users", key = "#userId")
public User getUser(String userId) {
// This method is ONLY called on cache miss.
// The cache framework handles checking the cache first.
return userRepository.findById(userId);
}
// Spring Cache with Redis as read-through provider
@Configuration
@EnableCaching
public class CacheConfig {
@Bean
public RedisCacheManager cacheManager(RedisConnectionFactory cf) {
RedisCacheConfiguration config = RedisCacheConfiguration
.defaultCacheConfig()
.entryTtl(Duration.ofMinutes(10))
.serializeValuesWith(
SerializationPair.fromSerializer(new GenericJackson2JsonRedisSerializer())
);
return RedisCacheManager.builder(cf)
.cacheDefaults(config)
.build();
}
}
Pros: Application code is simpler — no cache logic to write. The cache framework handles all the plumbing.
Cons: The cache must understand how to load data (needs a data loader configured). Less flexible if different keys require different loading logic.
Write-Through
Write-through ensures that every write goes to both the cache and the database synchronously. The cache acts as the primary write interface, and it persists the data to the database before confirming the write.
- Application writes data to the cache.
- Cache synchronously writes the data to the database.
- Write is confirmed to the application only after both are successful.
# Write-Through pattern (conceptual)
def write_through(key, value):
# Write to cache
cache.set(key, value)
# Synchronously write to database
db.execute("INSERT INTO data (key, value) VALUES (%s, %s) "
"ON CONFLICT (key) DO UPDATE SET value = %s",
key, value, value)
# Only now is the write considered complete
Pros: Cache data is always consistent with the database. No stale reads.
Cons: Every write incurs the latency of both cache and database writes. Higher write latency. Data that's written but never read wastes cache space.
Write-Behind (Write-Back)
Write-behind improves on write-through by making the database write asynchronous. The application writes to the cache, which immediately confirms the write. The cache then asynchronously flushes the data to the database, often in batches.
- Application writes data to the cache.
- Cache immediately confirms the write (fast!).
- Cache asynchronously writes to the database (batched, delayed).
# Write-Behind pattern (conceptual)
import threading
import queue
write_queue = queue.Queue()
def write_behind(key, value):
# Write to cache immediately
cache.set(key, value)
# Queue the DB write
write_queue.put((key, value))
# Return immediately — write is "done" from client's perspective
def flush_worker():
"""Background thread that batches writes to DB."""
batch = []
while True:
try:
item = write_queue.get(timeout=1)
batch.append(item)
if len(batch) >= 100:
db.batch_upsert(batch)
batch.clear()
except queue.Empty:
if batch:
db.batch_upsert(batch)
batch.clear()
# Start background flusher
threading.Thread(target=flush_worker, daemon=True).start()
Pros: Extremely fast writes. Batching reduces database load significantly. Great for write-heavy workloads.
Cons: Risk of data loss if the cache crashes before flushing to the database. Data in the database can be stale for the duration of the async delay. More complex to implement correctly.
Write-Around
Write-around skips the cache entirely on writes. Data is written directly to the database. The cache is only populated on reads (via cache-aside or read-through).
# Write-Around pattern
def write_around(key, value):
# Write directly to DB, bypass cache
db.execute("INSERT INTO data (key, value) VALUES (%s, %s) "
"ON CONFLICT (key) DO UPDATE SET value = %s",
key, value, value)
# Optionally invalidate stale cache entry
cache.delete(key)
Pros: Good for data that's written once and rarely read. Doesn't pollute the cache with data that might never be accessed.
Cons: Reads after a write will always be a cache miss (until lazy-loaded). Higher read latency for recently-written data.
Strategy Comparison
| Strategy | Read Perf. | Write Perf. | Consistency | Complexity | Best For |
|---|---|---|---|---|---|
| Cache-Aside | Fast (after first miss) | Normal (DB direct) | Eventual | Low | General purpose, read-heavy |
| Read-Through | Fast (after first miss) | Normal (DB direct) | Eventual | Medium | Frameworks with cache providers |
| Write-Through | Fast (always in cache) | Slow (sync both) | Strong | Medium | Read-heavy, consistency critical |
| Write-Behind | Fast | Very Fast | Eventual (risk of loss) | High | Write-heavy, tolerates loss |
| Write-Around | Slow (after write) | Normal (DB direct) | Strong (DB) | Low | Write-once, read-rarely |
Eviction Policies
Caches have finite memory. When the cache is full and a new entry needs to be added, the cache must evict an existing entry. The eviction policy determines which entry gets removed. The right policy can mean the difference between a 95% hit ratio and a 70% hit ratio.
LRU (Least Recently Used)
LRU evicts the item that hasn't been accessed for the longest time. It's the most commonly used eviction policy because it works well with temporal locality — data accessed recently is likely to be accessed again soon.
Implementation: A doubly-linked list combined with a hash map. The hash map provides O(1) lookup by key. The linked list maintains access order — the most recently accessed item is at the head, the least recently accessed is at the tail. On eviction, remove the tail.
class LRUCache:
def __init__(self, capacity: int):
self.capacity = capacity
self.cache = {} # key → Node
self.head = Node(0, 0) # dummy head
self.tail = Node(0, 0) # dummy tail
self.head.next = self.tail
self.tail.prev = self.head
def get(self, key: int) -> int:
if key in self.cache:
node = self.cache[key]
self._remove(node)
self._add_to_head(node)
return node.value
return -1 # cache miss
def put(self, key: int, value: int):
if key in self.cache:
self._remove(self.cache[key])
node = Node(key, value)
self._add_to_head(node)
self.cache[key] = node
if len(self.cache) > self.capacity:
# Evict LRU item (tail)
lru = self.tail.prev
self._remove(lru)
del self.cache[lru.key]
def _add_to_head(self, node):
node.next = self.head.next
node.prev = self.head
self.head.next.prev = node
self.head.next = node
def _remove(self, node):
node.prev.next = node.next
node.next.prev = node.prev
class Node:
def __init__(self, key, value):
self.key, self.value = key, value
self.prev = self.next = None
Time Complexity: O(1) for both get and put. This constant-time behavior, combined with how well recency tracking matches temporal locality, is why LRU is so widely used.
LFU (Least Frequently Used)
LFU evicts the item with the lowest access frequency. Each cache entry has a counter that increments on every access. When eviction is needed, the entry with the smallest counter is removed.
# Redis supports LFU natively
# redis.conf:
maxmemory-policy allkeys-lfu
# LFU uses a logarithmic counter (Morris counter) with decay:
# - Counter incremented probabilistically (log scale)
# - Counter decayed over time to handle changing access patterns
# - lfu-log-factor 10 (higher = slower counter growth)
# - lfu-decay-time 1 (minutes before counter halves)
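To make the counting mechanism concrete, here is a minimal LFU sketch with exact per-key counters (illustrative only; production implementations such as Redis's use approximate, decaying counters and avoid the O(n) eviction scan shown here):
from collections import defaultdict

class LFUCacheSimple:
    """Minimal LFU: exact per-key counters, O(n) eviction scan."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.freq = defaultdict(int)

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1          # every access bumps the counter
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the key with the smallest access count
            victim = min(self.freq, key=self.freq.get)
            del self.data[victim]
            del self.freq[victim]
        self.data[key] = value
        self.freq[key] += 1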
Use case: LFU is better than LRU when some items are consistently popular. A viral post that gets accessed millions of times should stay cached even if it wasn't accessed in the last few seconds.
Drawback: New items start with a low frequency count and may be evicted immediately. Items that were popular in the past but no longer relevant can "stick" in cache.
FIFO (First In, First Out)
FIFO evicts the oldest entry regardless of how often or recently it was accessed. Simple to implement with a queue.
Use case: When all items have roughly equal access probability and simplicity matters more than optimality.
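A minimal FIFO sketch, assuming an OrderedDict for insertion-order bookkeeping (reads deliberately do not change order):
from collections import OrderedDict

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # remembers insertion order

    def get(self, key):
        return self.data.get(key)   # access does NOT change order

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            self.data.popitem(last=False)   # evict the oldest insertion
        self.data[key] = value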
Random Eviction
Random eviction picks a random entry to evict. Surprisingly, random eviction performs within 10–20% of LRU for many real-world workloads, and it requires zero bookkeeping overhead.
# Redis random eviction
maxmemory-policy allkeys-random
Use case: High-throughput systems where the overhead of maintaining LRU linked lists is too expensive. CPU caches often use pseudo-random policies for this reason.
TTL-Based Expiration
TTL (Time-To-Live) expiration removes entries after a fixed duration, regardless of access patterns. TTL is not strictly an eviction policy — it's an expiration mechanism that works alongside eviction policies.
# Redis TTL commands
SET user:1001 '{"name":"Alice"}' EX 300 # expires in 300 seconds
SET session:abc '...' PX 1800000 # expires in 1,800,000 ms (30 min)
EXPIRE user:1001 600 # reset TTL to 600 seconds
TTL user:1001 # check remaining TTL
PERSIST user:1001 # remove TTL (key lives forever)
# Redis uses two strategies for TTL cleanup:
# 1. Lazy deletion: check TTL on access, delete if expired
# 2. Active expiration: background process samples keys and deletes expired ones
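The lazy-deletion idea applies to in-process caches too; a minimal sketch that stores an expiry timestamp with each entry and drops expired entries on access:
import time

class TTLDict:
    def __init__(self):
        self._store = {}   # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]   # lazy deletion: expired, drop on access
            return None
        return value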
Distributed Caching
A single cache server has limited memory and is a single point of failure. In production systems serving millions of users, you need a distributed cache — a cache spread across multiple nodes that collectively provide higher capacity, fault tolerance, and throughput.
Redis
Redis (Remote Dictionary Server) is the most popular distributed cache. It's an in-memory data structure store that supports far more than simple key-value operations.
# Basic Redis operations
SET user:1001 '{"name":"Alice","age":30}' # set a key
GET user:1001 # get a key
DEL user:1001 # delete
MGET user:1001 user:1002 user:1003 # batch get in a single round trip
SETNX lock:order:5001 "worker-1" # set if not exists (for locks)
# Data structure operations
HSET user:1001 name "Alice" age 30 city "NYC" # hash (object)
HGET user:1001 name # → "Alice"
HGETALL user:1001 # → all fields
LPUSH queue:tasks "task-1" "task-2" # list (queue)
RPOP queue:tasks # → "task-1"
SADD tags:post:101 "redis" "caching" "hld" # set
SMEMBERS tags:post:101 # → all members
SINTER tags:post:101 tags:post:102 # intersection
ZADD leaderboard 1500 "alice" 2300 "bob" # sorted set
ZRANGEBYSCORE leaderboard 1000 2000 # range query
ZRANK leaderboard "bob" # → rank (0-indexed)
# Pub/Sub
SUBSCRIBE channel:notifications
PUBLISH channel:notifications "New order received"
# Streams (append-only log, like Kafka)
XADD orders * product "laptop" qty 1
XREAD COUNT 10 STREAMS orders 0
Memcached
Memcached is a simpler, faster alternative to Redis for pure key-value caching. It's multi-threaded (Redis is mostly single-threaded), making it more efficient on multi-core machines for simple get/set workloads.
# Memcached operations (telnet-style protocol)
set user:1001 0 300 25 # set <key> <flags> <exptime> <bytes>: 300s TTL, 25-byte value
{"name":"Alice","age":30}
get user:1001 # retrieve
delete user:1001 # remove
# Memcached via Python (pymemcache)
import json
from pymemcache.client.base import Client

client = Client(('localhost', 11211))
client.set('user:1001', json.dumps(user), expire=300)
result = client.get('user:1001')
Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data Structures | Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLog, Bitmaps | Strings only (key-value) |
| Threading | Single-threaded event loop (I/O threads in Redis 6+) | Multi-threaded |
| Persistence | RDB snapshots + AOF log | None (pure in-memory) |
| Replication | Master-replica with automatic failover (Sentinel/Cluster) | No built-in replication |
| Max Value Size | 512 MB | 1 MB (default) |
| Pub/Sub | Yes | No |
| Lua Scripting | Yes (atomic operations) | No |
| Use Case | Complex caching, sessions, queues, leaderboards, rate limiting | Simple, high-throughput key-value caching |
Consistent Hashing for Cache Distribution
When you have N cache servers, you need a way to decide which server holds which key. Naive modular hashing (server = hash(key) % N) breaks catastrophically when servers are added or removed — almost every key remaps, causing a cache avalanche.
Consistent hashing solves this by mapping both servers and keys onto a virtual ring (0 to 2³²). Each key maps to the next server clockwise on the ring. When a server is added or removed, only K/N keys need to be remapped (where K is total keys and N is total servers), not all of them.
# Consistent hashing with virtual nodes
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, nodes, virtual_nodes=150):
        self.ring = {}
        self.sorted_keys = []
        for node in nodes:
            for i in range(virtual_nodes):
                key = self._hash(f"{node}:{i}")
                self.ring[key] = node
                self.sorted_keys.append(key)
        self.sorted_keys.sort()

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        h = self._hash(key)
        # Binary search for the first virtual node clockwise from the hash
        idx = bisect.bisect_left(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]

# Usage
ch = ConsistentHash(["redis-1", "redis-2", "redis-3"])
server = ch.get_node("user:1001")  # e.g. "redis-2"
Virtual nodes (also called vnodes) ensure even distribution. Without them, servers can end up with very unequal ranges on the ring. With 150 virtual nodes per physical server, the load is typically balanced within 5–10%.
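Both claims (even distribution with virtual nodes, and the small remap fraction when a node is added) are easy to sanity-check with the ConsistentHash class above; the node names and key counts here are arbitrary:
from collections import Counter

keys = [f"user:{i}" for i in range(100_000)]

ch3 = ConsistentHash(["redis-1", "redis-2", "redis-3"])
before = {k: ch3.get_node(k) for k in keys}
print(Counter(before.values()))   # roughly one third of the keys per node

ch4 = ConsistentHash(["redis-1", "redis-2", "redis-3", "redis-4"])
moved = sum(1 for k in keys if ch4.get_node(k) != before[k])
print(f"{moved / len(keys):.1%} remapped")  # roughly 1/4 of keys move; with hash(key) % N most keys would move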
Cache Invalidation
Cache invalidation is the process of removing or updating stale data from the cache. Getting it wrong means users see outdated data; getting it right is genuinely hard in distributed systems.
TTL-Based Invalidation
The simplest approach: every cache entry has a TTL. After expiration, the entry is automatically removed and the next read fetches fresh data.
# Redis TTL-based invalidation
SET product:5001 '{"name":"Widget","price":29.99}' EX 600 # 10 min TTL
# The trade-off is staleness window:
# - TTL 60s → max 60s stale, but higher miss rate
# - TTL 3600s → up to 1 hour stale, but great hit ratio
# - Choose based on your tolerance for stale data
Best for: Data that changes infrequently or where short-lived staleness is acceptable (product catalogs, user profiles, config data).
Event-Based Invalidation (Pub/Sub)
When data changes, publish an invalidation event. All cache nodes subscribe and evict the relevant entries immediately.
# Event-based invalidation with Redis Pub/Sub
# Publisher (when data changes):
def update_product(product_id, data):
db.update_product(product_id, data)
redis.publish("cache:invalidate", f"product:{product_id}")
# Subscriber (cache invalidation listener):
def listen_for_invalidations():
pubsub = redis.pubsub()
pubsub.subscribe("cache:invalidate")
for message in pubsub.listen():
if message['type'] == 'message':
key = message['data']
redis.delete(key)
print(f"Invalidated cache key: {key}")
# In a microservices architecture, you might use Kafka instead:
# Order Service → publishes "order.updated" event
# Cache Service → consumes event, invalidates relevant cache keys
Best for: Multi-service architectures where one service writes and many services cache the data. Ensures near-real-time invalidation.
Version-Based Invalidation
Embed a version number in the cache key. When data changes, increment the version. Old versions naturally expire via TTL.
# Version-based invalidation
# Store current version in a fast lookup
SET product:5001:version 7
# Cache key includes the version
SET product:5001:v7 '{"name":"Widget","price":29.99}' EX 3600
# On read:
version = GET product:5001:version # → 7
data = GET product:5001:v{version} # → cached data
# On write:
INCR product:5001:version # → 8
# Old key "product:5001:v7" will expire via TTL
# Next read will miss on "product:5001:v8" and fetch from DB
Best for: Immutable cache entries (static assets, rendered pages). Avoids explicit deletion — just change the version.
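On the application side, the same pattern looks roughly like this (a sketch assuming a redis-py client named r and the conceptual db helper used in earlier examples):
import json

def get_product(product_id):
    version = int(r.get(f"product:{product_id}:version") or 0)
    cached = r.get(f"product:{product_id}:v{version}")
    if cached:
        return json.loads(cached)
    product = db.get_product(product_id)
    # Cache under the current version; old versions simply age out via TTL
    r.setex(f"product:{product_id}:v{version}", 3600, json.dumps(product))
    return product

def update_product(product_id, data):
    db.update_product(product_id, data)
    # Bumping the version makes every cached copy of the old version irrelevant
    r.incr(f"product:{product_id}:version")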
Choosing an Invalidation Strategy
| Strategy | Staleness | Complexity | Best For |
|---|---|---|---|
| TTL | Up to TTL duration | Very Low | General purpose, single-service |
| Event-Based | Near real-time | High | Multi-service, event-driven architectures |
| Version-Based | None (new version) | Medium | Immutable content, static assets |
| Active (on write) | None | Medium | Single-service, strong consistency |
Cache Problems
Caching introduces a set of failure modes that don't exist without it. These problems frequently appear in system design interviews, and understanding their solutions is essential.
Cache Stampede
A cache stampede (also called "dog-piling") occurs when a popular cache key expires and many requests simultaneously attempt to rebuild it. Instead of one request fetching from the DB and populating the cache, hundreds of requests all independently query the database.
# The problem: 500 concurrent requests for the same expired key
# All 500 hit the DB simultaneously
# Solution 1: Mutex Lock (Singleflight)
# Only one request fetches from DB; others wait for the result
import threading
_locks = {}
_lock_mutex = threading.Lock()
def get_with_lock(key):
cached = cache.get(key)
if cached:
return cached
# Acquire a per-key lock
with _lock_mutex:
if key not in _locks:
_locks[key] = threading.Lock()
lock = _locks[key]
if lock.acquire(blocking=False):
try:
# Double-check after acquiring lock
cached = cache.get(key)
if cached:
return cached
# Only THIS request hits the DB
data = db.fetch(key)
cache.set(key, data, ttl=300)
return data
finally:
lock.release()
else:
# Another request is loading — wait and retry
lock.acquire()
lock.release()
return cache.get(key) # should be populated now
# Solution 2: Early Recomputation
# Refresh the cache BEFORE the TTL expires:
# track when each entry was created, and once it passes 80% of its TTL,
# refresh it in the background. (refresh_cache / fetch_and_cache are
# application-defined helpers; cache.get_with_metadata returns the value
# along with its creation timestamp.)
import time

def get_with_early_refresh(key, ttl=300):
data, created_at = cache.get_with_metadata(key)
if data:
age = time.time() - created_at
if age > ttl * 0.8:
# Trigger background refresh
threading.Thread(target=refresh_cache, args=(key,)).start()
return data
# True miss — fetch synchronously
return fetch_and_cache(key, ttl)
Thundering Herd
The thundering herd problem is similar to cache stampede but broader. It occurs when a large number of processes or threads are waiting for the same event (lock release, cache population, server restart), and when the event occurs, all of them wake up simultaneously, overwhelming the system.
# Solutions for thundering herd:
# 1. Jittered retry — add randomness to retry timing
import random
import time
def get_with_jitter(key, max_retries=5):
for attempt in range(max_retries):
data = cache.get(key)
if data:
return data
# Random backoff to spread load
time.sleep(random.uniform(0.01, 0.1) * (2 ** attempt))
return db.fetch(key)
# 2. Request coalescing: deduplicate identical in-flight requests
# (Go's singleflight package is the canonical example of this pattern)
import asyncio

_inflight = {}
async def get_coalesced(key):
if key in _inflight:
return await _inflight[key] # wait for existing request
future = asyncio.create_task(fetch_from_db(key))
_inflight[key] = future
try:
result = await future
cache.set(key, result, ttl=300)
return result
finally:
del _inflight[key]
Cache Penetration
Cache penetration occurs when requests repeatedly query for keys that don't exist in the cache OR the database. Every request results in a cache miss followed by a database query that returns nothing. This can be exploited as a denial-of-service attack.
# Solution 1: Cache null results
def get_user(user_id):
cached = cache.get(f"user:{user_id}")
if cached == "NULL_MARKER":
return None # known non-existent
if cached:
return json.loads(cached)
user = db.get_user(user_id)
if user:
cache.set(f"user:{user_id}", json.dumps(user), ttl=300)
else:
# Cache the absence with a short TTL
cache.set(f"user:{user_id}", "NULL_MARKER", ttl=60)
return user
# Solution 2: Bloom Filter
# A space-efficient probabilistic data structure that tells you:
# - "Definitely NOT in the set" (100% accurate)
# - "Probably in the set" (small false positive rate)
from pybloom_live import BloomFilter
# Initialize with all valid keys
bf = BloomFilter(capacity=10_000_000, error_rate=0.001)
for key in db.get_all_valid_keys():
bf.add(key)
def get_with_bloom(key):
if key not in bf:
return None # definitely doesn't exist — skip DB entirely
# Might exist — proceed with normal cache-aside
return cache_aside_get(key)
# Redis has a built-in Bloom filter module:
# BF.ADD valid_users user:1001
# BF.EXISTS valid_users user:9999 → 0 (definitely not exists)
Cache Avalanche
Cache avalanche happens when many cache keys expire at the same time, causing a sudden flood of database queries. This often occurs when a large batch of data is loaded into the cache simultaneously (all with the same TTL).
# Solution: Staggered TTL (add random jitter)
import random
def cache_set_with_jitter(key, value, base_ttl=300):
# Add ±10% jitter to the TTL
jitter = random.randint(-base_ttl // 10, base_ttl // 10)
actual_ttl = base_ttl + jitter # 270–330 seconds
cache.set(key, value, ttl=actual_ttl)
# For batch loading:
for product in products:
cache_set_with_jitter(
f"product:{product.id}",
json.dumps(product),
base_ttl=600 # ~10 min with jitter
)
# Other solutions:
# - Multi-tier TTL: hot keys get longer TTL
# - Background refresh: preemptively refresh before expiry
# - Circuit breaker: if DB is overloaded, serve stale cache data
Each of these failure modes has a standard mitigation:
- Stampede: Mutex locks, singleflight, early recomputation
- Thundering Herd: Jittered backoff, request coalescing
- Penetration: Cache null results, Bloom filters
- Avalanche: Staggered TTL, multi-tier expiry, circuit breakers
Caching at Different Layers
Browser Cache (HTTP Caching)
The browser cache is the closest cache to the user. Proper HTTP cache headers can eliminate network requests entirely for repeat visits. This is controlled via HTTP response headers.
# HTTP Cache Headers
# Cache-Control: the primary directive
Cache-Control: public, max-age=31536000 # CDN & browser, 1 year (static assets)
Cache-Control: private, max-age=600 # browser only, 10 min (user-specific data)
Cache-Control: no-cache # always revalidate with server
Cache-Control: no-store # never cache (sensitive data)
Cache-Control: stale-while-revalidate=60 # serve stale for 60s while revalidating
# ETag: content-based validation
# Server sends: ETag: "abc123"
# Browser sends: If-None-Match: "abc123"
# Server responds: 304 Not Modified (if unchanged) → no body transferred
# Last-Modified / If-Modified-Since
# Server sends: Last-Modified: Wed, 01 Jan 2026 00:00:00 GMT
# Browser sends: If-Modified-Since: Wed, 01 Jan 2026 00:00:00 GMT
# Server responds: 304 Not Modified (if unchanged)
# Nginx configuration for static assets:
location ~* \.(js|css|png|jpg|gif|svg|woff2)$ {
expires 1y;
add_header Cache-Control "public, immutable";
add_header Vary "Accept-Encoding";
}
# API responses with short cache:
location /api/ {
add_header Cache-Control "private, max-age=0, must-revalidate";
# ETags for API responses should be generated by the upstream application
}
- public — any cache (CDN, proxy, browser) can store
- private — only the browser can store (user-specific data)
- max-age=N — fresh for N seconds
- no-cache — must revalidate every time (can still store)
- no-store — never store anywhere (passwords, tokens)
- immutable — content will never change (use with cache-busted URLs)
- stale-while-revalidate=N — serve stale for N seconds while fetching fresh
CDN Cache
A CDN (Content Delivery Network) caches content at edge locations geographically close to users. This reduces latency from hundreds of milliseconds (cross-continent round trip) to single-digit milliseconds (nearest PoP).
# CloudFront cache configuration (AWS)
# Origin: your application server
# Behavior: cache based on specific headers/query strings
# CloudFront cache policy examples:
CachingOptimized:
MinTTL: 1
MaxTTL: 31536000
DefaultTTL: 86400
Headers: none
QueryStrings: none
Cookies: none
# Invalidation (purge specific paths):
aws cloudfront create-invalidation \
--distribution-id E1XXXXXX \
--paths "/api/products/*" "/index.html"
# Cloudflare Page Rules:
# *.example.com/static/* → Cache Level: Cache Everything, TTL: 1 month
# *.example.com/api/* → Cache Level: Bypass
Key design decisions:
- Cache key composition: What makes two requests "the same"? URL? URL + query string? URL + cookie? Keep cache keys minimal to maximize hit ratio.
- Purge strategy: How do you invalidate CDN-cached content? Purge APIs, versioned URLs (app.v2.js), or content-hash URLs (app.a1b2c3.js).
- Origin shield: An additional cache layer between edge and origin that reduces origin load.
Application-Level Cache
This is the layer you design explicitly in system design interviews. It sits between your application servers and the database.
# Application cache architecture patterns:
# Pattern 1: Local in-process cache (fastest, but per-instance)
from functools import lru_cache
@lru_cache(maxsize=1000)
def get_config(key):
return db.get_config(key)
# Pattern 2: Shared distributed cache (Redis/Memcached)
# All app instances share the same cache
def get_user_profile(user_id):
key = f"profile:{user_id}"
cached = redis.get(key)
if cached:
return json.loads(cached)
profile = db.get_profile(user_id)
redis.setex(key, 300, json.dumps(profile))
return profile
# Pattern 3: Two-tier cache (local + distributed)
# Check local cache first, then Redis, then DB
def get_product(product_id):
# Tier 1: Local cache (100μs)
local = local_cache.get(product_id)
if local:
return local
# Tier 2: Redis (1ms)
key = f"product:{product_id}"
cached = redis.get(key)
if cached:
product = json.loads(cached)
local_cache.set(product_id, product, ttl=30)
return product
# Tier 3: Database (50ms)
product = db.get_product(product_id)
redis.setex(key, 300, json.dumps(product))
local_cache.set(product_id, product, ttl=30)
return product
Database Query Cache
Many databases have built-in query caches that store the results of recent queries.
# MySQL Query Cache (removed in MySQL 8.0, but instructive)
# Was effective for read-heavy, rarely-changing data
# Invalidated on ANY write to tables in the query — too aggressive
# PostgreSQL shared_buffers (buffer pool)
# Caches frequently accessed data pages in memory
shared_buffers = 4GB # typically 25% of total RAM
effective_cache_size = 12GB # how much OS cache to expect
# PostgreSQL also supports materialized views:
CREATE MATERIALIZED VIEW popular_products AS
SELECT p.*, COUNT(o.id) as order_count
FROM products p
JOIN orders o ON o.product_id = p.id
GROUP BY p.id
HAVING COUNT(o.id) > 100;
-- Refresh periodically (or on trigger); CONCURRENTLY requires a unique index on the view
REFRESH MATERIALIZED VIEW CONCURRENTLY popular_products;
CPU Cache (Context)
While you won't design CPU caches in interviews, knowing they exist helps explain why certain data structure choices are faster. CPU caches exploit spatial and temporal locality:
- L1 cache: ~1ns access, ~64 KB per core
- L2 cache: ~4ns access, ~256 KB per core
- L3 cache: ~10ns access, ~8–32 MB shared
- Main memory: ~100ns access
- SSD: ~100,000ns (100μs)
- Network (same datacenter): ~500,000ns (0.5ms)
This is why arrays are faster than linked lists for iteration — arrays are stored contiguously in memory, so loading one element pre-fetches adjacent elements into the CPU cache line. Linked list nodes are scattered in memory, causing constant cache misses.
Redis Deep Dive
Redis is so central to caching in system design that it deserves a deeper look. Understanding Redis internals helps you make better design decisions and answer follow-up questions in interviews.
Data Structures
Redis is not just a key-value store — it's a data structure server. Each data type has optimized operations that run in O(1) or O(log N) time.
# STRINGS — binary-safe, up to 512MB
SET counter 0
INCR counter # atomic increment → 1
INCRBY counter 10 # → 11
DECRBY counter 5 # → 6
APPEND greeting "Hello"
APPEND greeting " World" # → "Hello World"
STRLEN greeting # → 11
# LISTS — doubly linked list, O(1) push/pop at both ends
LPUSH tasks "task-3" "task-2" "task-1" # left push
RPOP tasks # → "task-3" (FIFO queue pattern)
LRANGE tasks 0 -1 # → ["task-1", "task-2"]
LLEN tasks # → 2
# Blocking pop (for job queues):
BRPOP tasks 30 # block for up to 30 seconds
# SETS — unordered unique elements
SADD online_users "alice" "bob" "charlie"
SISMEMBER online_users "alice" # → 1 (true)
SCARD online_users # → 3 (cardinality)
SRANDMEMBER online_users 2 # → 2 random members
# SORTED SETS — elements scored and ranked
ZADD leaderboard 1500 "alice" 2300 "bob" 1800 "charlie"
ZREVRANGE leaderboard 0 2 WITHSCORES # top 3 by score (descending)
ZRANK leaderboard "alice" # → 0 (lowest score)
ZREVRANK leaderboard "bob" # → 0 (highest score)
ZINCRBY leaderboard 500 "alice" # → 2000
ZRANGEBYSCORE leaderboard 1500 2000 # range query
# HASHES — field-value maps (like objects)
HSET user:1001 name "Alice" age 30 email "alice@example.com"
HGET user:1001 name # → "Alice"
HMGET user:1001 name age # → ["Alice", "30"]
HINCRBY user:1001 age 1 # → 31
HGETALL user:1001 # → all fields and values
# STREAMS — append-only log (Kafka-like)
XADD events * type "page_view" url "/products" user "alice"
XADD events * type "click" element "buy_btn" user "bob"
XLEN events # → 2
XRANGE events - + # all entries
# Consumer groups for parallel processing:
XGROUP CREATE events mygroup 0
XREADGROUP GROUP mygroup consumer-1 COUNT 10 STREAMS events >
Persistence: RDB vs AOF
Redis is in-memory but offers two persistence mechanisms to survive restarts:
# RDB (Redis Database Backup) — Point-in-time snapshots
# Creates a binary dump of all data at intervals
# redis.conf:
save 900 1 # snapshot if ≥1 key changed in 900 seconds
save 300 10 # snapshot if ≥10 keys changed in 300 seconds
save 60 10000 # snapshot if ≥10000 keys changed in 60 seconds
dbfilename dump.rdb
dir /var/lib/redis/
# Pros: compact file, fast recovery, low CPU overhead
# Cons: data loss between snapshots (up to minutes of data)
# AOF (Append Only File) — Write log
# Logs every write operation; replayed on restart
# redis.conf:
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec # fsync once per second (balanced)
# appendfsync always # fsync on every write (safest, slowest)
# appendfsync no # let OS decide (fastest, risky)
# Pros: minimal data loss (at most 1 second with everysec)
# Cons: larger file, slower recovery, needs periodic rewrite
# Best practice: use BOTH RDB + AOF
# AOF for minimal data loss, RDB for fast disaster recovery
# On restart, Redis loads AOF (more complete) if available
Redis Cluster
Redis Cluster provides automatic sharding across multiple nodes using a concept of hash slots.
# Redis Cluster Architecture:
# - 16,384 hash slots distributed across master nodes
# - Key assignment: slot = CRC16(key) % 16384
# - Each master owns a range of slots
# - Each master has one or more replicas for HA
# Example 3-node cluster:
# Master A: slots 0–5460 (+ Replica A')
# Master B: slots 5461–10922 (+ Replica B')
# Master C: slots 10923–16383 (+ Replica C')
# Hash tags for multi-key operations:
# Keys with same hash tag go to same slot
SET {user:1001}.profile "..." # slot = CRC16("user:1001")
SET {user:1001}.settings "..." # same slot!
# Now MGET {user:1001}.profile {user:1001}.settings works
# Cluster commands:
redis-cli --cluster create \
node1:6379 node2:6379 node3:6379 \
node4:6379 node5:6379 node6:6379 \
--cluster-replicas 1
CLUSTER INFO # cluster state
CLUSTER NODES # node listing
CLUSTER SLOTS # slot assignments
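For intuition, the slot calculation itself is small enough to sketch in Python; this assumes the CRC16-XMODEM variant specified for Redis Cluster, plus the hash-tag rule shown above:
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Hash-tag rule: if the key contains {...}, only that substring is hashed
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(hash_slot("{user:1001}.profile") == hash_slot("{user:1001}.settings"))  # True: same slot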
Redis Sentinel (High Availability)
Sentinel provides HA for non-clustered Redis setups through automatic failover.
# Sentinel monitors master nodes and promotes replicas on failure
# sentinel.conf:
sentinel monitor mymaster 192.168.1.10 6379 2 # quorum of 2
sentinel down-after-milliseconds mymaster 5000 # 5s to detect failure
sentinel failover-timeout mymaster 60000 # 60s failover timeout
sentinel parallel-syncs mymaster 1 # 1 replica syncs at a time
# Flow:
# 1. Sentinel detects master is down (SDOWN → ODOWN after quorum)
# 2. Sentinel elects a leader among Sentinels
# 3. Leader promotes best replica to master
# 4. Other replicas reconfigure to replicate from new master
# 5. Old master (when it returns) becomes a replica
# Application connects via Sentinel (not directly to master):
import redis
sentinel = redis.Sentinel(
[('sentinel1', 26379), ('sentinel2', 26379), ('sentinel3', 26379)]
)
master = sentinel.master_for('mymaster')
master.set('key', 'value')
slave = sentinel.slave_for('mymaster')
slave.get('key')
Common Redis Patterns
# Pattern 1: Session Store
SET session:abc123 '{"user_id":1001,"role":"admin"}' EX 1800
GET session:abc123
# Pattern 2: Rate Limiter (sliding window)
# Allow 100 requests per minute per user
import time

def is_rate_limited(user_id, limit=100, window=60):
key = f"rate:{user_id}"
pipe = redis.pipeline()
now = time.time()
pipe.zremrangebyscore(key, 0, now - window) # remove old entries
pipe.zadd(key, {str(now): now}) # add current request
pipe.zcard(key) # count requests in window
pipe.expire(key, window) # auto-cleanup
_, _, count, _ = pipe.execute()
return count > limit
# Pattern 3: Distributed Lock (Redlock)
# Acquire lock:
SET lock:order:5001 "worker-42" NX PX 30000 # NX = only if not exists
# PX 30000 = auto-expire in 30 seconds (prevent deadlock)
# Release lock (Lua for atomicity):
# EVAL "if redis.call('get',KEYS[1]) == ARGV[1] then
# return redis.call('del',KEYS[1])
# else return 0 end"
# 1 lock:order:5001 "worker-42"
# Pattern 4: Leaderboard
ZADD game:scores 2500 "player_a" 1800 "player_b" 3200 "player_c"
ZREVRANGE game:scores 0 9 WITHSCORES # top 10
ZREVRANK game:scores "player_b" # player_b's rank
# Pattern 5: Pub/Sub Notifications
SUBSCRIBE order:events
PUBLISH order:events '{"type":"created","order_id":5001}'
# Pattern 6: HyperLogLog (cardinality estimation)
PFADD daily:visitors "user:1001" "user:1002" "user:1003"
PFCOUNT daily:visitors # → 3 (approximate unique count)
# Uses only ~12KB of memory for billions of unique elements!
Summary
- Caching stores frequently accessed data in fast storage to reduce latency and database load. Aim for a hit ratio > 90%.
- Cache-Aside is the default strategy — the app checks cache, falls back to DB on miss, and populates cache. Use this unless you have a specific reason not to.
- Write-Through provides strong consistency; Write-Behind provides maximum write performance at the cost of durability risk.
- LRU is the default eviction policy. Use TTL for time-sensitive data. LFU for frequency-based workloads.
- Redis is the industry standard for distributed caching. Know its data structures, persistence options, and clustering.
- Cache invalidation is hard. Start with TTL, use event-based invalidation for multi-service architectures.
- Cache stampede, penetration, and avalanche are real failure modes. Know the solutions: mutex locks, bloom filters, staggered TTL.
- HTTP caching (Cache-Control, ETag) is free performance. Always configure it for static assets.
Caching is not a silver bullet — it adds complexity, introduces consistency challenges, and requires careful tuning. But in virtually every large-scale system, it's the single most impactful optimization for latency and throughput. In the next post, we explore SQL vs NoSQL databases — the storage layer that caching protects.