Caching Strategies
What is Caching?
Caching is the technique of storing copies of frequently accessed data in a fast, temporary storage layer so that future requests for that data can be served more quickly than fetching it from the original, slower source. The cache sits closer to the consumer — in memory instead of on disk, on the same machine instead of across a network — and trades storage space for speed.
Every caching system revolves around three fundamental concepts:
- Cache Hit: The requested data is found in the cache. The response is served directly from the cache — fast, cheap, no downstream load.
- Cache Miss: The data is not in the cache. The system must fetch it from the original data source (database, API, disk), which is slower and more expensive.
- Hit Ratio: The percentage of requests that result in cache hits. A hit ratio of 95% means 95 out of 100 requests are served from cache. This is the single most important metric for cache effectiveness.
Hit Ratio = Cache Hits / (Cache Hits + Cache Misses)
Example:
Total requests = 1,000,000
Cache hits = 950,000
Cache misses = 50,000
Hit ratio = 95%
Effective latency = (0.95 × 1ms) + (0.05 × 50ms)
= 0.95ms + 2.5ms
= 3.45ms (vs 50ms without cache → ~14x faster)
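The same arithmetic as a small helper (a minimal sketch; the 1 ms and 50 ms figures are the illustrative latencies used above):
def effective_latency(hit_ratio, cache_ms=1.0, db_ms=50.0):
    """Average latency per request for a given cache hit ratio."""
    return hit_ratio * cache_ms + (1 - hit_ratio) * db_ms

print(effective_latency(0.95))  # 3.45 ms, as in the example above
print(effective_latency(0.70))  # 15.7 ms: a lower hit ratio erases most of the benefit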
The latency difference is dramatic. A typical in-memory cache like Redis serves reads in sub-millisecond time, while a database query might take 5–100ms depending on indexes, joins, and load. At scale, this difference is the difference between a responsive application and an overwhelmed one.
Where is Caching Applied?
Caching exists at every layer of a modern system stack: the browser's HTTP cache, CDN edge caches, application caches (Redis, Memcached), database buffer pools and query caches, the OS page cache, and CPU caches.
At the system design level, we primarily focus on the application cache layer (Redis, Memcached) and the CDN / HTTP cache layer. These are the layers you explicitly design and control. CPU and OS caches are managed by hardware and operating systems, but understanding them helps you write cache-friendly code.
Caching Strategies
How data flows between your application, the cache, and the database defines the caching strategy. Each strategy makes different trade-offs between consistency, performance, and complexity. Understanding these trade-offs is essential for system design interviews.
Cache-Aside (Lazy Loading)
Cache-aside is the most widely used caching pattern. The application code is responsible for all cache interactions — the cache is a passive store, and the application orchestrates reads and writes.
Read path:
- Application receives a read request.
- Application checks the cache for the data.
- If cache hit: return data from cache.
- If cache miss: query the database, write the result to the cache, then return the data.
Write path:
- Application writes data directly to the database.
- Application invalidates (deletes) the corresponding cache entry.
# Cache-Aside pattern in Python (with Redis)
import redis
import json
r = redis.Redis(host='localhost', port=6379, db=0)
def get_user(user_id):
# Step 1: Check cache
cache_key = f"user:{user_id}"
cached = r.get(cache_key)
if cached:
# Cache HIT
return json.loads(cached)
# Cache MISS: fetch from database
user = db.query("SELECT * FROM users WHERE id = %s", user_id)
if user:
# Populate cache with TTL of 5 minutes
r.setex(cache_key, 300, json.dumps(user))
return user
def update_user(user_id, data):
# Write to database
db.execute("UPDATE users SET ... WHERE id = %s", user_id, data)
# Invalidate cache (delete, don't update)
r.delete(f"user:{user_id}")
Pros:
- Only requested data is cached (no wasted memory on unused data)
- Cache failures are non-fatal — the app falls back to the database
- Works with any database; the cache is decoupled from the data source
- Simple to implement and reason about
Cons:
- First request for each key always results in a cache miss (cold start penalty)
- Data can become stale if the database is updated by another service that doesn't invalidate the cache
- Application code is responsible for all cache logic — more code to maintain
Read-Through
In read-through, the cache itself is responsible for loading data from the database on a cache miss. The application always reads from the cache; it never directly queries the database for cached data.
- Application reads from cache.
- If cache hit: return data.
- If cache miss: the cache library/provider automatically loads data from the database, stores it, and returns it.
// Read-Through with a cache provider (conceptual Java)
@Cacheable(value = "users", key = "#userId")
public User getUser(String userId) {
// This method is ONLY called on cache miss.
// The cache framework handles checking the cache first.
return userRepository.findById(userId);
}
// Spring Cache with Redis as read-through provider
@Configuration
@EnableCaching
public class CacheConfig {
@Bean
public RedisCacheManager cacheManager(RedisConnectionFactory cf) {
RedisCacheConfiguration config = RedisCacheConfiguration
.defaultCacheConfig()
.entryTtl(Duration.ofMinutes(10))
.serializeValuesWith(
SerializationPair.fromSerializer(new GenericJackson2JsonRedisSerializer())
);
return RedisCacheManager.builder(cf)
.cacheDefaults(config)
.build();
}
}
Pros: Application code is simpler — no cache logic to write. The cache framework handles all the plumbing.
Cons: The cache must understand how to load data (needs a data loader configured). Less flexible if different keys require different loading logic.
Write-Through
Write-through ensures that every write goes to both the cache and the database synchronously. The cache acts as the primary write interface, and it persists the data to the database before confirming the write.
- Application writes data to the cache.
- Cache synchronously writes the data to the database.
- Write is confirmed to the application only after both are successful.
# Write-Through pattern (conceptual)
def write_through(key, value):
# Write to cache
cache.set(key, value)
# Synchronously write to database
db.execute("INSERT INTO data (key, value) VALUES (%s, %s) "
"ON CONFLICT (key) DO UPDATE SET value = %s",
key, value, value)
# Only now is the write considered complete
Pros: Cache data is always consistent with the database. No stale reads.
Cons: Every write incurs the latency of both cache and database writes. Higher write latency. Data that's written but never read wastes cache space.
Write-Behind (Write-Back)
Write-behind improves on write-through by making the database write asynchronous. The application writes to the cache, which immediately confirms the write. The cache then asynchronously flushes the data to the database, often in batches.
- Application writes data to the cache.
- Cache immediately confirms the write (fast!).
- Cache asynchronously writes to the database (batched, delayed).
# Write-Behind pattern (conceptual)
import threading
import queue
write_queue = queue.Queue()
def write_behind(key, value):
# Write to cache immediately
cache.set(key, value)
# Queue the DB write
write_queue.put((key, value))
# Return immediately — write is "done" from client's perspective
def flush_worker():
"""Background thread that batches writes to DB."""
batch = []
while True:
try:
item = write_queue.get(timeout=1)
batch.append(item)
if len(batch) >= 100:
db.batch_upsert(batch)
batch.clear()
except queue.Empty:
if batch:
db.batch_upsert(batch)
batch.clear()
# Start background flusher
threading.Thread(target=flush_worker, daemon=True).start()
Pros: Extremely fast writes. Batching reduces database load significantly. Great for write-heavy workloads.
Cons: Risk of data loss if the cache crashes before flushing to the database. Data in the database can be stale for the duration of the async delay. More complex to implement correctly.
Write-Around
Write-around skips the cache entirely on writes. Data is written directly to the database. The cache is only populated on reads (via cache-aside or read-through).
# Write-Around pattern
def write_around(key, value):
# Write directly to DB, bypass cache
db.execute("INSERT INTO data (key, value) VALUES (%s, %s) "
"ON CONFLICT (key) DO UPDATE SET value = %s",
key, value, value)
# Optionally invalidate stale cache entry
cache.delete(key)
Pros: Good for data that's written once and rarely read. Doesn't pollute the cache with data that might never be accessed.
Cons: Reads after a write will always be a cache miss (until lazy-loaded). Higher read latency for recently-written data.
Strategy Comparison
| Strategy | Read Perf. | Write Perf. | Consistency | Complexity | Best For |
|---|---|---|---|---|---|
| Cache-Aside | Fast (after first miss) | Normal (DB direct) | Eventual | Low | General purpose, read-heavy |
| Read-Through | Fast (after first miss) | Normal (DB direct) | Eventual | Medium | Frameworks with cache providers |
| Write-Through | Fast (always in cache) | Slow (sync both) | Strong | Medium | Read-heavy, consistency critical |
| Write-Behind | Fast | Very Fast | Eventual (risk of loss) | High | Write-heavy, tolerates loss |
| Write-Around | Slow (after write) | Normal (DB direct) | Strong (DB) | Low | Write-once, read-rarely |
Eviction Policies
Caches have finite memory. When the cache is full and a new entry needs to be added, the cache must evict an existing entry. The eviction policy determines which entry gets removed. The right policy can mean the difference between a 95% hit ratio and a 70% hit ratio.
LRU (Least Recently Used)
LRU evicts the item that hasn't been accessed for the longest time. It's the most commonly used eviction policy because it works well with temporal locality — data accessed recently is likely to be accessed again soon.
Implementation: A doubly-linked list combined with a hash map. The hash map provides O(1) lookup by key. The linked list maintains access order — the most recently accessed item is at the head, the least recently accessed is at the tail. On eviction, remove the tail.
class LRUCache:
def __init__(self, capacity: int):
self.capacity = capacity
self.cache = {} # key → Node
self.head = Node(0, 0) # dummy head
self.tail = Node(0, 0) # dummy tail
self.head.next = self.tail
self.tail.prev = self.head
def get(self, key: int) -> int:
if key in self.cache:
node = self.cache[key]
self._remove(node)
self._add_to_head(node)
return node.value
return -1 # cache miss
def put(self, key: int, value: int):
if key in self.cache:
self._remove(self.cache[key])
node = Node(key, value)
self._add_to_head(node)
self.cache[key] = node
if len(self.cache) > self.capacity:
# Evict LRU item (tail)
lru = self.tail.prev
self._remove(lru)
del self.cache[lru.key]
def _add_to_head(self, node):
node.next = self.head.next
node.prev = self.head
self.head.next.prev = node
self.head.next = node
def _remove(self, node):
node.prev.next = node.next
node.next.prev = node.prev
class Node:
def __init__(self, key, value):
self.key, self.value = key, value
self.prev = self.next = None
Time Complexity: O(1) for both get and put. This constant-time behavior, combined with how well recency tracking matches temporal locality, is why LRU is so widely used.
LFU (Least Frequently Used)
LFU evicts the item with the lowest access frequency. Each cache entry has a counter that increments on every access. When eviction is needed, the entry with the smallest counter is removed.
# Redis supports LFU natively
# redis.conf:
maxmemory-policy allkeys-lfu
# LFU uses a logarithmic counter (Morris counter) with decay:
# - Counter incremented probabilistically (log scale)
# - Counter decayed over time to handle changing access patterns
# - lfu-log-factor 10 (higher = slower counter growth)
# - lfu-decay-time 1 (minutes before counter halves)
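To make the counting mechanism concrete, here is a minimal LFU sketch with exact per-key counters (illustrative only; production implementations such as Redis's use approximate, decaying counters and avoid the O(n) eviction scan shown here):
from collections import defaultdict

class LFUCacheSimple:
    """Minimal LFU: exact per-key counters, O(n) eviction scan."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.freq = defaultdict(int)

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1          # every access bumps the counter
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the key with the smallest access count
            victim = min(self.freq, key=self.freq.get)
            del self.data[victim]
            del self.freq[victim]
        self.data[key] = value
        self.freq[key] += 1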
Use case: LFU is better than LRU when some items are consistently popular. A viral post that gets accessed millions of times should stay cached even if it wasn't accessed in the last few seconds.
Drawback: New items start with a low frequency count and may be evicted immediately. Items that were popular in the past but no longer relevant can "stick" in cache.
FIFO (First In, First Out)
FIFO evicts the oldest entry regardless of how often or recently it was accessed. Simple to implement with a queue.
Use case: When all items have roughly equal access probability and simplicity matters more than optimality.
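A minimal FIFO sketch, assuming an OrderedDict for insertion-order bookkeeping (reads deliberately do not change order):
from collections import OrderedDict

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # remembers insertion order

    def get(self, key):
        return self.data.get(key)   # access does NOT change order

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            self.data.popitem(last=False)   # evict the oldest insertion
        self.data[key] = value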
Random Eviction
Random eviction picks a random entry to evict. Surprisingly, random eviction performs within 10–20% of LRU for many real-world workloads, and it requires zero bookkeeping overhead.
# Redis random eviction
maxmemory-policy allkeys-random
Use case: High-throughput systems where the overhead of maintaining LRU linked lists is too expensive. CPU caches often use pseudo-random policies for this reason.
TTL-Based Expiration
TTL (Time-To-Live) expiration removes entries after a fixed duration, regardless of access patterns. TTL is not strictly an eviction policy — it's an expiration mechanism that works alongside eviction policies.
# Redis TTL commands
SET user:1001 '{"name":"Alice"}' EX 300 # expires in 300 seconds
SET session:abc '...' PX 1800000 # expires in 1,800,000 ms (30 min)
EXPIRE user:1001 600 # reset TTL to 600 seconds
TTL user:1001 # check remaining TTL
PERSIST user:1001 # remove TTL (key lives forever)
# Redis uses two strategies for TTL cleanup:
# 1. Lazy deletion: check TTL on access, delete if expired
# 2. Active expiration: background process samples keys and deletes expired ones
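The lazy-deletion idea applies to in-process caches too; a minimal sketch that stores an expiry timestamp with each entry and drops expired entries on access:
import time

class TTLDict:
    def __init__(self):
        self._store = {}   # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]   # lazy deletion: expired, drop on access
            return None
        return value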
Distributed Caching
A single cache server has limited memory and is a single point of failure. In production systems serving millions of users, you need a distributed cache — a cache spread across multiple nodes that collectively provide higher capacity, fault tolerance, and throughput.
Redis
Redis (Remote Dictionary Server) is the most popular distributed cache. It's an in-memory data structure store that supports far more than simple key-value operations.
# Basic Redis operations
SET user:1001 '{"name":"Alice","age":30}' # set a key
GET user:1001 # get a key
DEL user:1001 # delete
MGET user:1001 user:1002 user:1003 # batch get in a single round trip
SETNX lock:order:5001 "worker-1" # set if not exists (for locks)
# Data structure operations
HSET user:1001 name "Alice" age 30 city "NYC" # hash (object)
HGET user:1001 name # → "Alice"
HGETALL user:1001 # → all fields
LPUSH queue:tasks "task-1" "task-2" # list (queue)
RPOP queue:tasks # → "task-1"
SADD tags:post:101 "redis" "caching" "hld" # set
SMEMBERS tags:post:101 # → all members
SINTER tags:post:101 tags:post:102 # intersection
ZADD leaderboard 1500 "alice" 2300 "bob" # sorted set
ZRANGEBYSCORE leaderboard 1000 2000 # range query
ZRANK leaderboard "bob" # → rank (0-indexed)
# Pub/Sub
SUBSCRIBE channel:notifications
PUBLISH channel:notifications "New order received"
# Streams (append-only log, like Kafka)
XADD orders * product "laptop" qty 1
XREAD COUNT 10 STREAMS orders 0
Memcached
Memcached is a simpler, faster alternative to Redis for pure key-value caching. It's multi-threaded (Redis is mostly single-threaded), making it more efficient on multi-core machines for simple get/set workloads.
# Memcached operations (telnet-style protocol)
set user:1001 0 300 25 # set <key> <flags> <exptime> <bytes>: 300s TTL, 25-byte value
{"name":"Alice","age":30}
get user:1001 # retrieve
delete user:1001 # remove
# Memcached via Python (pymemcache)
import json
from pymemcache.client.base import Client

client = Client(('localhost', 11211))
client.set('user:1001', json.dumps(user), expire=300)
result = client.get('user:1001')
Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data Structures | Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLog, Bitmaps | Strings only (key-value) |
| Threading | Single-threaded event loop (I/O threads in Redis 6+) | Multi-threaded |
| Persistence | RDB snapshots + AOF log | None (pure in-memory) |
| Replication | Master-replica with automatic failover (Sentinel/Cluster) | No built-in replication |
| Max Value Size | 512 MB | 1 MB (default) |
| Pub/Sub | Yes | No |
| Lua Scripting | Yes (atomic operations) | No |
| Use Case | Complex caching, sessions, queues, leaderboards, rate limiting | Simple, high-throughput key-value caching |
Consistent Hashing for Cache Distribution
When you have N cache servers, you need a way to decide which server holds which key. Naive modular hashing (server = hash(key) % N) breaks catastrophically when servers are added or removed — almost every key remaps, causing a cache avalanche.
Consistent hashing solves this by mapping both servers and keys onto a virtual ring (0 to 2³²). Each key maps to the next server clockwise on the ring. When a server is added or removed, only K/N keys need to be remapped (where K is total keys and N is total servers), not all of them.
# Consistent hashing with virtual nodes
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, nodes, virtual_nodes=150):
        self.ring = {}
        self.sorted_keys = []
        for node in nodes:
            for i in range(virtual_nodes):
                key = self._hash(f"{node}:{i}")
                self.ring[key] = node
                self.sorted_keys.append(key)
        self.sorted_keys.sort()

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        h = self._hash(key)
        # Binary search for the first virtual node clockwise from the hash
        idx = bisect.bisect_left(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]

# Usage
ch = ConsistentHash(["redis-1", "redis-2", "redis-3"])
server = ch.get_node("user:1001")  # e.g. "redis-2"
Virtual nodes (also called vnodes) ensure even distribution. Without them, servers can end up with very unequal ranges on the ring. With 150 virtual nodes per physical server, the load is typically balanced within 5–10%.
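Both claims (even distribution with virtual nodes, and the small remap fraction when a node is added) are easy to sanity-check with the ConsistentHash class above; the node names and key counts here are arbitrary:
from collections import Counter

keys = [f"user:{i}" for i in range(100_000)]

ch3 = ConsistentHash(["redis-1", "redis-2", "redis-3"])
before = {k: ch3.get_node(k) for k in keys}
print(Counter(before.values()))   # roughly one third of the keys per node

ch4 = ConsistentHash(["redis-1", "redis-2", "redis-3", "redis-4"])
moved = sum(1 for k in keys if ch4.get_node(k) != before[k])
print(f"{moved / len(keys):.1%} remapped")  # roughly 1/4 of keys move; with hash(key) % N most keys would move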
Cache Invalidation
Cache invalidation is the process of removing or updating stale data from the cache. Getting it wrong means users see outdated data; getting it right is genuinely hard in distributed systems.
TTL-Based Invalidation
The simplest approach: every cache entry has a TTL. After expiration, the entry is automatically removed and the next read fetches fresh data.
# Redis TTL-based invalidation
SET product:5001 '{"name":"Widget","price":29.99}' EX 600 # 10 min TTL
# The trade-off is staleness window:
# - TTL 60s → max 60s stale, but higher miss rate
# - TTL 3600s → up to 1 hour stale, but great hit ratio
# - Choose based on your tolerance for stale data
Best for: Data that changes infrequently or where short-lived staleness is acceptable (product catalogs, user profiles, config data).
Event-Based Invalidation (Pub/Sub)
When data changes, publish an invalidation event. All cache nodes subscribe and evict the relevant entries immediately.
# Event-based invalidation with Redis Pub/Sub
# Publisher (when data changes):
def update_product(product_id, data):
db.update_product(product_id, data)
redis.publish("cache:invalidate", f"product:{product_id}")
# Subscriber (cache invalidation listener):
def listen_for_invalidations():
pubsub = redis.pubsub()
pubsub.subscribe("cache:invalidate")
for message in pubsub.listen():
if message['type'] == 'message':
key = message['data']
redis.delete(key)
print(f"Invalidated cache key: {key}")
# In a microservices architecture, you might use Kafka instead:
# Order Service → publishes "order.updated" event
# Cache Service → consumes event, invalidates relevant cache keys
Best for: Multi-service architectures where one service writes and many services cache the data. Ensures near-real-time invalidation.
Version-Based Invalidation
Embed a version number in the cache key. When data changes, increment the version. Old versions naturally expire via TTL.
# Version-based invalidation
# Store current version in a fast lookup
SET product:5001:version 7
# Cache key includes the version
SET product:5001:v7 '{"name":"Widget","price":29.99}' EX 3600
# On read:
version = GET product:5001:version # → 7
data = GET product:5001:v{version} # → cached data
# On write:
INCR product:5001:version # → 8
# Old key "product:5001:v7" will expire via TTL
# Next read will miss on "product:5001:v8" and fetch from DB
Best for: Immutable cache entries (static assets, rendered pages). Avoids explicit deletion — just change the version.
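On the application side, the same pattern looks roughly like this (a sketch assuming a redis-py client named r and the conceptual db helper used in earlier examples):
import json

def get_product(product_id):
    version = int(r.get(f"product:{product_id}:version") or 0)
    cached = r.get(f"product:{product_id}:v{version}")
    if cached:
        return json.loads(cached)
    product = db.get_product(product_id)
    # Cache under the current version; old versions simply age out via TTL
    r.setex(f"product:{product_id}:v{version}", 3600, json.dumps(product))
    return product

def update_product(product_id, data):
    db.update_product(product_id, data)
    # Bumping the version makes every cached copy of the old version irrelevant
    r.incr(f"product:{product_id}:version")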
Choosing an Invalidation Strategy
| Strategy | Staleness | Complexity | Best For |
|---|---|---|---|
| TTL | Up to TTL duration | Very Low | General purpose, single-service |
| Event-Based | Near real-time | High | Multi-service, event-driven architectures |
| Version-Based | None (new version) | Medium | Immutable content, static assets |
| Active (on write) | None | Medium | Single-service, strong consistency |
Cache Problems
Caching introduces a set of failure modes that don't exist without it. These problems frequently appear in system design interviews, and understanding their solutions is essential.
Cache Stampede
A cache stampede (also called "dog-piling") occurs when a popular cache key expires and many requests simultaneously attempt to rebuild it. Instead of one request fetching from the DB and populating the cache, hundreds of requests all independently query the database.
# The problem: 500 concurrent requests for the same expired key
# All 500 hit the DB simultaneously
# Solution 1: Mutex Lock (Singleflight)
# Only one request fetches from DB; others wait for the result
import threading
_locks = {}
_lock_mutex = threading.Lock()
def get_with_lock(key):
cached = cache.get(key)
if cached:
return cached
# Acquire a per-key lock
with _lock_mutex:
if key not in _locks:
_locks[key] = threading.Lock()
lock = _locks[key]
if lock.acquire(blocking=False):
try:
# Double-check after acquiring lock
cached = cache.get(key)
if cached:
return cached
# Only THIS request hits the DB
data = db.fetch(key)
cache.set(key, data, ttl=300)
return data
finally:
lock.release()
else:
# Another request is loading — wait and retry
lock.acquire()
lock.release()
return cache.get(key) # should be populated now
# Solution 2: Early Recomputation
# Refresh the cache BEFORE the TTL expires:
# track when each entry was created, and once it passes 80% of its TTL,
# refresh it in the background. (refresh_cache / fetch_and_cache are
# application-defined helpers; cache.get_with_metadata returns the value
# along with its creation timestamp.)
import time

def get_with_early_refresh(key, ttl=300):
data, created_at = cache.get_with_metadata(key)
if data:
age = time.time() - created_at
if age > ttl * 0.8:
# Trigger background refresh
threading.Thread(target=refresh_cache, args=(key,)).start()
return data
# True miss — fetch synchronously
return fetch_and_cache(key, ttl)
Thundering Herd
The thundering herd problem is similar to cache stampede but broader. It occurs when a large number of processes or threads are waiting for the same event (lock release, cache population, server restart), and when the event occurs, all of them wake up simultaneously, overwhelming the system.
# Solutions for thundering herd:
# 1. Jittered retry — add randomness to retry timing
import random
import time
def get_with_jitter(key, max_retries=5):
for attempt in range(max_retries):
data = cache.get(key)
if data:
return data
# Random backoff to spread load
time.sleep(random.uniform(0.01, 0.1) * (2 ** attempt))
return db.fetch(key)
# 2. Request coalescing: deduplicate identical in-flight requests
# (Go's singleflight package is the canonical example of this pattern)
import asyncio

_inflight = {}
async def get_coalesced(key):
if key in _inflight:
return await _inflight[key] # wait for existing request
future = asyncio.create_task(fetch_from_db(key))
_inflight[key] = future
try:
result = await future
cache.set(key, result, ttl=300)
return result
finally:
del _inflight[key]
Cache Penetration
Cache penetration occurs when requests repeatedly query for keys that don't exist in the cache OR the database. Every request results in a cache miss followed by a database query that returns nothing. This can be exploited as a denial-of-service attack.
# Solution 1: Cache null results
def get_user(user_id):
cached = cache.get(f"user:{user_id}")
if cached == "NULL_MARKER":
return None # known non-existent
if cached:
return json.loads(cached)
user = db.get_user(user_id)
if user:
cache.set(f"user:{user_id}", json.dumps(user), ttl=300)
else:
# Cache the absence with a short TTL
cache.set(f"user:{user_id}", "NULL_MARKER", ttl=60)
return user
# Solution 2: Bloom Filter
# A space-efficient probabilistic data structure that tells you:
# - "Definitely NOT in the set" (100% accurate)
# - "Probably in the set" (small false positive rate)
from pybloom_live import BloomFilter
# Initialize with all valid keys
bf = BloomFilter(capacity=10_000_000, error_rate=0.001)
for key in db.get_all_valid_keys():
bf.add(key)
def get_with_bloom(key):
if key not in bf:
return None # definitely doesn't exist — skip DB entirely
# Might exist — proceed with normal cache-aside
return cache_aside_get(key)
# Redis has a built-in Bloom filter module:
# BF.ADD valid_users user:1001
# BF.EXISTS valid_users user:9999 → 0 (definitely not exists)
Cache Avalanche
Cache avalanche happens when many cache keys expire at the same time, causing a sudden flood of database queries. This often occurs when a large batch of data is loaded into the cache simultaneously (all with the same TTL).
# Solution: Staggered TTL (add random jitter)
import random
def cache_set_with_jitter(key, value, base_ttl=300):
# Add ±10% jitter to the TTL
jitter = random.randint(-base_ttl // 10, base_ttl // 10)
actual_ttl = base_ttl + jitter # 270–330 seconds
cache.set(key, value, ttl=actual_ttl)
# For batch loading:
for product in products:
cache_set_with_jitter(
f"product:{product.id}",
json.dumps(product),
base_ttl=600 # ~10 min with jitter
)
# Other solutions:
# - Multi-tier TTL: hot keys get longer TTL
# - Background refresh: preemptively refresh before expiry
# - Circuit breaker: if DB is overloaded, serve stale cache data
Each of these failure modes has a standard mitigation:
- Stampede: Mutex locks, singleflight, early recomputation
- Thundering Herd: Jittered backoff, request coalescing
- Penetration: Cache null results, Bloom filters
- Avalanche: Staggered TTL, multi-tier expiry, circuit breakers
Caching at Different Layers
Browser Cache (HTTP Caching)
The browser cache is the closest cache to the user. Proper HTTP cache headers can eliminate network requests entirely for repeat visits. This is controlled via HTTP response headers.
# HTTP Cache Headers
# Cache-Control: the primary directive
Cache-Control: public, max-age=31536000 # CDN & browser, 1 year (static assets)
Cache-Control: private, max-age=600 # browser only, 10 min (user-specific data)
Cache-Control: no-cache # always revalidate with server
Cache-Control: no-store # never cache (sensitive data)
Cache-Control: stale-while-revalidate=60 # serve stale for 60s while revalidating
# ETag: content-based validation
# Server sends: ETag: "abc123"
# Browser sends: If-None-Match: "abc123"
# Server responds: 304 Not Modified (if unchanged) → no body transferred
# Last-Modified / If-Modified-Since
# Server sends: Last-Modified: Wed, 01 Jan 2026 00:00:00 GMT
# Browser sends: If-Modified-Since: Wed, 01 Jan 2026 00:00:00 GMT
# Server responds: 304 Not Modified (if unchanged)
# Nginx configuration for static assets:
location ~* \.(js|css|png|jpg|gif|svg|woff2)$ {
expires 1y;
add_header Cache-Control "public, immutable";
add_header Vary "Accept-Encoding";
}
# API responses with short cache:
location /api/ {
add_header Cache-Control "private, max-age=0, must-revalidate";
# ETags for API responses should be generated by the upstream application
}
- public — any cache (CDN, proxy, browser) can store
- private — only the browser can store (user-specific data)
- max-age=N — fresh for N seconds
- no-cache — must revalidate every time (can still store)
- no-store — never store anywhere (passwords, tokens)
- immutable — content will never change (use with cache-busted URLs)
- stale-while-revalidate=N — serve stale for N seconds while fetching fresh
CDN Cache
A CDN (Content Delivery Network) caches content at edge locations geographically close to users. This reduces latency from hundreds of milliseconds (cross-continent round trip) to single-digit milliseconds (nearest PoP).
# CloudFront cache configuration (AWS)
# Origin: your application server
# Behavior: cache based on specific headers/query strings
# CloudFront cache policy examples:
CachingOptimized:
MinTTL: 1
MaxTTL: 31536000
DefaultTTL: 86400
Headers: none
QueryStrings: none
Cookies: none
# Invalidation (purge specific paths):
aws cloudfront create-invalidation \
--distribution-id E1XXXXXX \
--paths "/api/products/*" "/index.html"
# Cloudflare Page Rules:
# *.example.com/static/* → Cache Level: Cache Everything, TTL: 1 month
# *.example.com/api/* → Cache Level: Bypass
Key design decisions:
- Cache key composition: What makes two requests "the same"? URL? URL + query string? URL + cookie? Keep cache keys minimal to maximize hit ratio.
- Purge strategy: How do you invalidate CDN-cached content? Purge APIs, versioned URLs (app.v2.js), or content-hash URLs (app.a1b2c3.js).
- Origin shield: An additional cache layer between edge and origin that reduces origin load.
Application-Level Cache
This is the layer you design explicitly in system design interviews. It sits between your application servers and the database.
# Application cache architecture patterns:
# Pattern 1: Local in-process cache (fastest, but per-instance)
from functools import lru_cache
@lru_cache(maxsize=1000)
def get_config(key):
return db.get_config(key)
# Pattern 2: Shared distributed cache (Redis/Memcached)
# All app instances share the same cache
def get_user_profile(user_id):
key = f"profile:{user_id}"
cached = redis.get(key)
if cached:
return json.loads(cached)
profile = db.get_profile(user_id)
redis.setex(key, 300, json.dumps(profile))
return profile
# Pattern 3: Two-tier cache (local + distributed)
# Check local cache first, then Redis, then DB
def get_product(product_id):
# Tier 1: Local cache (100μs)
local = local_cache.get(product_id)
if local:
return local
# Tier 2: Redis (1ms)
key = f"product:{product_id}"
cached = redis.get(key)
if cached:
product = json.loads(cached)
local_cache.set(product_id, product, ttl=30)
return product
# Tier 3: Database (50ms)
product = db.get_product(product_id)
redis.setex(key, 300, json.dumps(product))
local_cache.set(product_id, product, ttl=30)
return product
Database Query Cache
Many databases have built-in query caches that store the results of recent queries.
# MySQL Query Cache (removed in MySQL 8.0, but instructive)
# Was effective for read-heavy, rarely-changing data
# Invalidated on ANY write to tables in the query — too aggressive
# PostgreSQL shared_buffers (buffer pool)
# Caches frequently accessed data pages in memory
shared_buffers = 4GB # typically 25% of total RAM
effective_cache_size = 12GB # how much OS cache to expect
# PostgreSQL also supports materialized views:
CREATE MATERIALIZED VIEW popular_products AS
SELECT p.*, COUNT(o.id) as order_count
FROM products p
JOIN orders o ON o.product_id = p.id
GROUP BY p.id
HAVING COUNT(o.id) > 100;
-- Refresh periodically (or on trigger); CONCURRENTLY requires a unique index on the view
REFRESH MATERIALIZED VIEW CONCURRENTLY popular_products;
CPU Cache (Context)
While you won't design CPU caches in interviews, knowing they exist helps explain why certain data structure choices are faster. CPU caches exploit spatial and temporal locality:
- L1 cache: ~1ns access, ~64 KB per core
- L2 cache: ~4ns access, ~256 KB per core
- L3 cache: ~10ns access, ~8–32 MB shared
- Main memory: ~100ns access
- SSD: ~100,000ns (100μs)
- Network (same datacenter): ~500,000ns (0.5ms)
This is why arrays are faster than linked lists for iteration — arrays are stored contiguously in memory, so loading one element pre-fetches adjacent elements into the CPU cache line. Linked list nodes are scattered in memory, causing constant cache misses.
Redis Deep Dive
Redis is so central to caching in system design that it deserves a deeper look. Understanding Redis internals helps you make better design decisions and answer follow-up questions in interviews.
Data Structures
Redis is not just a key-value store — it's a data structure server. Each data type has optimized operations that run in O(1) or O(log N) time.
# STRINGS — binary-safe, up to 512MB
SET counter 0
INCR counter # atomic increment → 1
INCRBY counter 10 # → 11
DECRBY counter 5 # → 6
APPEND greeting "Hello"
APPEND greeting " World" # → "Hello World"
STRLEN greeting # → 11
# LISTS — doubly linked list, O(1) push/pop at both ends
LPUSH tasks "task-3" "task-2" "task-1" # left push
RPOP tasks # → "task-3" (FIFO queue pattern)
LRANGE tasks 0 -1 # → ["task-1", "task-2"]
LLEN tasks # → 2
# Blocking pop (for job queues):
BRPOP tasks 30 # block for up to 30 seconds
# SETS — unordered unique elements
SADD online_users "alice" "bob" "charlie"
SISMEMBER online_users "alice" # → 1 (true)
SCARD online_users # → 3 (cardinality)
SRANDMEMBER online_users 2 # → 2 random members
# SORTED SETS — elements scored and ranked
ZADD leaderboard 1500 "alice" 2300 "bob" 1800 "charlie"
ZREVRANGE leaderboard 0 2 WITHSCORES # top 3 by score (descending)
ZRANK leaderboard "alice" # → 0 (lowest score)
ZREVRANK leaderboard "bob" # → 0 (highest score)
ZINCRBY leaderboard 500 "alice" # → 2000
ZRANGEBYSCORE leaderboard 1500 2000 # range query
# HASHES — field-value maps (like objects)
HSET user:1001 name "Alice" age 30 email "alice@example.com"
HGET user:1001 name # → "Alice"
HMGET user:1001 name age # → ["Alice", "30"]
HINCRBY user:1001 age 1 # → 31
HGETALL user:1001 # → all fields and values
# STREAMS — append-only log (Kafka-like)
XADD events * type "page_view" url "/products" user "alice"
XADD events * type "click" element "buy_btn" user "bob"
XLEN events # → 2
XRANGE events - + # all entries
# Consumer groups for parallel processing:
XGROUP CREATE events mygroup 0
XREADGROUP GROUP mygroup consumer-1 COUNT 10 STREAMS events >
Persistence: RDB vs AOF
Redis is in-memory but offers two persistence mechanisms to survive restarts:
# RDB (Redis Database Backup) — Point-in-time snapshots
# Creates a binary dump of all data at intervals
# redis.conf:
save 900 1 # snapshot if ≥1 key changed in 900 seconds
save 300 10 # snapshot if ≥10 keys changed in 300 seconds
save 60 10000 # snapshot if ≥10000 keys changed in 60 seconds
dbfilename dump.rdb
dir /var/lib/redis/
# Pros: compact file, fast recovery, low CPU overhead
# Cons: data loss between snapshots (up to minutes of data)
# AOF (Append Only File) — Write log
# Logs every write operation; replayed on restart
# redis.conf:
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec # fsync once per second (balanced)
# appendfsync always # fsync on every write (safest, slowest)
# appendfsync no # let OS decide (fastest, risky)
# Pros: minimal data loss (at most 1 second with everysec)
# Cons: larger file, slower recovery, needs periodic rewrite
# Best practice: use BOTH RDB + AOF
# AOF for minimal data loss, RDB for fast disaster recovery
# On restart, Redis loads AOF (more complete) if available
Redis Cluster
Redis Cluster provides automatic sharding across multiple nodes using a concept of hash slots.
# Redis Cluster Architecture:
# - 16,384 hash slots distributed across master nodes
# - Key assignment: slot = CRC16(key) % 16384
# - Each master owns a range of slots
# - Each master has one or more replicas for HA
# Example 3-node cluster:
# Master A: slots 0–5460 (+ Replica A')
# Master B: slots 5461–10922 (+ Replica B')
# Master C: slots 10923–16383 (+ Replica C')
# Hash tags for multi-key operations:
# Keys with same hash tag go to same slot
SET {user:1001}.profile "..." # slot = CRC16("user:1001")
SET {user:1001}.settings "..." # same slot!
# Now MGET {user:1001}.profile {user:1001}.settings works
# Cluster commands:
redis-cli --cluster create \
node1:6379 node2:6379 node3:6379 \
node4:6379 node5:6379 node6:6379 \
--cluster-replicas 1
CLUSTER INFO # cluster state
CLUSTER NODES # node listing
CLUSTER SLOTS # slot assignments
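For intuition, the slot calculation itself is small enough to sketch in Python; this assumes the CRC16-XMODEM variant specified for Redis Cluster, plus the hash-tag rule shown above:
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Hash-tag rule: if the key contains {...}, only that substring is hashed
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(hash_slot("{user:1001}.profile") == hash_slot("{user:1001}.settings"))  # True: same slot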
Redis Sentinel (High Availability)
Sentinel provides HA for non-clustered Redis setups through automatic failover.
# Sentinel monitors master nodes and promotes replicas on failure
# sentinel.conf:
sentinel monitor mymaster 192.168.1.10 6379 2 # quorum of 2
sentinel down-after-milliseconds mymaster 5000 # 5s to detect failure
sentinel failover-timeout mymaster 60000 # 60s failover timeout
sentinel parallel-syncs mymaster 1 # 1 replica syncs at a time
# Flow:
# 1. Sentinel detects master is down (SDOWN → ODOWN after quorum)
# 2. Sentinel elects a leader among Sentinels
# 3. Leader promotes best replica to master
# 4. Other replicas reconfigure to replicate from new master
# 5. Old master (when it returns) becomes a replica
# Application connects via Sentinel (not directly to master):
import redis
sentinel = redis.Sentinel(
[('sentinel1', 26379), ('sentinel2', 26379), ('sentinel3', 26379)]
)
master = sentinel.master_for('mymaster')
master.set('key', 'value')
slave = sentinel.slave_for('mymaster')
slave.get('key')
Common Redis Patterns
# Pattern 1: Session Store
SET session:abc123 '{"user_id":1001,"role":"admin"}' EX 1800
GET session:abc123
# Pattern 2: Rate Limiter (sliding window)
# Allow 100 requests per minute per user
import time

def is_rate_limited(user_id, limit=100, window=60):
key = f"rate:{user_id}"
pipe = redis.pipeline()
now = time.time()
pipe.zremrangebyscore(key, 0, now - window) # remove old entries
pipe.zadd(key, {str(now): now}) # add current request
pipe.zcard(key) # count requests in window
pipe.expire(key, window) # auto-cleanup
_, _, count, _ = pipe.execute()
return count > limit
# Pattern 3: Distributed Lock (Redlock)
# Acquire lock:
SET lock:order:5001 "worker-42" NX PX 30000 # NX = only if not exists
# PX 30000 = auto-expire in 30 seconds (prevent deadlock)
# Release lock (Lua for atomicity):
# EVAL "if redis.call('get',KEYS[1]) == ARGV[1] then
# return redis.call('del',KEYS[1])
# else return 0 end"
# 1 lock:order:5001 "worker-42"
# Pattern 4: Leaderboard
ZADD game:scores 2500 "player_a" 1800 "player_b" 3200 "player_c"
ZREVRANGE game:scores 0 9 WITHSCORES # top 10
ZREVRANK game:scores "player_b" # player_b's rank
# Pattern 5: Pub/Sub Notifications
SUBSCRIBE order:events
PUBLISH order:events '{"type":"created","order_id":5001}'
# Pattern 6: HyperLogLog (cardinality estimation)
PFADD daily:visitors "user:1001" "user:1002" "user:1003"
PFCOUNT daily:visitors # → 3 (approximate unique count)
# Uses only ~12KB of memory for billions of unique elements!
Summary
- Caching stores frequently accessed data in fast storage to reduce latency and database load. Aim for a hit ratio > 90%.
- Cache-Aside is the default strategy — the app checks cache, falls back to DB on miss, and populates cache. Use this unless you have a specific reason not to.
- Write-Through provides strong consistency; Write-Behind provides maximum write performance at the cost of durability risk.
- LRU is the default eviction policy. Use TTL for time-sensitive data. LFU for frequency-based workloads.
- Redis is the industry standard for distributed caching. Know its data structures, persistence options, and clustering.
- Cache invalidation is hard. Start with TTL, use event-based invalidation for multi-service architectures.
- Cache stampede, penetration, and avalanche are real failure modes. Know the solutions: mutex locks, bloom filters, staggered TTL.
- HTTP caching (Cache-Control, ETag) is free performance. Always configure it for static assets.
Caching is not a silver bullet — it adds complexity, introduces consistency challenges, and requires careful tuning. But in virtually every large-scale system, it's the single most impactful optimization for latency and throughput. In the next post, we explore SQL vs NoSQL databases — the storage layer that caching protects.