High Level Design Series · Data Storage · Part 3

Key-Value Stores (Redis, DynamoDB)

The Key-Value Model

A key-value store is the simplest form of a database: it maps unique keys to opaque values. Think of it as a giant distributed hash map. The store doesn't inspect or index the values — all it guarantees is that given a key, it can return the associated value in O(1) amortized time.

The API surface is deliberately tiny:

GET(key)           → value | null
PUT(key, value)    → OK
DELETE(key)        → OK

This simplicity is the source of both its power and its limitations.
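That contract is small enough to sketch in a few lines of Python — a toy illustration of the semantics, not any particular store's implementation:

```python
class KVStore:
    """Minimal in-memory key-value store implementing the GET/PUT/DELETE API."""

    def __init__(self):
        self._data = {}          # values are opaque: never inspected or indexed

    def get(self, key):
        return self._data.get(key)       # → value | None

    def put(self, key, value):
        self._data[key] = value          # → OK
        return "OK"

    def delete(self, key):
        self._data.pop(key, None)        # idempotent: deleting a missing key is OK
        return "OK"
```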

Common use cases for key-value stores:

Use Case         | Key Pattern         | Value             | Store
-----------------|---------------------|-------------------|------
Session cache    | sess:{session_id}   | JSON blob         | Redis
Shopping cart    | cart:{user_id}      | Hash of items     | Redis
Rate limiting    | rl:{ip}:{window}    | Counter           | Redis
User profile     | USER#123            | JSON document     | DynamoDB
Feature flags    | flag:{feature_name} | Boolean/JSON      | Redis / etcd
Distributed lock | lock:{resource}     | Owner token + TTL | Redis

Hash Table Internals

Key-value stores lean heavily on hash tables: Redis's core dictionary is one, and DynamoDB hashes keys to route them to partitions. Understanding how hash tables work internally is critical for understanding why KV stores offer O(1) lookups, how they handle collisions, and why resizing is expensive.

How a Hash Table Works

  1. Hash Function: Take the key (e.g., "user:42") and pass it through a hash function (e.g., MurmurHash3, SipHash, CRC16) to produce a fixed-size integer.
  2. Bucket Index: Compute hash(key) % num_buckets to get the index into an array of "buckets."
  3. Collision Resolution: Multiple keys can hash to the same bucket. Two strategies:
    • Chaining: Each bucket points to a linked list of (key, value) pairs. On collision, append to the list. Lookup scans the list (O(n) worst case, O(1) amortized with good hash functions).
    • Open Addressing: On collision, probe the next slot (linear probing, quadratic probing, or double hashing). Redis uses chaining; DynamoDB's internal storage uses a variant of consistent hashing.
  4. Load Factor: α = n / m where n = number of entries, m = number of buckets. When α exceeds a threshold (typically 0.75), the table resizes — usually doubling the number of buckets and rehashing all entries.
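The four steps above can be sketched directly. A toy hash table with chaining and a 0.75 load-factor resize (class name and initial size are illustrative):

```python
class ChainedHashTable:
    """Hash table with chaining and a load-factor-triggered resize."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0

    def _index(self, key):
        return hash(key) % len(self.buckets)      # steps 1-2: hash, then bucket index

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):       # step 3: scan the chain
            if k == key:
                bucket[i] = (key, value)          # update existing key in place
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / len(self.buckets) > 0.75:  # step 4: load factor exceeded
            self._resize()

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def _resize(self):
        """Double the bucket count and rehash every entry — the expensive part."""
        old = self.buckets
        self.buckets = [[] for _ in range(len(old) * 2)]
        self.size = 0
        for bucket in old:
            for k, v in bucket:
                self.put(k, v)
```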

Redis specifically uses a technique called incremental rehashing: instead of rehashing all keys at once (which would block the single-threaded event loop), Redis maintains two hash tables simultaneously and gradually migrates keys from the old table to the new one, one bucket at a time, on each operation.
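A sketch of that two-table scheme (simplified: the real implementation also rehashes from a background timer and uses different trigger thresholds):

```python
class IncrementalDict:
    """Two hash tables; migrate one bucket per operation instead of all at once."""

    def __init__(self):
        self.ht0 = [[] for _ in range(4)]   # current table
        self.ht1 = None                     # resize target (None when not rehashing)
        self.rehashidx = -1                 # next ht0 bucket to migrate; -1 = idle
        self.used = 0

    def _bucket(self, table, key):
        return table[hash(key) & (len(table) - 1)]  # power-of-2 size → AND, not modulo

    def _tables(self):
        return (self.ht0, self.ht1) if self.ht1 is not None else (self.ht0,)

    def _rehash_step(self):
        """Migrate at most one non-empty bucket from ht0 to ht1."""
        if self.rehashidx < 0:
            return
        while self.rehashidx < len(self.ht0) and not self.ht0[self.rehashidx]:
            self.rehashidx += 1                      # skip empty buckets
        if self.rehashidx < len(self.ht0):
            for k, v in self.ht0[self.rehashidx]:
                self._bucket(self.ht1, k).append((k, v))
            self.ht0[self.rehashidx] = []
            self.rehashidx += 1
        if self.rehashidx >= len(self.ht0):          # migration done: swap tables
            self.ht0, self.ht1, self.rehashidx = self.ht1, None, -1

    def put(self, key, value):
        self._rehash_step()                          # amortize migration over ops
        for table in self._tables():                 # update in place if key exists
            bucket = self._bucket(table, key)
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
        target = self.ht1 if self.rehashidx >= 0 else self.ht0
        self._bucket(target, key).append((key, value))
        self.used += 1
        if self.rehashidx < 0 and self.used > len(self.ht0):
            self.ht1 = [[] for _ in range(len(self.ht0) * 2)]
            self.rehashidx = 0                       # begin incremental rehash

    def get(self, key):
        self._rehash_step()
        for table in self._tables():                 # during rehash, check both tables
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return None
```

Note how no single operation ever pays for a full rehash — exactly the property that keeps Redis's single-threaded event loop responsive.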

Redis Hash Table Implementation

Redis's dictionary (dict.h) uses two hash tables (ht[0] and ht[1]) for incremental rehashing:

typedef struct dict {
    dictType *type;      // function pointers: hash, keyCompare, etc.
    dictht ht[2];        // ht[0] = current, ht[1] = resize target
    long rehashidx;      // -1 when not rehashing, else current bucket index
    int16_t pauserehash; // >0 means rehashing is paused
} dict;

typedef struct dictht {
    dictEntry **table;   // array of pointers to linked list heads
    unsigned long size;  // always a power of 2
    unsigned long sizemask; // size - 1 (for fast modulo via bitwise AND)
    unsigned long used;  // number of entries
} dictht;

Key implementation details: table sizes are always a power of two, so sizemask = size - 1 turns hash % size into a single bitwise AND; collisions are chained through linked lists of dictEntry nodes; and rehashidx records how far the incremental migration from ht[0] to ht[1] has progressed.

Redis Deep Dive

Redis (Remote Dictionary Server) is an in-memory data structure store that can serve as a database, cache, message broker, and streaming engine. It's single-threaded for command processing (using an event loop), which eliminates lock contention and gives it predictable sub-millisecond latency for most operations.

Key characteristics:

  • In-memory: the entire dataset lives in RAM, with optional persistence to disk
  • Single-threaded command execution: every command is atomic, with no locks
  • Rich data structures, each with its own set of atomic operations
  • Sub-millisecond latency for typical operations

Redis Data Structures

Redis is not a simple key-value store — it's a data structure server. Each value type has its own set of atomic operations. This is what makes Redis dramatically more useful than Memcached.

String

Binary-safe, up to 512 MB. The most basic type.

# Basic get/set
SET user:42:name "Alice"
GET user:42:name          → "Alice"

# Atomic counter
INCR page:views:home      → 1
INCRBY page:views:home 10 → 11

# Set with expiry (session token)
SET sess:abc123 "{user_id:42}" EX 3600

# Set only if not exists (distributed lock)
SET lock:order:99 "worker-1" NX PX 30000

# Bit operations
SETBIT user:42:features 3 1
BITCOUNT user:42:features → 1

List

Doubly linked list. O(1) push/pop at both ends.

# Task queue (producer)
LPUSH queue:emails "{to:'a@b.com'}"
LPUSH queue:emails "{to:'c@d.com'}"

# Task queue (consumer - blocking)
BRPOP queue:emails 30  → pops oldest

# Recent activity feed (capped list)
LPUSH feed:user:42 "liked post #123"
LTRIM feed:user:42 0 99  # keep last 100

# Range query
LRANGE feed:user:42 0 9  → last 10 items
LLEN queue:emails         → 2
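The LPUSH/RPOP queue semantics map directly onto Python's `collections.deque`; a minimal single-process sketch (no blocking, no Redis):

```python
from collections import deque

queue = deque()

# Producer: LPUSH pushes onto the left (head)
queue.appendleft("{to:'a@b.com'}")
queue.appendleft("{to:'c@d.com'}")

# Consumer: RPOP pops from the right (tail) → oldest message first (FIFO)
oldest = queue.pop()

# LPUSH + LTRIM-style capping: a bounded deque keeps only the newest 100 entries
feed = deque(maxlen=100)
for i in range(150):
    feed.appendleft(f"event {i}")
```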

Set

Unordered unique elements. O(1) add/remove/check.

# Track unique visitors
SADD visitors:2026-04-15 "user:42"
SADD visitors:2026-04-15 "user:43"
SCARD visitors:2026-04-15  → 2

# Tag system
SADD post:99:tags "redis" "nosql" "cache"
SMEMBERS post:99:tags
SISMEMBER post:99:tags "redis" → 1

# Set operations
SADD friends:alice "bob" "carol" "dave"
SADD friends:bob "alice" "carol" "eve"
SINTER friends:alice friends:bob  → "carol"
SUNION friends:alice friends:bob
SDIFF friends:alice friends:bob   → "dave"

Sorted Set (ZSet)

Set with scores. Skip list + hash table. O(log N) add.

# Leaderboard
ZADD leaderboard 2500 "alice"
ZADD leaderboard 1800 "bob"
ZADD leaderboard 3100 "carol"

# Top 10 players (descending)
ZREVRANGE leaderboard 0 9 WITHSCORES
# → "carol" 3100, "alice" 2500, "bob" 1800

# Rank lookup
ZREVRANK leaderboard "alice"  → 1 (0-indexed)
ZSCORE leaderboard "alice"    → "2500"

# Range by score (time-series window)
ZRANGEBYSCORE events 1714000000 1714100000

# Remove old entries
ZREMRANGEBYSCORE events -inf 1713900000

Hash

Field-value pairs inside a key. Like a mini document.

# User profile as hash
HSET user:42 name "Alice" email "a@b.com"
HSET user:42 login_count 0

HGET user:42 name           → "Alice"
HGETALL user:42
# → "name" "Alice" "email" "a@b.com"
#   "login_count" "0"

# Atomic field increment
HINCRBY user:42 login_count 1  → 1

# Partial update (only change email)
HSET user:42 email "alice@new.com"

# Memory efficient: hashes with <128 fields
# use ziplist encoding (contiguous memory)

Stream

Append-only log with consumer groups. Like Kafka-lite.

# Produce events
XADD orders * user_id 42 total 15990
# → "1714000000000-0" (auto-generated ID)

XADD orders * user_id 43 total 8500
# → "1714000000001-0"

# Read all events from beginning
XRANGE orders - +

# Consumer group (multiple workers)
XGROUP CREATE orders grp1 0
XREADGROUP GROUP grp1 worker-1 COUNT 10
  BLOCK 5000 STREAMS orders >

# Acknowledge processed messages
XACK orders grp1 "1714000000000-0"

# Pending entries (unacknowledged)
XPENDING orders grp1

HyperLogLog

Probabilistic cardinality estimation. ~12 KB per key. 0.81% std error.

# Count unique visitors (billions of elements,
# only 12 KB of memory!)
PFADD visitors:today "user:42" "user:43"
PFADD visitors:today "user:42"  # duplicate
PFCOUNT visitors:today  → 2

# Merge multiple periods
PFMERGE visitors:week
  visitors:mon visitors:tue visitors:wed

PFCOUNT visitors:week  → approximate unique count
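The trick behind HyperLogLog: hash each element, use some bits to pick a register, and track the longest run of leading zero bits seen per register — rare long runs imply many distinct elements. A compact sketch following the original algorithm (register count, hash choice, and the linear-counting correction are illustrative; this is not Redis's dense/sparse implementation):

```python
import hashlib
import math

M = 1 << 12                      # 4096 registers (Redis uses 16384)

def _hash64(item: str) -> int:
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")

class HyperLogLog:
    def __init__(self):
        self.registers = [0] * M

    def add(self, item: str):
        h = _hash64(item)
        idx = h & (M - 1)                    # low 12 bits pick a register
        rest = h >> 12                       # remaining 52 bits
        rank = 52 - rest.bit_length() + 1    # leading-zero run length + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros:         # small-range (linear counting) correction
            raw = M * math.log(M / zeros)
        return round(raw)

hll = HyperLogLog()
for i in range(10_000):
    hll.add(f"user:{i}")
    hll.add(f"user:{i}")                     # duplicates don't change the estimate
```

With 4096 registers the standard error is about 1.04/√4096 ≈ 1.6%, so the estimate for 10,000 distinct elements typically lands within ~2% of the truth — while the whole state is just the register array.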

Bitmap & Bitfield

Bit-level operations on strings. Ultra-compact for flags.

# Daily active users (1 bit per user ID)
SETBIT dau:2026-04-15 42 1
SETBIT dau:2026-04-15 43 1
BITCOUNT dau:2026-04-15  → 2

# Users active on BOTH days
BITOP AND active_both dau:2026-04-15
  dau:2026-04-16
BITCOUNT active_both  → count

# Bitfield: packed integer arrays
BITFIELD player:42 SET u8 #0 100  # HP = 100
BITFIELD player:42 SET u8 #1 50   # MP = 50
BITFIELD player:42 INCRBY u8 #0 -25 # damage
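Since a bitmap is just a string of bits, SETBIT/BITCOUNT/BITOP map directly onto integer bit-twiddling; a minimal Python sketch of the daily-active-users example above:

```python
# SETBIT dau 42 1  →  set bit 42 in an arbitrary-precision int
dau_apr15 = 0
dau_apr15 |= 1 << 42
dau_apr15 |= 1 << 43

dau_apr16 = 0
dau_apr16 |= 1 << 42

# BITCOUNT → population count of the set bits
count_apr15 = bin(dau_apr15).count("1")      # 2

# BITOP AND → users active on BOTH days
active_both = dau_apr15 & dau_apr16          # only bit 42 survives
```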
Memory optimization: Redis uses special compact encodings for small data structures. Hashes with fewer than hash-max-ziplist-entries (default 128) entries and values shorter than hash-max-ziplist-value (default 64 bytes) are stored as ziplists (contiguous memory, no pointers) — using ~10x less memory than the full hash table encoding. Similar optimizations exist for lists (quicklist = linked list of ziplists), sets (intset for small integer sets), and sorted sets (ziplist for small sets).

Persistence: RDB vs AOF

Redis is in-memory, but that doesn't mean your data vanishes on restart. Redis offers two persistence mechanisms, and you can use both simultaneously:

RDB (Redis Database Snapshots)

Point-in-time snapshots of the entire dataset, saved as a compact binary file.

# redis.conf
save 900 1      # snapshot if ≥1 key changed in 900s
save 300 10     # snapshot if ≥10 keys changed in 300s
save 60 10000   # snapshot if ≥10K keys changed in 60s

# Manual trigger
BGSAVE          # fork() → child writes dump.rdb
SAVE            # blocks main thread (never in prod!)

# How it works:
# 1. Redis calls fork() to create child process
# 2. Child writes entire dataset to temp file
# 3. Child atomically renames temp → dump.rdb
# 4. Parent continues serving requests
# 5. Copy-on-write: only modified pages duplicated
  • Pro: Compact file, fast restarts, good for backups
  • Pro: Child process does the I/O — zero impact on latency
  • Con: Data loss between snapshots (up to minutes)
  • Con: fork() can be slow with large datasets (copies page tables)

AOF (Append-Only File)

Logs every write command. Replay the log to reconstruct state.

# redis.conf
appendonly yes
appendfsync everysec   # fsync once per second (default)
# appendfsync always   # fsync on every write (safest)
# appendfsync no       # let OS decide (fastest)

# AOF file looks like:
*3\r\n$3\r\nSET\r\n$7\r\nuser:42\r\n$5\r\nAlice\r\n

# AOF rewrite (compaction):
BGREWRITEAOF
# Rewrites AOF with minimum commands to
# reconstruct current state. E.g., 1000 INCRs
# become a single SET with final value.

# Redis 7.0+: Multi-part AOF
# base.aof (RDB-format snapshot) +
# incr.aof (incremental commands since snapshot)
  • Pro: Minimal data loss (at most 1 second with everysec)
  • Pro: Append-only = no corruption risk from partial writes
  • Con: Larger file size than RDB
  • Con: Slower restarts (must replay all commands)

Recommended production config: Enable both RDB and AOF. Redis will use AOF for recovery (more complete data) and RDB for backups and faster cold starts. The Redis 7.0 multi-part AOF essentially combines the best of both — an RDB base with incremental AOF on top.

Persistence Decision Matrix

Requirement                        | Use          | Config
-----------------------------------|--------------|-------
Pure cache, no durability needed   | None         | save "", appendonly no
Accept a few minutes of data loss  | RDB only     | save 300 10
Accept at most ~1 second of loss   | AOF everysec | appendfsync everysec
Zero data loss (rare for Redis)    | AOF always   | appendfsync always (~10x slower)
Best of both worlds (recommended)  | RDB + AOF    | Both enabled, AOF used for recovery

Redis Cluster

A single Redis node can hold ~25 GB of data comfortably (limited by RAM). For larger datasets or higher throughput, you need Redis Cluster — Redis's built-in sharding solution that distributes data across multiple nodes.

How Hash Slots Work

Redis Cluster divides the entire keyspace into 16,384 hash slots (numbered 0–16,383). Each key is assigned to a slot using:

slot = CRC16(key) mod 16384

Each master node in the cluster owns a subset of these slots. With 3 masters, the split is roughly even: Node A owns slots 0–5460, Node B owns 5461–10922, and Node C owns 10923–16383.

Each master has one or more replicas for failover. A typical production cluster runs 6 nodes: 3 masters + 3 replicas.
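The slot computation is easy to reproduce. A sketch of the CRC16 variant Redis Cluster mandates (CCITT/XMODEM: polynomial 0x1021, initial value 0):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF                 # keep it a 16-bit value
    return crc

def key_slot(key: str) -> int:
    return crc16(key.encode()) % 16384    # slot = CRC16(key) mod 16384

# Reference check value from the Redis Cluster specification:
# crc16(b"123456789") == 0x31C3
```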

Hash Tags for Multi-Key Operations

Since multi-key commands (MGET, pipeline transactions) only work when all keys hash to the same slot, Redis supports hash tags: only the substring inside {...} is hashed.

# These all hash to the same slot (hash of "user:42")
SET {user:42}:profile "..."
SET {user:42}:settings "..."
SET {user:42}:cart "..."

# Now MGET works across all three
MGET {user:42}:profile {user:42}:settings {user:42}:cart

# Without hash tags, these could land on different nodes
# → CROSSSLOT error on multi-key commands
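The tag-extraction rule is small enough to state precisely: hash only the substring between the first '{' and the first '}' after it, provided that substring is non-empty; otherwise hash the whole key. A sketch (edge-case behavior follows the Redis Cluster spec):

```python
def hash_tag(key: str) -> str:
    """Return the portion of the key that Redis Cluster actually hashes."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:    # the tag must be non-empty
            return key[start + 1:end]
    return key                               # no valid tag → hash the whole key

# "{user:42}:profile" and "{user:42}:cart" both hash the substring "user:42",
# so they land in the same slot; "foo{}{bar}" has an empty first tag, so the
# whole key is hashed.
```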

MOVED Redirects

When a client sends a command to the wrong node, the node responds with a MOVED redirect:

# Client sends to Node A:
GET user:500
# Node A calculates: CRC16("user:500") % 16384 = 7365
# 7365 is owned by Node B, not Node A

# Node A responds:
-MOVED 7365 192.168.1.2:6379

# Smart clients (like redis-py-cluster, Lettuce, Jedis)
# cache the slot→node mapping and send future requests
# for slot 7365 directly to Node B.
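A sketch of the client side: parse the redirect and update a local slot table (the error-string format is from the example above; the cache structure is illustrative):

```python
def parse_moved(error: str):
    """Parse '-MOVED <slot> <host>:<port>' into (slot, (host, port))."""
    _, slot, addr = error.lstrip("-").split()
    host, port = addr.rsplit(":", 1)
    return int(slot), (host, int(port))

slot_map = {}                                    # slot → (host, port)

slot, node = parse_moved("-MOVED 7365 192.168.1.2:6379")
slot_map[slot] = node                            # future requests for 7365 go direct
```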

Resharding & Rebalancing

Adding a new node to the cluster:

# 1. Add Node D to the cluster
redis-cli --cluster add-node 192.168.1.4:6379 192.168.1.1:6379

# 2. Reshard: move slots from existing nodes to Node D
redis-cli --cluster reshard 192.168.1.1:6379
# → Interactively choose how many slots (e.g., 4096)
# → Choose source nodes (A, B, C) and target (D)

# During resharding:
# - Keys in migrating slots trigger ASK redirect
# - Client follows ASK to new node for that key
# - Once migration is complete, MOVED is returned instead
# - Entire process is online — no downtime

Redis Sentinel

Redis Sentinel provides high availability for standalone Redis deployments (non-cluster mode). It handles automatic failover, monitoring, and service discovery.

How Sentinel works:

  1. Monitoring: Sentinel continuously pings the master and replicas. If a master doesn't respond within down-after-milliseconds, the sentinel marks it as subjectively down (SDOWN).
  2. Quorum: When enough sentinels (a configurable quorum, typically 2 out of 3) agree the master is down, it's marked objectively down (ODOWN).
  3. Failover: Sentinels elect a leader who promotes the best replica to master. "Best" = most up-to-date replication offset.
  4. Notification: Clients subscribed to Sentinel are notified of the new master's address. Smart clients (Lettuce, redis-py with Sentinel support) reconnect automatically.
# sentinel.conf
sentinel monitor mymaster 192.168.1.1 6379 2  # quorum=2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1

# Query Sentinel for current master
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
# → "192.168.1.2" "6379" (after failover)
Sentinel vs Cluster: Sentinel provides HA for a single-shard deployment (one dataset, replicated). Redis Cluster provides HA and sharding (multiple shards, each with replicas). For datasets that fit on one node, Sentinel is simpler. For larger data or higher throughput, use Cluster.

Pub/Sub & Streams

Pub/Sub (Fire-and-Forget)

Redis Pub/Sub is a simple messaging system where publishers send messages to channels and all subscribed clients receive them in real-time. Messages are not persisted — if no subscriber is listening, the message is lost.

# Publisher
PUBLISH chat:room:42 "Hello from Alice!"
# → (integer) 2  (2 subscribers received it)

# Subscriber (blocking)
SUBSCRIBE chat:room:42
# Blocks and receives messages as they arrive:
# 1) "message"
# 2) "chat:room:42"
# 3) "Hello from Alice!"

# Pattern subscribe (wildcard)
PSUBSCRIBE chat:room:*
# Receives messages from ALL chat rooms

Streams (Persistent, Consumer Groups)

For durable messaging with consumer groups, acknowledgment, and replay — use Redis Streams (added in Redis 5.0). Streams solve every limitation of Pub/Sub:

Feature         | Pub/Sub              | Streams
----------------|----------------------|--------
Persistence     | ❌ Fire and forget    | ✅ Stored on disk
Consumer groups | ❌ All get every msg  | ✅ Load-balanced across group
Acknowledgment  | ❌ No ack             | ✅ XACK per message
Replay          | ❌ Missed = lost      | ✅ Read from any ID
Backpressure    | ❌ Slow client drops  | ✅ Pending list tracks unprocessed

Lua Scripting

Redis executes Lua scripts atomically — the entire script runs without any other command interleaving. This is the key to implementing complex atomic operations that can't be done with a single Redis command.

# Rate limiter using sliding window (Lua script)
# KEYS[1] = rate limit key, ARGV[1] = window (sec),
# ARGV[2] = max requests, ARGV[3] = current timestamp

local key = KEYS[1]
local window = tonumber(ARGV[1])
local max_req = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)

-- Count requests in current window
local count = redis.call('ZCARD', key)

if count < max_req then
    -- Under limit: add this request
    redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
    redis.call('EXPIRE', key, window)
    return 1  -- allowed
else
    return 0  -- rate limited
end
# Execute the script from a client
redis-cli EVAL "$(cat rate_limit.lua)" 1 \
  "rl:api:192.168.1.1" 60 100 1714000000
# → (integer) 1  (allowed)

# Or use EVALSHA with cached script hash
redis-cli SCRIPT LOAD "$(cat rate_limit.lua)"
# → "a1b2c3d4e5f6..."

redis-cli EVALSHA "a1b2c3d4e5f6..." 1 \
  "rl:api:192.168.1.1" 60 100 1714000000

Other common Lua script use cases:

# Safe distributed lock release (Redlock pattern)
# Only release if we still own the lock
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0  -- someone else holds it now
end

DynamoDB Deep Dive

Amazon DynamoDB is a fully managed, serverless key-value and document database that delivers single-digit millisecond performance at any scale. Unlike Redis (which you operate yourself), DynamoDB is a service — AWS handles replication, sharding, patching, and scaling.

Key characteristics:

  • Fully managed and serverless: AWS handles replication, sharding, patching, and scaling
  • Consistent single-digit millisecond latency at any scale
  • Items up to 400 KB, accessed via partition key (plus optional sort key)
  • Two capacity modes: provisioned (RCU/WCU) and on-demand (pay per request)
  • Durable by default: data is replicated across three availability zones

Partition Key & Sort Key

Every DynamoDB table has a primary key that uniquely identifies each item. There are two types:

Simple Primary Key (Partition Key Only)

The partition key alone must be unique. DynamoDB hashes it to determine which physical partition stores the item.

# Table: Users
# PK: user_id (String)

{
  "user_id": "u-42",       ← partition key
  "name": "Alice",
  "email": "alice@ex.com",
  "plan": "premium"
}

# Access patterns:
# ✅ GetItem(user_id = "u-42")    → O(1)
# ❌ Query(email = "alice@...")   → full scan!

Composite Primary Key (PK + Sort Key)

Partition key + sort key together are unique. Enables range queries within a partition.

# Table: Orders
# PK: user_id (String), SK: order_id (String)

{
  "user_id": "u-42",       ← partition key
  "order_id": "ord-001",   ← sort key
  "total": 15990,
  "status": "shipped"
}

# Access patterns:
# ✅ GetItem(PK="u-42", SK="ord-001")
# ✅ Query(PK="u-42", SK begins_with "ord-")
# ✅ Query(PK="u-42", SK between "A" and "Z")
# ✅ Query(PK="u-42") → all orders for user

Partition key design is everything. A bad partition key causes hot partitions — all traffic concentrated on one physical partition while others sit idle. Good partition keys have high cardinality and uniform access distribution.

Good PK          | Bad PK  | Why
-----------------|---------|----
user_id          | country | Millions of users vs. ~200 countries → hot partition on "US"
device_id        | date    | All IoT writes go to today's partition
order_id (UUID)  | status  | Only 5 statuses → extreme imbalance

Global & Local Secondary Indexes

DynamoDB's primary key determines the only efficient access pattern on the base table. For additional access patterns, you need secondary indexes.

Local Secondary Index (LSI)

Same partition key as the base table, different sort key. Must be created at table creation time. Limit: 5 per table.

# Base table: Orders
# PK: user_id, SK: order_id

# LSI: OrdersByDate
# PK: user_id (same), SK: created_at

# Now you can query:
# "All orders for user u-42, sorted by date"
response = table.query(
    IndexName='OrdersByDate',
    KeyConditionExpression=
        Key('user_id').eq('u-42') &
        Key('created_at').between(
            '2026-01-01', '2026-04-30'
        )
)
  • Strongly consistent reads ✅
  • Shares throughput with base table
  • 10 GB limit per partition key value

Global Secondary Index (GSI)

Completely different partition key and/or sort key. Can be created anytime. Limit: 20 per table.

# Base table: Orders
# PK: user_id, SK: order_id

# GSI: OrdersByStatus
# PK: status, SK: created_at

# Now you can query:
# "All pending orders, newest first"
response = table.query(
    IndexName='OrdersByStatus',
    KeyConditionExpression=
        Key('status').eq('pending'),
    ScanIndexForward=False  # descending
)

# GSI: OrdersByEmail
# PK: email, SK: user_id
# "Find user by email" → efficient lookup
  • Eventually consistent reads only ❌
  • Has its own provisioned throughput
  • Asynchronously replicated from base

Single-Table Design Patterns

Single-table design is a DynamoDB modeling technique where all entity types live in one table, using generic key names (PK, SK) with prefixed values to differentiate entities. This is the recommended approach from AWS for most applications.

Why? DynamoDB charges per table (provisioned capacity) and per-request (on-demand). Fewer tables = fewer things to manage, and you can fetch related entities in a single Query operation.

# Single-table design for an e-commerce app
# Table: ECommerce
# PK (String) | SK (String) | Attributes...

# User entity
PK=USER#u-42         SK=PROFILE           name=Alice email=alice@ex.com plan=premium

# User's orders (all in one partition → one Query)
PK=USER#u-42         SK=ORDER#2026-04-15#ord-001  total=15990 status=shipped
PK=USER#u-42         SK=ORDER#2026-04-10#ord-002  total=8500  status=delivered

# Order items (nested under order)
PK=ORDER#ord-001     SK=ITEM#sku-101              qty=1 price=9999
PK=ORDER#ord-001     SK=ITEM#sku-205              qty=2 price=2995

# Product entity
PK=PRODUCT#sku-101   SK=METADATA                  name="Widget" category="electronics"
PK=PRODUCT#sku-101   SK=REVIEW#u-43               rating=5 text="Great product!"

# GSI1: Inverted index (SK → PK)
# GSI1PK=ORDER#2026-04-15#ord-001  GSI1SK=USER#u-42
# → "Which user placed order ord-001?"

# GSI2: By status
# GSI2PK=shipped  GSI2SK=2026-04-15#ord-001
# → "All shipped orders, sorted by date"

Query Examples (Python / boto3)

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ECommerce')

# 1. Get user profile
profile = table.get_item(
    Key={'PK': 'USER#u-42', 'SK': 'PROFILE'}
)['Item']

# 2. Get all orders for a user (sorted by date)
orders = table.query(
    KeyConditionExpression=
        Key('PK').eq('USER#u-42') &
        Key('SK').begins_with('ORDER#')
)['Items']

# 3. Get orders from last 30 days
recent = table.query(
    KeyConditionExpression=
        Key('PK').eq('USER#u-42') &
        Key('SK').between('ORDER#2026-03-16', 'ORDER#2026-04-16')
)['Items']

# 4. Get all items in an order
items = table.query(
    KeyConditionExpression=
        Key('PK').eq('ORDER#ord-001') &
        Key('SK').begins_with('ITEM#')
)['Items']

# 5. Batch get multiple entities (up to 100 items)
result = dynamodb.batch_get_item(
    RequestItems={
        'ECommerce': {
            'Keys': [
                {'PK': 'USER#u-42', 'SK': 'PROFILE'},
                {'PK': 'USER#u-43', 'SK': 'PROFILE'},
                {'PK': 'PRODUCT#sku-101', 'SK': 'METADATA'}
            ]
        }
    }
)

# 6. Transactional write (atomic across items)
dynamodb.meta.client.transact_write_items(
    TransactItems=[
        {'Put': {'TableName': 'ECommerce', 'Item': {
            'PK': 'USER#u-42',
            'SK': 'ORDER#2026-04-16#ord-003',
            'total': 5000, 'status': 'created'
        }}},
        {'Update': {'TableName': 'ECommerce',
            'Key': {'PK': 'PRODUCT#sku-101', 'SK': 'METADATA'},
            'UpdateExpression': 'SET stock = stock - :qty',
            'ConditionExpression': 'stock >= :qty',
            'ExpressionAttributeValues': {':qty': 1}
        }}
    ]
)

Capacity Planning

DynamoDB has two capacity modes. Understanding the math is critical for cost estimation in system design interviews.

Read Capacity Units (RCU)

Write Capacity Units (WCU)

Capacity Planning Example

Scenario: E-commerce platform with 10,000 DAU, peak 500 requests/sec.

# Read calculations:
# - Average item size: 2 KB
# - 80% reads are eventually consistent
# - 20% reads are strongly consistent
# - Peak: 400 reads/sec

Eventually consistent: 400 × 0.8 = 320 reads/sec
  Each 2 KB item = ceil(2/4) = 1 RCU per strongly consistent read
  Eventually consistent = 0.5 RCU per read
  → 320 × 0.5 = 160 RCUs

Strongly consistent: 400 × 0.2 = 80 reads/sec
  → 80 × 1 = 80 RCUs

Total read capacity: 160 + 80 = 240 RCUs

# Write calculations:
# - Average item size: 1.5 KB → ceil(1.5/1) = 2 WCUs per write
# - Peak: 100 writes/sec

Total write capacity: 100 × 2 = 200 WCUs

# Cost estimate (us-east-1, provisioned):
# RCU: $0.00013 per RCU per hour
# WCU: $0.00065 per WCU per hour
#
# Monthly (730 hours):
# Reads:  240 × $0.00013 × 730 = $22.78/month
# Writes: 200 × $0.00065 × 730 = $94.90/month
# Total: ~$118/month (provisioned)
#
# On-demand pricing:
# $1.25 per million write request units
# $0.25 per million read request units
# At 400 reads/sec × 86400 sec = 34.56M reads/day
# At 100 writes/sec × 86400 sec = 8.64M writes/day
# → ~$8.64 + ~$10.80 = ~$19.44/day = ~$583/month
# → Provisioned is cheaper at sustained load!
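The arithmetic above generalizes into a few lines; a sketch that reproduces the worked example (prices are the us-east-1 figures quoted above; function names are illustrative):

```python
import math

RCU_HOURLY, WCU_HOURLY, HOURS_PER_MONTH = 0.00013, 0.00065, 730

def rcus(item_kb: float, reads_per_sec: float, strong: bool) -> float:
    per_read = math.ceil(item_kb / 4)            # 1 RCU covers up to 4 KB
    return reads_per_sec * (per_read if strong else per_read * 0.5)

def wcus(item_kb: float, writes_per_sec: float) -> float:
    return writes_per_sec * math.ceil(item_kb)   # 1 WCU covers up to 1 KB

read_capacity = rcus(2, 320, strong=False) + rcus(2, 80, strong=True)   # 160 + 80
write_capacity = wcus(1.5, 100)                                          # 100 × 2

monthly = (read_capacity * RCU_HOURLY + write_capacity * WCU_HOURLY) * HOURS_PER_MONTH
```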

Adaptive Capacity

DynamoDB automatically handles uneven workloads with adaptive capacity. Even if one partition is receiving disproportionate traffic (hot partition), DynamoDB will:

  1. Burst capacity: Each partition can burst up to 300 seconds of unused throughput.
  2. Adaptive capacity: Automatically increases throughput for hot partitions by "borrowing" unused capacity from other partitions, as long as total table throughput isn't exceeded.
  3. Partition splitting: If a partition is consistently hot, DynamoDB will automatically split it into two partitions. This is transparent and requires no action.
Interview tip: When asked about DynamoDB capacity, mention three things: (1) the RCU/WCU math, (2) provisioned vs on-demand trade-offs (sustained load → provisioned; spiky → on-demand), and (3) adaptive capacity + auto-scaling as mitigations for hot partitions.

DynamoDB Streams & DAX

DynamoDB Streams

DynamoDB Streams captures a time-ordered sequence of item-level modifications (inserts, updates, deletes) in a DynamoDB table. Each stream record contains the primary key, the old/new image (or both), and a timestamp.

# Enable streams on a table
aws dynamodb update-table \
    --table-name ECommerce \
    --stream-specification \
        StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

# Stream view types:
# KEYS_ONLY        → only the key attributes
# NEW_IMAGE        → the entire item after modification
# OLD_IMAGE        → the entire item before modification
# NEW_AND_OLD_IMAGES → both (most useful)

# Common use cases:
# 1. Trigger Lambda on item change (event-driven)
# 2. Cross-region replication (Global Tables use this)
# 3. Materialized views / denormalization
# 4. Analytics pipeline: Stream → Kinesis → S3 → Athena
# 5. Audit log / change data capture (CDC)

Example: Auto-update a search index when a product changes:

# Lambda function triggered by DynamoDB Stream
def handler(event, context):
    for record in event['Records']:
        if record['eventName'] in ('INSERT', 'MODIFY'):
            new_image = record['dynamodb']['NewImage']
            pk = new_image['PK']['S']
            
            if pk.startswith('PRODUCT#'):
                # Index in Elasticsearch
                es_client.index(
                    index='products',
                    id=pk,
                    body={
                        'name': new_image['name']['S'],
                        'category': new_image['category']['S'],
                        'price': int(new_image['price']['N'])
                    }
                )
        
        elif record['eventName'] == 'REMOVE':
            old_image = record['dynamodb']['OldImage']
            pk = old_image['PK']['S']
            if pk.startswith('PRODUCT#'):
                es_client.delete(index='products', id=pk)

DAX (DynamoDB Accelerator)

DAX is a fully managed, in-memory caching layer for DynamoDB that delivers microsecond response times (vs. single-digit millisecond for vanilla DynamoDB). It's API-compatible — just change the endpoint.

# Without DAX (standard DynamoDB)
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ECommerce')
result = table.get_item(Key={'PK': 'USER#u-42', 'SK': 'PROFILE'})
# Latency: ~5 ms

# With DAX (just change the client!)
import amazondax
dax_client = amazondax.AmazonDaxClient(
    endpoint_url='dax://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'
)
table = dax_client.Table('ECommerce')
result = table.get_item(Key={'PK': 'USER#u-42', 'SK': 'PROFILE'})
# Latency: ~0.2 ms (25x faster)

# DAX caching behavior:
# - Item cache: caches GetItem/BatchGetItem results
# - Query cache: caches Query/Scan results
# - Write-through: writes go to both DAX and DynamoDB
# - TTL: default 5 minutes (configurable)
# - Cache invalidation: automatic on writes

When to use DAX vs. Redis as a DynamoDB cache:

Criteria           | DAX                        | Redis
-------------------|----------------------------|------
Integration        | Drop-in (same API)         | Requires cache-aside code
Data structures    | None (just cached items)   | Rich (sorted sets, lists, etc.)
Cache invalidation | Automatic write-through    | Manual (you manage TTL/eviction)
Cost               | ~$0.27/hr per node         | Varies (ElastiCache pricing)
Use case           | Read-heavy DynamoDB only   | Multi-source caching, pub/sub, etc.

Redis vs DynamoDB vs Memcached

These three are often confused or conflated in system design discussions. They serve very different purposes:

Dimension         | Redis                                                  | DynamoDB                                                      | Memcached
------------------|--------------------------------------------------------|---------------------------------------------------------------|----------
Type              | In-memory data structure store                         | Managed NoSQL (SSD-backed)                                    | In-memory cache (strings only)
Data model        | Strings, Lists, Sets, ZSets, Hashes, Streams, HLL, Bitmaps | Items (JSON-like) with typed attributes                   | Binary strings (blobs) only
Max value size    | 512 MB                                                 | 400 KB per item                                               | 1 MB (default)
Latency           | Sub-ms (in-memory)                                     | 1–5 ms (SSD); μs with DAX                                     | Sub-ms (in-memory)
Persistence       | Optional (RDB + AOF)                                   | Always durable (3-AZ replication)                             | None (pure cache)
Consistency       | Strong (single node) / eventual (replicas)             | Strong or eventually consistent (per query)                   | N/A (no replication)
Scaling           | Redis Cluster (manual sharding)                        | Automatic (serverless)                                        | Client-side sharding
Secondary indexes | Manual (sorted sets as indexes)                        | GSI + LSI (up to 25 total)                                    | None
Transactions      | MULTI/EXEC (optimistic locking)                        | TransactWriteItems / TransactGetItems (ACID, up to 100 items) | CAS (check and set) only
Scripting         | Lua scripts (atomic)                                   | ConditionExpressions, UpdateExpressions                       | None
Pub/Sub           | Yes (Pub/Sub + Streams)                                | DynamoDB Streams (CDC)                                        | No
Multi-threading   | Single-threaded cmd processing (I/O threads since 6.0) | Managed (transparent)                                         | Multi-threaded
Ops burden        | Medium (you manage cluster)                            | Low (fully managed)                                           | Medium (you manage servers)
Pricing           | Per node (RAM-based)                                   | Per RCU/WCU or per request                                    | Per node (RAM-based)

When to Use Each

Choose Redis When…

  • Sub-millisecond latency is required (caching, session store, rate limiting)
  • You need rich data structures — leaderboards (sorted sets), queues (lists), unique counts (HyperLogLog)
  • You want atomic operations on complex data (Lua scripts, MULTI/EXEC)
  • Pub/Sub or Streams for real-time messaging within a service
  • Data fits in memory (RAM is the limit)
  • You can tolerate some data loss (even with AOF everysec, 1 second window)
  • You're using it as a complement to a primary database, not as the primary database itself (unless the data is ephemeral)

Choose DynamoDB When…

  • You need a durable, scalable primary database with consistent single-digit ms latency
  • Access patterns are known and limited (design your keys/indexes upfront)
  • You want zero operational overhead — no servers, patches, backups to manage
  • Auto-scaling for unpredictable traffic (on-demand mode)
  • Multi-region with Global Tables (active-active replication)
  • You're already in AWS ecosystem (Lambda + DynamoDB Streams + API Gateway)
  • Data exceeds what fits in memory but you still need fast key-based access

Choose Memcached When…

  • You need a simple, multi-threaded cache for small string/blob values
  • Workload is read-heavy and can benefit from multi-core utilization
  • You don't need persistence, data structures, or pub/sub
  • You want the simplest possible caching layer with minimal features
  • Large working set that benefits from Memcached's more efficient memory usage per key (no overhead for data structure metadata)

Common Architecture Patterns

  • Cache-aside + DynamoDB: Redis/Memcached fronting DynamoDB for hot reads
  • Redis as primary + PostgreSQL as source of truth: Fast reads from Redis, periodic sync to SQL for reporting
  • DynamoDB + Streams + Lambda: Event-driven architecture, DynamoDB as event store
  • Redis Streams as message broker: Between microservices when Kafka is overkill
  • DynamoDB + DAX + CloudFront: Full AWS caching stack for API responses
Interview framework: When asked "What database would you use?", start with: (1) What are the access patterns? (2) What's the consistency requirement? (3) What's the scale? (4) Is this ephemeral or durable data? If the answer is "fast key-based lookups, ephemeral, sub-ms latency" → Redis. If it's "durable, scalable, key-based access, managed" → DynamoDB. If it's "pure cache, simple values, multi-threaded" → Memcached.