Key-Value Stores (Redis, DynamoDB)
The Key-Value Model
A key-value store is the simplest form of a database: it maps unique keys to opaque values. Think of it as a giant distributed hash map. The store doesn't inspect or index the values — all it guarantees is that given a key, it can return the associated value in O(1) amortized time.
The API surface is deliberately tiny:
GET(key) → value | null
PUT(key, value) → OK
DELETE(key) → OK
This simplicity is the source of both its power and its limitations:
- Power: Because the store doesn't need to maintain indexes, enforce schemas, or support joins, it can be optimized purely for speed and horizontal scalability. Lookups are O(1). Sharding is trivial — hash the key, pick a node (see the sketch after this list).
- Limitations: No secondary indexes (in pure KV stores), no range queries on values, no ad-hoc queries. If you need to find "all users in New York," you need a separate index or a different database.
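A minimal sketch of that "hash the key, pick a node" routing in Python, assuming a hypothetical three-node deployment (the node names and the MD5 hash are illustrative; real systems favor consistent hashing or fixed hash slots so that adding a node moves as few keys as possible):
import hashlib

NODES = ["node-a:6379", "node-b:6379", "node-c:6379"]  # hypothetical shard addresses

def node_for(key: str) -> str:
    # Hash the key, pick a node: every client computes the same owner for a given key
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

node_for("user:42")  # → always the same node, no central lookup table required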
Common use cases for key-value stores:
| Use Case | Key Pattern | Value | Store |
|---|---|---|---|
| Session cache | sess:{session_id} | JSON blob | Redis |
| Shopping cart | cart:{user_id} | Hash of items | Redis |
| Rate limiting | rl:{ip}:{window} | Counter | Redis |
| User profile | USER#123 | JSON document | DynamoDB |
| Feature flags | flag:{feature_name} | Boolean/JSON | Redis / etcd |
| Distributed lock | lock:{resource} | Owner token + TTL | Redis |
Hash Table Internals
Every key-value store is ultimately built on top of a hash table. Understanding how hash tables work internally is critical for understanding why KV stores have O(1) lookups, how they handle collisions, and why resizing is expensive.
How a Hash Table Works
- Hash Function: Take the key (e.g., "user:42") and pass it through a hash function (e.g., MurmurHash3, SipHash, CRC16) to produce a fixed-size integer.
- Bucket Index: Compute hash(key) % num_buckets to get the index into an array of "buckets."
- Collision Resolution: Multiple keys can hash to the same bucket. Two strategies:
- Chaining: Each bucket points to a linked list of (key, value) pairs. On collision, append to the list. Lookup scans the list (O(n) worst case, O(1) amortized with good hash functions).
- Open Addressing: On collision, probe the next slot (linear probing, quadratic probing, or double hashing). Redis uses chaining; DynamoDB's internal storage uses a variant of consistent hashing.
- Load Factor: α = n / m, where n = number of entries and m = number of buckets. When α exceeds a threshold (typically 0.75), the table resizes — usually doubling the number of buckets and rehashing all entries.
Redis specifically uses a technique called incremental rehashing: instead of rehashing all keys at once (which would block the single-threaded event loop), Redis maintains two hash tables simultaneously and gradually migrates keys from the old table to the new one, one bucket at a time, on each operation.
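A toy illustration of the idea in Python (not Redis's actual dict.c): the dictionary keeps two bucket arrays, new writes go to the new table while rehashing, and each operation migrates one bucket from the old table until the rehash completes.
class IncrementalDict:
    """Toy chained hash table that rehashes one bucket per operation, like Redis's dict."""

    def __init__(self, size=8):
        self.old = [[] for _ in range(size)]   # ht[0]
        self.new = None                        # ht[1], allocated when a resize starts
        self.rehash_idx = -1                   # -1 = not rehashing

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def _step_rehash(self):
        if self.rehash_idx < 0:
            return
        for k, v in self.old[self.rehash_idx]:          # migrate one old bucket to the new table
            self._bucket(self.new, k).append((k, v))
        self.old[self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(self.old):            # done: ht[1] becomes ht[0]
            self.old, self.new, self.rehash_idx = self.new, None, -1

    def put(self, key, value):
        self._step_rehash()
        if self.rehash_idx >= 0:                        # while rehashing, write only to the new table
            old_bucket = self._bucket(self.old, key)
            old_bucket[:] = [(k, v) for k, v in old_bucket if k != key]
            target = self.new
        else:
            target = self.old
        bucket = self._bucket(target, key)
        bucket[:] = [(k, v) for k, v in bucket if k != key] + [(key, value)]
        if self.rehash_idx < 0 and sum(len(b) for b in self.old) > len(self.old):
            self.new = [[] for _ in range(len(self.old) * 2)]   # load factor > 1: start rehashing
            self.rehash_idx = 0

    def get(self, key):
        self._step_rehash()
        tables = [self.old] if self.new is None else [self.old, self.new]
        for table in tables:                             # look in both tables during a rehash
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return None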
▶ Hash Table Lookup
Step through a key lookup with collision resolution via chaining.
Redis Hash Table Implementation
Redis's dictionary (dict.h) uses two hash tables (ht[0] and ht[1]) for incremental rehashing:
typedef struct dict {
dictType *type; // function pointers: hash, keyCompare, etc.
dictht ht[2]; // ht[0] = current, ht[1] = resize target
long rehashidx; // -1 when not rehashing, else current bucket index
int16_t pauserehash; // >0 means rehashing is paused
} dict;
typedef struct dictht {
dictEntry **table; // array of pointers to linked list heads
unsigned long size; // always a power of 2
unsigned long sizemask; // size - 1 (for fast modulo via bitwise AND)
unsigned long used; // number of entries
} dictht;
Key implementation details:
- Hash function: Redis uses SipHash (keyed hash, resistant to hash-flooding DoS attacks) for string keys.
- Bucket index: idx = hash & ht->sizemask (bitwise AND is faster than modulo when size is a power of 2).
- Rehash trigger: When used / size > 1 (load factor > 1, if no background save is running) or used / size > 5 (during BGSAVE/BGREWRITEAOF, more lenient to avoid copy-on-write overhead).
- Incremental rehash: On each dictAdd, dictFind, or dictDelete, Redis migrates entries from one bucket of ht[0] to ht[1]. After all buckets are migrated, ht[1] becomes ht[0] and ht[1] is reset.
Redis Deep Dive
Redis (Remote Dictionary Server) is an in-memory data structure store that can serve as a database, cache, message broker, and streaming engine. It's single-threaded for command processing (using an event loop), which eliminates lock contention and gives it predictable sub-millisecond latency for most operations.
Key characteristics:
- In-memory: All data lives in RAM. This is what gives Redis its speed — no disk I/O on reads.
- Single-threaded command processing: One command at a time, no locks needed. I/O threads handle network reads/writes (since Redis 6.0).
- Rich data structures: Not just strings — lists, sets, sorted sets, hashes, streams, bitmaps, HyperLogLog, and more.
- Optional persistence: RDB snapshots and/or AOF (Append-Only File) for durability.
- Throughput: ~100K-500K ops/sec on a single node (depending on command complexity and hardware).
Redis Data Structures
Redis is not a simple key-value store — it's a data structure server. Each value type has its own set of atomic operations. This is what makes Redis dramatically more useful than Memcached.
String
Binary-safe, up to 512 MB. The most basic type.
# Basic get/set
SET user:42:name "Alice"
GET user:42:name → "Alice"
# Atomic counter
INCR page:views:home → 1
INCRBY page:views:home 10 → 11
# Set with expiry (session token)
SET sess:abc123 "{user_id:42}" EX 3600
# Set only if not exists (distributed lock)
SET lock:order:99 "worker-1" NX PX 30000
# Bit operations
SETBIT user:42:features 3 1
BITCOUNT user:42:features → 1
List
Doubly linked list. O(1) push/pop at both ends.
# Task queue (producer)
LPUSH queue:emails "{to:'a@b.com'}"
LPUSH queue:emails "{to:'c@d.com'}"
# Task queue (consumer - blocking)
BRPOP queue:emails 30 → pops oldest
# Recent activity feed (capped list)
LPUSH feed:user:42 "liked post #123"
LTRIM feed:user:42 0 99 # keep last 100
# Range query
LRANGE feed:user:42 0 9 → last 10 items
LLEN queue:emails → 2
Set
Unordered unique elements. O(1) add/remove/check.
# Track unique visitors
SADD visitors:2026-04-15 "user:42"
SADD visitors:2026-04-15 "user:43"
SCARD visitors:2026-04-15 → 2
# Tag system
SADD post:99:tags "redis" "nosql" "cache"
SMEMBERS post:99:tags
SISMEMBER post:99:tags "redis" → 1
# Set operations
SADD friends:alice "bob" "carol" "dave"
SADD friends:bob "alice" "carol" "eve"
SINTER friends:alice friends:bob → "carol"
SUNION friends:alice friends:bob
SDIFF friends:alice friends:bob → "bob", "dave"
Sorted Set (ZSet)
Set with scores. Skip list + hash table. O(log N) add.
# Leaderboard
ZADD leaderboard 2500 "alice"
ZADD leaderboard 1800 "bob"
ZADD leaderboard 3100 "carol"
# Top 10 players (descending)
ZREVRANGE leaderboard 0 9 WITHSCORES
# → "carol" 3100, "alice" 2500, "bob" 1800
# Rank lookup
ZREVRANK leaderboard "alice" → 1 (0-indexed)
ZSCORE leaderboard "alice" → "2500"
# Range by score (time-series window)
ZRANGEBYSCORE events 1714000000 1714100000
# Remove old entries
ZREMRANGEBYSCORE events -inf 1713900000
Hash
Field-value pairs inside a key. Like a mini document.
# User profile as hash
HSET user:42 name "Alice" email "a@b.com"
HSET user:42 login_count 0
HGET user:42 name → "Alice"
HGETALL user:42
# → "name" "Alice" "email" "a@b.com"
# "login_count" "0"
# Atomic field increment
HINCRBY user:42 login_count 1 → 1
# Partial update (only change email)
HSET user:42 email "alice@new.com"
# Memory efficient: hashes with <128 fields
# use ziplist encoding (contiguous memory)
Stream
Append-only log with consumer groups. Like Kafka-lite.
# Produce events
XADD orders * user_id 42 total 15990
# → "1714000000000-0" (auto-generated ID)
XADD orders * user_id 43 total 8500
# → "1714000000001-0"
# Read all events from beginning
XRANGE orders - +
# Consumer group (multiple workers)
XGROUP CREATE orders grp1 0
XREADGROUP GROUP grp1 worker-1 COUNT 10
BLOCK 5000 STREAMS orders >
# Acknowledge processed messages
XACK orders grp1 "1714000000000-0"
# Pending entries (unacknowledged)
XPENDING orders grp1
HyperLogLog
Probabilistic cardinality estimation. ~12 KB per key. 0.81% std error.
# Count unique visitors (billions of elements,
# only 12 KB of memory!)
PFADD visitors:today "user:42" "user:43"
PFADD visitors:today "user:42" # duplicate
PFCOUNT visitors:today → 2
# Merge multiple periods
PFMERGE visitors:week
visitors:mon visitors:tue visitors:wed
PFCOUNT visitors:week → approximate unique count
Bitmap & Bitfield
Bit-level operations on strings. Ultra-compact for flags.
# Daily active users (1 bit per user ID)
SETBIT dau:2026-04-15 42 1
SETBIT dau:2026-04-15 43 1
BITCOUNT dau:2026-04-15 → 2
# Users active on BOTH days
BITOP AND active_both dau:2026-04-15
dau:2026-04-16
BITCOUNT active_both → count
# Bitfield: packed integer arrays
BITFIELD player:42 SET u8 #0 100 # HP = 100
BITFIELD player:42 SET u8 #1 50 # MP = 50
BITFIELD player:42 INCRBY u8 #0 -25 # damage
Memory optimization: Hashes with fewer than hash-max-ziplist-entries (default 128) entries and values shorter than hash-max-ziplist-value (default 64 bytes) are stored as ziplists (contiguous memory, no pointers) — using ~10x less memory than the full hash table encoding. Similar optimizations exist for lists (quicklist = linked list of ziplists), sets (intset for small integer sets), and sorted sets (ziplist for small sets).
Persistence: RDB vs AOF
Redis is in-memory, but that doesn't mean your data vanishes on restart. Redis offers two persistence mechanisms, and you can use both simultaneously:
RDB (Redis Database Snapshots)
Point-in-time snapshots of the entire dataset, saved as a compact binary file.
# redis.conf
save 900 1 # snapshot if ≥1 key changed in 900s
save 300 10 # snapshot if ≥10 keys changed in 300s
save 60 10000 # snapshot if ≥10K keys changed in 60s
# Manual trigger
BGSAVE # fork() → child writes dump.rdb
SAVE # blocks main thread (never in prod!)
# How it works:
# 1. Redis calls fork() to create child process
# 2. Child writes entire dataset to temp file
# 3. Child atomically renames temp → dump.rdb
# 4. Parent continues serving requests
# 5. Copy-on-write: only modified pages duplicated
- Pro: Compact file, fast restarts, good for backups
- Pro: Child process does the I/O — zero impact on latency
- Con: Data loss between snapshots (up to minutes)
- Con: fork() can be slow with large datasets (copies page tables)
AOF (Append-Only File)
Logs every write command. Replay the log to reconstruct state.
# redis.conf
appendonly yes
appendfsync everysec # fsync once per second (default)
# appendfsync always # fsync on every write (safest)
# appendfsync no # let OS decide (fastest)
# AOF file looks like:
*3\r\n$3\r\nSET\r\n$7\r\nuser:42\r\n$5\r\nAlice\r\n
# AOF rewrite (compaction):
BGREWRITEAOF
# Rewrites AOF with minimum commands to
# reconstruct current state. E.g., 1000 INCRs
# become a single SET with final value.
# Redis 7.0+: Multi-part AOF
# base.aof (RDB-format snapshot) +
# incr.aof (incremental commands since snapshot)
- Pro: Minimal data loss (at most ~1 second with everysec)
- Pro: Append-only = no corruption risk from partial writes
- Con: Larger file size than RDB
- Con: Slower restarts (must replay all commands)
Recommended production config: Enable both RDB and AOF. Redis will use AOF for recovery (more complete data) and RDB for backups and faster cold starts. The Redis 7.0 multi-part AOF essentially combines the best of both — an RDB base with incremental AOF on top.
Persistence Decision Matrix
| Requirement | Use | Config |
|---|---|---|
| Pure cache, no durability needed | None | save "", appendonly no |
| Accept a few minutes of data loss | RDB only | save 300 10 |
| Accept at most ~1 second of loss | AOF everysec | appendfsync everysec |
| Zero data loss (rare for Redis) | AOF always | appendfsync always (~10x slower) |
| Best of both worlds (recommended) | RDB + AOF | Both enabled, AOF used for recovery |
Redis Cluster
A single Redis node can hold ~25 GB of data comfortably (limited by RAM). For larger datasets or higher throughput, you need Redis Cluster — Redis's built-in sharding solution that distributes data across multiple nodes.
How Hash Slots Work
Redis Cluster divides the entire keyspace into 16,384 hash slots (numbered 0–16,383). Each key is assigned to a slot using:
slot = CRC16(key) mod 16384
Each master node in the cluster owns a subset of these slots. With 3 masters:
- Node A: slots 0–5460
- Node B: slots 5461–10922
- Node C: slots 10923–16383
Each master has one or more replicas for failover. A typical production cluster runs 6 nodes: 3 masters + 3 replicas.
Hash Tags for Multi-Key Operations
Since multi-key commands (MGET, pipeline transactions) only work when all keys hash to the same slot, Redis supports hash tags: only the substring inside {...} is hashed.
# These all hash to the same slot (hash of "user:42")
SET {user:42}:profile "..."
SET {user:42}:settings "..."
SET {user:42}:cart "..."
# Now MGET works across all three
MGET {user:42}:profile {user:42}:settings {user:42}:cart
# Without hash tags, these could land on different nodes
# → CROSSSLOT error on multi-key commands
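For intuition, the slot computation (including the hash-tag rule) fits in a few lines of Python. The checksum below is CRC-16/XMODEM, the variant Redis Cluster uses; treat this as a sketch for understanding routing, not a drop-in client.
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the checksum used for hash slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash-tag rule: if the key contains a non-empty {...}, only that substring is hashed
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

assert key_slot('{user:42}:profile') == key_slot('{user:42}:cart')  # same slot: both hash "user:42"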
MOVED Redirects
When a client sends a command to the wrong node, the node responds with a MOVED redirect:
# Client sends to Node A:
GET user:500
# Node A calculates: CRC16("user:500") % 16384 = 7365
# 7365 is owned by Node B, not Node A
# Node A responds:
-MOVED 7365 192.168.1.2:6379
# Smart clients (like redis-py-cluster, Lettuce, Jedis)
# cache the slot→node mapping and send future requests
# for slot 7365 directly to Node B.
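In application code you rarely handle MOVED yourself — a cluster-aware client does the slot bookkeeping. A sketch using redis-py's cluster client (the host and port are placeholders for one of your cluster nodes):
from redis.cluster import RedisCluster

# The client fetches the slot→node map on connect, routes each command to the
# owning node, and refreshes its map whenever it sees a MOVED redirect.
rc = RedisCluster(host='192.168.1.1', port=6379, decode_responses=True)

rc.set('{user:42}:profile', '{"name": "Alice"}')
rc.mget('{user:42}:profile', '{user:42}:cart')  # same slot thanks to the hash tag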
Resharding & Rebalancing
Adding a new node to the cluster:
# 1. Add Node D to the cluster
redis-cli --cluster add-node 192.168.1.4:6379 192.168.1.1:6379
# 2. Reshard: move slots from existing nodes to Node D
redis-cli --cluster reshard 192.168.1.1:6379
# → Interactively choose how many slots (e.g., 4096)
# → Choose source nodes (A, B, C) and target (D)
# During resharding:
# - Keys in migrating slots trigger ASK redirect
# - Client follows ASK to new node for that key
# - Once migration is complete, MOVED is returned instead
# - Entire process is online — no downtime
▶ Redis Cluster Hash Slot Routing
Watch how a key is routed to the correct node via CRC16 hashing, including a MOVED redirect.
Redis Sentinel
Redis Sentinel provides high availability for standalone Redis deployments (non-cluster mode). It handles automatic failover, monitoring, and service discovery.
How Sentinel works:
- Monitoring: Sentinel continuously pings the master and replicas. If a master doesn't respond within down-after-milliseconds, the sentinel marks it as subjectively down (SDOWN).
- Quorum: When enough sentinels (a configurable quorum, typically 2 out of 3) agree the master is down, it's marked objectively down (ODOWN).
- Failover: Sentinels elect a leader, which promotes the best replica to master. "Best" = most up-to-date replication offset.
- Notification: Clients subscribed to Sentinel are notified of the new master's address. Smart clients (Lettuce, redis-py with Sentinel support) reconnect automatically.
# sentinel.conf
sentinel monitor mymaster 192.168.1.1 6379 2 # quorum=2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
# Query Sentinel for current master
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
# → "192.168.1.2" "6379" (after failover)
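On the client side, redis-py ships a Sentinel helper that performs the master discovery described above. A sketch (the sentinel addresses and service name are placeholders matching the config above):
from redis.sentinel import Sentinel

# Ask any reachable sentinel for the current topology of "mymaster"
sentinel = Sentinel([('192.168.1.1', 26379), ('192.168.1.2', 26379)], socket_timeout=0.5)

master = sentinel.master_for('mymaster', socket_timeout=0.5)   # read-write connection
replica = sentinel.slave_for('mymaster', socket_timeout=0.5)   # read-only connection

master.set('sess:abc123', '{"user_id": 42}')
replica.get('sess:abc123')
# After a failover, the connection pool re-resolves and reconnects to the new master.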
Pub/Sub & Streams
Pub/Sub (Fire-and-Forget)
Redis Pub/Sub is a simple messaging system where publishers send messages to channels and all subscribed clients receive them in real-time. Messages are not persisted — if no subscriber is listening, the message is lost.
# Publisher
PUBLISH chat:room:42 "Hello from Alice!"
# → (integer) 2 (2 subscribers received it)
# Subscriber (blocking)
SUBSCRIBE chat:room:42
# Blocks and receives messages as they arrive:
# 1) "message"
# 2) "chat:room:42"
# 3) "Hello from Alice!"
# Pattern subscribe (wildcard)
PSUBSCRIBE chat:room:*
# Receives messages from ALL chat rooms
Streams (Persistent, Consumer Groups)
For durable messaging with consumer groups, acknowledgment, and replay — use Redis Streams (added in Redis 5.0). Streams solve every limitation of Pub/Sub:
| Feature | Pub/Sub | Streams |
|---|---|---|
| Persistence | ❌ Fire and forget | ✅ Stored on disk |
| Consumer groups | ❌ All get every msg | ✅ Load-balanced across group |
| Acknowledgment | ❌ No ack | ✅ XACK per message |
| Replay | ❌ Missed = lost | ✅ Read from any ID |
| Backpressure | ❌ Slow client drops | ✅ Pending list tracks unprocessed |
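Putting the stream commands together from Python with redis-py, assuming a local Redis; this mirrors the producer/consumer-group flow shown in the Stream section above (the print is a placeholder for real processing):
import redis

r = redis.Redis(decode_responses=True)

# Producer
r.xadd('orders', {'user_id': '42', 'total': '15990'})

# One-time consumer group setup (ignore the error if the group already exists)
try:
    r.xgroup_create('orders', 'grp1', id='0', mkstream=True)
except redis.ResponseError:
    pass

# Consumer: read up to 10 new messages, blocking up to 5 seconds
for stream, entries in r.xreadgroup('grp1', 'worker-1', {'orders': '>'}, count=10, block=5000):
    for msg_id, fields in entries:
        print(msg_id, fields)            # replace with real processing
        r.xack('orders', 'grp1', msg_id) # acknowledge so it leaves the pending list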
Lua Scripting
Redis executes Lua scripts atomically — the entire script runs without any other command interleaving. This is the key to implementing complex atomic operations that can't be done with a single Redis command.
# Rate limiter using sliding window (Lua script)
# KEYS[1] = rate limit key, ARGV[1] = window (sec),
# ARGV[2] = max requests, ARGV[3] = current timestamp
local key = KEYS[1]
local window = tonumber(ARGV[1])
local max_req = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
-- Count requests in current window
local count = redis.call('ZCARD', key)
if count < max_req then
-- Under limit: add this request
redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
redis.call('EXPIRE', key, window)
return 1 -- allowed
else
return 0 -- rate limited
end
# Execute the script from a client
redis-cli EVAL "$(cat rate_limit.lua)" 1 \
"rl:api:192.168.1.1" 60 100 1714000000
# → (integer) 1 (allowed)
# Or use EVALSHA with cached script hash
redis-cli SCRIPT LOAD "$(cat rate_limit.lua)"
# → "a1b2c3d4e5f6..."
redis-cli EVALSHA "a1b2c3d4e5f6..." 1 \
"rl:api:192.168.1.1" 60 100 1714000000
Other common Lua script use cases:
- Compare-and-swap: Read a value, check a condition, write atomically.
- Distributed lock release: Only delete the lock if you own it (check token, then delete).
- Inventory reservation: Decrement stock only if sufficient quantity exists.
- Leaderboard update: Update score and trim to top-N atomically.
# Safe distributed lock release (Redlock pattern)
# Only release if we still own the lock
if redis.call('GET', KEYS[1]) == ARGV[1] then
return redis.call('DEL', KEYS[1])
else
return 0 -- someone else holds it now
end
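End to end, the acquire/release cycle looks like this in Python, assuming redis-py and the release script above; the random token guarantees a worker can only delete a lock it still owns (not a full multi-node Redlock, just the single-instance pattern):
import uuid
import redis

r = redis.Redis(decode_responses=True)

RELEASE_LOCK = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0
end
"""
release = r.register_script(RELEASE_LOCK)

token = str(uuid.uuid4())
# SET ... NX PX 30000 → acquire only if the lock is free, auto-expire after 30 s
if r.set('lock:order:99', token, nx=True, px=30000):
    try:
        pass  # ... do the work that requires exclusive access ...
    finally:
        release(keys=['lock:order:99'], args=[token])  # no-op if the lock expired and was re-acquired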
DynamoDB Deep Dive
Amazon DynamoDB is a fully managed, serverless key-value and document database that delivers single-digit millisecond performance at any scale. Unlike Redis (which you operate yourself), DynamoDB is a service — AWS handles replication, sharding, patching, and scaling.
Key characteristics:
- Serverless: No servers to provision, patch, or manage. Auto-scales up and down.
- Consistent performance: Single-digit ms latency whether your table has 1 KB or 1 PB of data.
- Multi-region: Global Tables replicate across AWS regions with sub-second replication lag.
- Flexible consistency: Choose per-query: strongly consistent reads or eventually consistent reads (2x cheaper).
- SLA: 99.999% availability for Global Tables, 99.99% for standard tables.
Partition Key & Sort Key
Every DynamoDB table has a primary key that uniquely identifies each item. There are two types:
Simple Primary Key (Partition Key Only)
The partition key alone must be unique. DynamoDB hashes it to determine which physical partition stores the item.
# Table: Users
# PK: user_id (String)
{
"user_id": "u-42", ← partition key
"name": "Alice",
"email": "alice@ex.com",
"plan": "premium"
}
# Access patterns:
# ✅ GetItem(user_id = "u-42") → O(1)
# ❌ Find by email → full table Scan (or add a GSI)
Composite Primary Key (PK + Sort Key)
Partition key + sort key together are unique. Enables range queries within a partition.
# Table: Orders
# PK: user_id (String), SK: order_id (String)
{
"user_id": "u-42", ← partition key
"order_id": "ord-001", ← sort key
"total": 15990,
"status": "shipped"
}
# Access patterns:
# ✅ GetItem(PK="u-42", SK="ord-001")
# ✅ Query(PK="u-42", SK begins_with "ord-")
# ✅ Query(PK="u-42", SK between "A" and "Z")
# ✅ Query(PK="u-42") → all orders for user
Partition key design is everything. A bad partition key causes hot partitions — all traffic concentrated on one physical partition while others sit idle. Good partition keys have high cardinality and uniform access distribution.
| Good PK | Bad PK | Why |
|---|---|---|
| user_id | country | Millions of users vs. ~200 countries → hot partition on "US" |
| device_id | date | All IoT writes go to today's partition |
| order_id (UUID) | status | Only 5 statuses → extreme imbalance |
Global & Local Secondary Indexes
DynamoDB's primary key determines the only efficient access pattern on the base table. For additional access patterns, you need secondary indexes.
Local Secondary Index (LSI)
Same partition key as the base table, different sort key. Must be created at table creation time. Limit: 5 per table.
# Base table: Orders
# PK: user_id, SK: order_id
# LSI: OrdersByDate
# PK: user_id (same), SK: created_at
# Now you can query:
# "All orders for user u-42, sorted by date"
response = table.query(
IndexName='OrdersByDate',
KeyConditionExpression=
Key('user_id').eq('u-42') &
Key('created_at').between(
'2026-01-01', '2026-04-30'
)
)
- Strongly consistent reads ✅
- Shares throughput with base table
- 10 GB limit per partition key value
Global Secondary Index (GSI)
Completely different partition key and/or sort key. Can be created anytime. Limit: 20 per table.
# Base table: Orders
# PK: user_id, SK: order_id
# GSI: OrdersByStatus
# PK: status, SK: created_at
# Now you can query:
# "All pending orders, newest first"
response = table.query(
IndexName='OrdersByStatus',
KeyConditionExpression=
Key('status').eq('pending'),
ScanIndexForward=False # descending
)
# GSI: OrdersByEmail
# PK: email, SK: user_id
# "Find user by email" → efficient lookup
- Eventually consistent reads only ❌
- Has its own provisioned throughput
- Asynchronously replicated from base
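Unlike an LSI, a GSI can be added to a live table. A sketch of creating the OrdersByStatus index with boto3, assuming an on-demand table (provisioned tables also need a ProvisionedThroughput block in the Create request):
import boto3

client = boto3.client('dynamodb')

client.update_table(
    TableName='Orders',
    AttributeDefinitions=[                       # indexed attributes must be declared
        {'AttributeName': 'status', 'AttributeType': 'S'},
        {'AttributeName': 'created_at', 'AttributeType': 'S'},
    ],
    GlobalSecondaryIndexUpdates=[{
        'Create': {
            'IndexName': 'OrdersByStatus',
            'KeySchema': [
                {'AttributeName': 'status', 'KeyType': 'HASH'},       # GSI partition key
                {'AttributeName': 'created_at', 'KeyType': 'RANGE'},  # GSI sort key
            ],
            'Projection': {'ProjectionType': 'ALL'},
        }
    }]
)
# The index backfills asynchronously; query it once its IndexStatus is ACTIVE.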
Single-Table Design Patterns
Single-table design is a DynamoDB modeling technique where all entity types live in one table, using generic key names (PK, SK) with prefixed values to differentiate entities. This is the recommended approach from AWS for most applications.
Why? DynamoDB charges per table (provisioned capacity) and per-request (on-demand). Fewer tables = fewer things to manage, and you can fetch related entities in a single Query operation.
# Single-table design for an e-commerce app
# Table: ECommerce
# PK (String) | SK (String) | Attributes...
# User entity
PK=USER#u-42 SK=PROFILE name=Alice email=alice@ex.com plan=premium
# User's orders (all in one partition → one Query)
PK=USER#u-42 SK=ORDER#2026-04-15#ord-001 total=15990 status=shipped
PK=USER#u-42 SK=ORDER#2026-04-10#ord-002 total=8500 status=delivered
# Order items (nested under order)
PK=ORDER#ord-001 SK=ITEM#sku-101 qty=1 price=9999
PK=ORDER#ord-001 SK=ITEM#sku-205 qty=2 price=2995
# Product entity
PK=PRODUCT#sku-101 SK=METADATA name="Widget" category="electronics"
PK=PRODUCT#sku-101 SK=REVIEW#u-43 rating=5 text="Great product!"
# GSI1: Inverted index (SK → PK)
# GSI1PK=ORDER#2026-04-15#ord-001 GSI1SK=USER#u-42
# → "Which user placed order ord-001?"
# GSI2: By status
# GSI2PK=shipped GSI2SK=2026-04-15#ord-001
# → "All shipped orders, sorted by date"
Query Examples (Python / boto3)
import boto3
from boto3.dynamodb.conditions import Key
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ECommerce')
# 1. Get user profile
profile = table.get_item(
Key={'PK': 'USER#u-42', 'SK': 'PROFILE'}
)['Item']
# 2. Get all orders for a user (sorted by date)
orders = table.query(
KeyConditionExpression=
Key('PK').eq('USER#u-42') &
Key('SK').begins_with('ORDER#')
)['Items']
# 3. Get orders from last 30 days
recent = table.query(
KeyConditionExpression=
Key('PK').eq('USER#u-42') &
Key('SK').between('ORDER#2026-03-16', 'ORDER#2026-04-16')
)['Items']
# 4. Get all items in an order
items = table.query(
KeyConditionExpression=
Key('PK').eq('ORDER#ord-001') &
Key('SK').begins_with('ITEM#')
)['Items']
# 5. Batch get multiple entities (up to 100 items)
result = dynamodb.batch_get_item(
RequestItems={
'ECommerce': {
'Keys': [
{'PK': 'USER#u-42', 'SK': 'PROFILE'},
{'PK': 'USER#u-43', 'SK': 'PROFILE'},
{'PK': 'PRODUCT#sku-101', 'SK': 'METADATA'}
]
}
}
)
# 6. Transactional write (atomic across items)
#    (the low-level client used here expects DynamoDB-typed attribute values)
dynamodb.meta.client.transact_write_items(
    TransactItems=[
        {'Put': {'TableName': 'ECommerce', 'Item': {
            'PK': {'S': 'USER#u-42'},
            'SK': {'S': 'ORDER#2026-04-16#ord-003'},
            'total': {'N': '5000'}, 'status': {'S': 'created'}
        }}},
        {'Update': {'TableName': 'ECommerce',
            'Key': {'PK': {'S': 'PRODUCT#sku-101'}, 'SK': {'S': 'METADATA'}},
            'UpdateExpression': 'SET stock = stock - :qty',
            'ConditionExpression': 'stock >= :qty',
            'ExpressionAttributeValues': {':qty': {'N': '1'}}
        }}
    ]
)
Capacity Planning
DynamoDB has two capacity modes. Understanding the math is critical for cost estimation in system design interviews.
Read Capacity Units (RCU)
- 1 RCU = one strongly consistent read of up to 4 KB
- 1 RCU = two eventually consistent reads of up to 4 KB
- Transactional reads cost 2 RCUs per 4 KB read
Write Capacity Units (WCU)
- 1 WCU = one write of up to 1 KB
- Transactional writes cost 2 WCUs per 1 KB write
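These rules translate directly into a small helper; a sketch in Python that mirrors the worked example below:
import math

def rcus(reads_per_sec, item_kb, strongly_consistent=True):
    # 1 RCU = one strongly consistent read of up to 4 KB; eventually consistent costs half
    per_read = math.ceil(item_kb / 4)
    return reads_per_sec * (per_read if strongly_consistent else per_read / 2)

def wcus(writes_per_sec, item_kb):
    # 1 WCU = one write of up to 1 KB
    return writes_per_sec * math.ceil(item_kb / 1)

# 2 KB items: 320 eventually consistent + 80 strongly consistent reads/sec
print(rcus(320, 2, strongly_consistent=False) + rcus(80, 2))  # 240.0 RCUs
# 1.5 KB items: 100 writes/sec
print(wcus(100, 1.5))                                         # 200 WCUs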
Capacity Planning Example
Scenario: E-commerce platform with 10,000 DAU, peak 500 requests/sec.
# Read calculations:
# - Average item size: 2 KB
# - 80% reads are eventually consistent
# - 20% reads are strongly consistent
# - Peak: 400 reads/sec
Eventually consistent: 400 × 0.8 = 320 reads/sec
Each 2 KB item = ceil(2/4) = 1 RCU per strongly consistent read
Eventually consistent = 0.5 RCU per read
→ 320 × 0.5 = 160 RCUs
Strongly consistent: 400 × 0.2 = 80 reads/sec
→ 80 × 1 = 80 RCUs
Total read capacity: 160 + 80 = 240 RCUs
# Write calculations:
# - Average item size: 1.5 KB → ceil(1.5/1) = 2 WCUs per write
# - Peak: 100 writes/sec
Total write capacity: 100 × 2 = 200 WCUs
# Cost estimate (us-east-1, provisioned):
# RCU: $0.00013 per RCU per hour
# WCU: $0.00065 per WCU per hour
#
# Monthly (730 hours):
# Reads: 240 × $0.00013 × 730 = $22.78/month
# Writes: 200 × $0.00065 × 730 = $94.90/month
# Total: ~$118/month (provisioned)
#
# On-demand pricing:
# $1.25 per million write request units (WRUs)
# $0.25 per million read request units (RRUs)
# Request units follow the same size rules as above:
# 240 RRU/sec and 200 WRU/sec at peak
# Reads: 240 × 86,400 = ~20.7M RRUs/day → ~$5.18/day
# Writes: 200 × 86,400 = ~17.3M WRUs/day → ~$21.60/day
# → ~$26.78/day ≈ ~$800/month
# → Provisioned is cheaper at sustained load!
Adaptive Capacity
DynamoDB automatically handles uneven workloads with adaptive capacity. Even if one partition is receiving disproportionate traffic (hot partition), DynamoDB will:
- Burst capacity: Each partition can burst up to 300 seconds of unused throughput.
- Adaptive capacity: Automatically increases throughput for hot partitions by "borrowing" unused capacity from other partitions, as long as total table throughput isn't exceeded.
- Partition splitting: If a partition is consistently hot, DynamoDB will automatically split it into two partitions. This is transparent and requires no action.
DynamoDB Streams & DAX
DynamoDB Streams
DynamoDB Streams captures a time-ordered sequence of item-level modifications (inserts, updates, deletes) in a DynamoDB table. Each stream record contains the primary key, the old/new image (or both), and a timestamp.
# Enable streams on a table
aws dynamodb update-table \
--table-name ECommerce \
--stream-specification \
StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
# Stream view types:
# KEYS_ONLY → only the key attributes
# NEW_IMAGE → the entire item after modification
# OLD_IMAGE → the entire item before modification
# NEW_AND_OLD_IMAGES → both (most useful)
# Common use cases:
# 1. Trigger Lambda on item change (event-driven)
# 2. Cross-region replication (Global Tables use this)
# 3. Materialized views / denormalization
# 4. Analytics pipeline: Stream → Kinesis → S3 → Athena
# 5. Audit log / change data capture (CDC)
Example: Auto-update a search index when a product changes:
# Lambda function triggered by DynamoDB Stream
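# es_client (an Elasticsearch/OpenSearch client) is assumed to be initialized outside the handler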
def handler(event, context):
for record in event['Records']:
if record['eventName'] in ('INSERT', 'MODIFY'):
new_image = record['dynamodb']['NewImage']
pk = new_image['PK']['S']
if pk.startswith('PRODUCT#'):
# Index in Elasticsearch
es_client.index(
index='products',
id=pk,
body={
'name': new_image['name']['S'],
'category': new_image['category']['S'],
'price': int(new_image['price']['N'])
}
)
elif record['eventName'] == 'REMOVE':
old_image = record['dynamodb']['OldImage']
pk = old_image['PK']['S']
if pk.startswith('PRODUCT#'):
es_client.delete(index='products', id=pk)
DAX (DynamoDB Accelerator)
DAX is a fully managed, in-memory caching layer for DynamoDB that delivers microsecond response times (vs. single-digit millisecond for vanilla DynamoDB). It's API-compatible — just change the endpoint.
# Without DAX (standard DynamoDB)
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ECommerce')
result = table.get_item(Key={'PK': 'USER#u-42', 'SK': 'PROFILE'})
# Latency: ~5 ms
# With DAX (just change the client!)
import amazondax
dax = amazondax.AmazonDaxClient.resource(
endpoint_url='dax://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'
)
table = dax.Table('ECommerce')
result = table.get_item(Key={'PK': 'USER#u-42', 'SK': 'PROFILE'})
# Latency: ~0.2 ms (25x faster)
# DAX caching behavior:
# - Item cache: caches GetItem/BatchGetItem results
# - Query cache: caches Query/Scan results
# - Write-through: writes go to both DAX and DynamoDB
# - TTL: default 5 minutes (configurable)
# - Cache invalidation: automatic on writes
When to use DAX vs. Redis as a DynamoDB cache:
| Criteria | DAX | Redis |
|---|---|---|
| Integration | Drop-in (same API) | Requires cache-aside code |
| Data structures | None (just cached items) | Rich (sorted sets, lists, etc.) |
| Cache invalidation | Automatic write-through | Manual (you manage TTL/eviction) |
| Cost | ~$0.27/hr per node | Varies (ElastiCache pricing) |
| Use case | Read-heavy DynamoDB only | Multi-source caching, pub/sub, etc. |
Redis vs DynamoDB vs Memcached
These three are often confused or conflated in system design discussions. They serve very different purposes:
| Dimension | Redis | DynamoDB | Memcached |
|---|---|---|---|
| Type | In-memory data structure store | Managed NoSQL (SSD-backed) | In-memory cache (strings only) |
| Data model | Strings, Lists, Sets, ZSets, Hashes, Streams, HLL, Bitmaps | Items (JSON-like) with typed attributes | Binary strings (blobs) only |
| Max value size | 512 MB | 400 KB per item | 1 MB (default) |
| Latency | Sub-ms (in-memory) | 1–5 ms (SSD); μs with DAX | Sub-ms (in-memory) |
| Persistence | Optional (RDB + AOF) | Always durable (3-AZ replication) | None (pure cache) |
| Consistency | Strong (single node) / eventual (replicas) | Strong or eventually consistent (per query) | N/A (no replication) |
| Scaling | Redis Cluster (manual sharding) | Automatic (serverless) | Client-side sharding |
| Secondary indexes | Manual (sorted sets as indexes) | GSI + LSI (up to 25 total) | None |
| Transactions | MULTI/EXEC (optimistic locking) | TransactWriteItems / TransactGetItems (ACID, up to 100 items) | CAS (check and set) only |
| Scripting | Lua scripts (atomic) | ConditionExpressions, UpdateExpressions | None |
| Pub/Sub | Yes (Pub/Sub + Streams) | DynamoDB Streams (CDC) | No |
| Multi-threading | Single-threaded cmd processing (I/O threads since 6.0) | Managed (transparent) | Multi-threaded |
| Ops burden | Medium (you manage cluster) | Low (fully managed) | Medium (you manage servers) |
| Pricing | Per node (RAM-based) | Per RCU/WCU or per request | Per node (RAM-based) |
When to Use Each
Choose Redis When…
- Sub-millisecond latency is required (caching, session store, rate limiting)
- You need rich data structures — leaderboards (sorted sets), queues (lists), unique counts (HyperLogLog)
- You want atomic operations on complex data (Lua scripts, MULTI/EXEC)
- Pub/Sub or Streams for real-time messaging within a service
- Data fits in memory (RAM is the limit)
- You can tolerate some data loss (even with AOF everysec, 1 second window)
- Use as a complement to a primary database, not as the primary database itself (unless ephemeral data)
Choose DynamoDB When…
- You need a durable, scalable primary database with consistent single-digit ms latency
- Access patterns are known and limited (design your keys/indexes upfront)
- You want zero operational overhead — no servers, patches, backups to manage
- Auto-scaling for unpredictable traffic (on-demand mode)
- Multi-region with Global Tables (active-active replication)
- You're already in AWS ecosystem (Lambda + DynamoDB Streams + API Gateway)
- Data exceeds what fits in memory but you still need fast key-based access
Choose Memcached When…
- You need a simple, multi-threaded cache for small string/blob values
- Workload is read-heavy and can benefit from multi-core utilization
- You don't need persistence, data structures, or pub/sub
- You want the simplest possible caching layer with minimal features
- Large working set that benefits from Memcached's more efficient memory usage per key (no overhead for data structure metadata)
Common Architecture Patterns
- Cache-aside + DynamoDB: Redis/Memcached fronting DynamoDB for hot reads
- Redis as primary + PostgreSQL as source of truth: Fast reads from Redis, periodic sync to SQL for reporting
- DynamoDB + Streams + Lambda: Event-driven architecture, DynamoDB as event store
- Redis Streams as message broker: Between microservices when Kafka is overkill
- DynamoDB + DAX + CloudFront: Full AWS caching stack for API responses