Key-Value Stores (Redis, DynamoDB)
The Key-Value Model
A key-value store is the simplest form of a database: it maps unique keys to opaque values. Think of it as a giant distributed hash map. The store doesn't inspect or index the values — all it guarantees is that given a key, it can return the associated value in O(1) amortized time.
The API surface is deliberately tiny:
GET(key) → value | null
PUT(key, value) → OK
DELETE(key) → OK
This simplicity is the source of both its power and its limitations:
- Power: Because the store doesn't need to maintain indexes, enforce schemas, or support joins, it can be optimized purely for speed and horizontal scalability. Lookups are O(1). Sharding is trivial — hash the key, pick a node (see the sketch after this list).
- Limitations: No secondary indexes (in pure KV stores), no range queries on values, no ad-hoc queries. If you need to find "all users in New York," you need a separate index or a different database.
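A minimal sketch of that "hash the key, pick a node" routing in Python, assuming a hypothetical three-node deployment (the node names and the MD5 hash are illustrative; real systems favor consistent hashing or fixed hash slots so that adding a node moves as few keys as possible):
import hashlib

NODES = ["node-a:6379", "node-b:6379", "node-c:6379"]  # hypothetical shard addresses

def node_for(key: str) -> str:
    # Hash the key, pick a node: every client computes the same owner for a given key
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

node_for("user:42")  # → always the same node, no central lookup table required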
Common use cases for key-value stores:
| Use Case | Key Pattern | Value | Store |
|---|---|---|---|
| Session cache | sess:{session_id} | JSON blob | Redis |
| Shopping cart | cart:{user_id} | Hash of items | Redis |
| Rate limiting | rl:{ip}:{window} | Counter | Redis |
| User profile | USER#123 | JSON document | DynamoDB |
| Feature flags | flag:{feature_name} | Boolean/JSON | Redis / etcd |
| Distributed lock | lock:{resource} | Owner token + TTL | Redis |
Hash Table Internals
Every key-value store is ultimately built on top of a hash table. Understanding how hash tables work internally is critical for understanding why KV stores have O(1) lookups, how they handle collisions, and why resizing is expensive.
How a Hash Table Works
- Hash Function: Take the key (e.g., "user:42") and pass it through a hash function (e.g., MurmurHash3, SipHash, CRC16) to produce a fixed-size integer.
- Bucket Index: Compute hash(key) % num_buckets to get the index into an array of "buckets."
- Collision Resolution: Multiple keys can hash to the same bucket. Two strategies:
- Chaining: Each bucket points to a linked list of (key, value) pairs. On collision, append to the list. Lookup scans the list (O(n) worst case, O(1) amortized with good hash functions).
- Open Addressing: On collision, probe the next slot (linear probing, quadratic probing, or double hashing). Redis uses chaining; DynamoDB's internal storage uses a variant of consistent hashing.
- Load Factor: α = n / m, where n = number of entries and m = number of buckets. When α exceeds a threshold (typically 0.75), the table resizes — usually doubling the number of buckets and rehashing all entries.
Redis specifically uses a technique called incremental rehashing: instead of rehashing all keys at once (which would block the single-threaded event loop), Redis maintains two hash tables simultaneously and gradually migrates keys from the old table to the new one, one bucket at a time, on each operation.
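A toy illustration of the idea in Python (not Redis's actual dict.c): the dictionary keeps two bucket arrays, new writes go to the new table while rehashing, and each operation migrates one bucket from the old table until the rehash completes.
class IncrementalDict:
    """Toy chained hash table that rehashes one bucket per operation, like Redis's dict."""

    def __init__(self, size=8):
        self.old = [[] for _ in range(size)]   # ht[0]
        self.new = None                        # ht[1], allocated when a resize starts
        self.rehash_idx = -1                   # -1 = not rehashing

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def _step_rehash(self):
        if self.rehash_idx < 0:
            return
        for k, v in self.old[self.rehash_idx]:          # migrate one old bucket to the new table
            self._bucket(self.new, k).append((k, v))
        self.old[self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(self.old):            # done: ht[1] becomes ht[0]
            self.old, self.new, self.rehash_idx = self.new, None, -1

    def put(self, key, value):
        self._step_rehash()
        if self.rehash_idx >= 0:                        # while rehashing, write only to the new table
            old_bucket = self._bucket(self.old, key)
            old_bucket[:] = [(k, v) for k, v in old_bucket if k != key]
            target = self.new
        else:
            target = self.old
        bucket = self._bucket(target, key)
        bucket[:] = [(k, v) for k, v in bucket if k != key] + [(key, value)]
        if self.rehash_idx < 0 and sum(len(b) for b in self.old) > len(self.old):
            self.new = [[] for _ in range(len(self.old) * 2)]   # load factor > 1: start rehashing
            self.rehash_idx = 0

    def get(self, key):
        self._step_rehash()
        tables = [self.old] if self.new is None else [self.old, self.new]
        for table in tables:                             # look in both tables during a rehash
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return None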
▶ Hash Table Lookup
Step through a key lookup with collision resolution via chaining.
Redis Hash Table Implementation
Redis's dictionary (dict.h) uses two hash tables (ht[0] and ht[1]) for incremental rehashing:
typedef struct dict {
dictType *type; // function pointers: hash, keyCompare, etc.
dictht ht[2]; // ht[0] = current, ht[1] = resize target
long rehashidx; // -1 when not rehashing, else current bucket index
int16_t pauserehash; // >0 means rehashing is paused
} dict;
typedef struct dictht {
dictEntry **table; // array of pointers to linked list heads
unsigned long size; // always a power of 2
unsigned long sizemask; // size - 1 (for fast modulo via bitwise AND)
unsigned long used; // number of entries
} dictht;
Key implementation details:
- Hash function: Redis uses SipHash (keyed hash, resistant to hash-flooding DoS attacks) for string keys.
- Bucket index: idx = hash & ht->sizemask (bitwise AND is faster than modulo when size is a power of 2).
- Rehash trigger: When used / size > 1 (load factor > 1, if no background save is running) or used / size > 5 (during BGSAVE/BGREWRITEAOF, more lenient to avoid copy-on-write overhead).
- Incremental rehash: On each dictAdd, dictFind, or dictDelete, Redis migrates entries from one bucket of ht[0] to ht[1]. After all buckets are migrated, ht[1] becomes ht[0] and ht[1] is reset.
Redis Deep Dive
Redis (Remote Dictionary Server) is an in-memory data structure store that can serve as a database, cache, message broker, and streaming engine. It's single-threaded for command processing (using an event loop), which eliminates lock contention and gives it predictable sub-millisecond latency for most operations.
Key characteristics:
- In-memory: All data lives in RAM. This is what gives Redis its speed — no disk I/O on reads.
- Single-threaded command processing: One command at a time, no locks needed. I/O threads handle network reads/writes (since Redis 6.0).
- Rich data structures: Not just strings — lists, sets, sorted sets, hashes, streams, bitmaps, HyperLogLog, and more.
- Optional persistence: RDB snapshots and/or AOF (Append-Only File) for durability.
- Throughput: ~100K-500K ops/sec on a single node (depending on command complexity and hardware).
Redis Data Structures
Redis is not a simple key-value store — it's a data structure server. Each value type has its own set of atomic operations. This is what makes Redis dramatically more useful than Memcached.
String
Binary-safe, up to 512 MB. The most basic type.
# Basic get/set
SET user:42:name "Alice"
GET user:42:name → "Alice"
# Atomic counter
INCR page:views:home → 1
INCRBY page:views:home 10 → 11
# Set with expiry (session token)
SET sess:abc123 "{user_id:42}" EX 3600
# Set only if not exists (distributed lock)
SET lock:order:99 "worker-1" NX PX 30000
# Bit operations
SETBIT user:42:features 3 1
BITCOUNT user:42:features → 1
List
Doubly linked list. O(1) push/pop at both ends.
# Task queue (producer)
LPUSH queue:emails "{to:'a@b.com'}"
LPUSH queue:emails "{to:'c@d.com'}"
# Task queue (consumer - blocking)
BRPOP queue:emails 30 → pops oldest
# Recent activity feed (capped list)
LPUSH feed:user:42 "liked post #123"
LTRIM feed:user:42 0 99 # keep last 100
# Range query
LRANGE feed:user:42 0 9 → last 10 items
LLEN queue:emails → 2
Set
Unordered unique elements. O(1) add/remove/check.
# Track unique visitors
SADD visitors:2026-04-15 "user:42"
SADD visitors:2026-04-15 "user:43"
SCARD visitors:2026-04-15 → 2
# Tag system
SADD post:99:tags "redis" "nosql" "cache"
SMEMBERS post:99:tags
SISMEMBER post:99:tags "redis" → 1
# Set operations
SADD friends:alice "bob" "carol" "dave"
SADD friends:bob "alice" "carol" "eve"
SINTER friends:alice friends:bob → "carol"
SUNION friends:alice friends:bob
SDIFF friends:alice friends:bob → "bob", "dave"
Sorted Set (ZSet)
Set with scores. Skip list + hash table. O(log N) add.
# Leaderboard
ZADD leaderboard 2500 "alice"
ZADD leaderboard 1800 "bob"
ZADD leaderboard 3100 "carol"
# Top 10 players (descending)
ZREVRANGE leaderboard 0 9 WITHSCORES
# → "carol" 3100, "alice" 2500, "bob" 1800
# Rank lookup
ZREVRANK leaderboard "alice" → 1 (0-indexed)
ZSCORE leaderboard "alice" → "2500"
# Range by score (time-series window)
ZRANGEBYSCORE events 1714000000 1714100000
# Remove old entries
ZREMRANGEBYSCORE events -inf 1713900000
Hash
Field-value pairs inside a key. Like a mini document.
# User profile as hash
HSET user:42 name "Alice" email "a@b.com"
HSET user:42 login_count 0
HGET user:42 name → "Alice"
HGETALL user:42
# → "name" "Alice" "email" "a@b.com"
# "login_count" "0"
# Atomic field increment
HINCRBY user:42 login_count 1 → 1
# Partial update (only change email)
HSET user:42 email "alice@new.com"
# Memory efficient: hashes with <128 fields
# use ziplist encoding (contiguous memory)
Stream
Append-only log with consumer groups. Like Kafka-lite.
# Produce events
XADD orders * user_id 42 total 15990
# → "1714000000000-0" (auto-generated ID)
XADD orders * user_id 43 total 8500
# → "1714000000001-0"
# Read all events from beginning
XRANGE orders - +
# Consumer group (multiple workers)
XGROUP CREATE orders grp1 0
XREADGROUP GROUP grp1 worker-1 COUNT 10
BLOCK 5000 STREAMS orders >
# Acknowledge processed messages
XACK orders grp1 "1714000000000-0"
# Pending entries (unacknowledged)
XPENDING orders grp1
HyperLogLog
Probabilistic cardinality estimation. ~12 KB per key. 0.81% std error.
# Count unique visitors (billions of elements,
# only 12 KB of memory!)
PFADD visitors:today "user:42" "user:43"
PFADD visitors:today "user:42" # duplicate
PFCOUNT visitors:today → 2
# Merge multiple periods
PFMERGE visitors:week
visitors:mon visitors:tue visitors:wed
PFCOUNT visitors:week → approximate unique count
Bitmap & Bitfield
Bit-level operations on strings. Ultra-compact for flags.
# Daily active users (1 bit per user ID)
SETBIT dau:2026-04-15 42 1
SETBIT dau:2026-04-15 43 1
BITCOUNT dau:2026-04-15 → 2
# Users active on BOTH days
BITOP AND active_both dau:2026-04-15
dau:2026-04-16
BITCOUNT active_both → count
# Bitfield: packed integer arrays
BITFIELD player:42 SET u8 #0 100 # HP = 100
BITFIELD player:42 SET u8 #1 50 # MP = 50
BITFIELD player:42 INCRBY u8 #0 -25 # damage
Memory optimization: Hashes with fewer than hash-max-ziplist-entries (default 128) entries and values shorter than hash-max-ziplist-value (default 64 bytes) are stored as ziplists (contiguous memory, no pointers) — using ~10x less memory than the full hash table encoding. Similar optimizations exist for lists (quicklist = linked list of ziplists), sets (intset for small integer sets), and sorted sets (ziplist for small sets).
Persistence: RDB vs AOF
Redis is in-memory, but that doesn't mean your data vanishes on restart. Redis offers two persistence mechanisms, and you can use both simultaneously:
RDB (Redis Database Snapshots)
Point-in-time snapshots of the entire dataset, saved as a compact binary file.
# redis.conf
save 900 1 # snapshot if ≥1 key changed in 900s
save 300 10 # snapshot if ≥10 keys changed in 300s
save 60 10000 # snapshot if ≥10K keys changed in 60s
# Manual trigger
BGSAVE # fork() → child writes dump.rdb
SAVE # blocks main thread (never in prod!)
# How it works:
# 1. Redis calls fork() to create child process
# 2. Child writes entire dataset to temp file
# 3. Child atomically renames temp → dump.rdb
# 4. Parent continues serving requests
# 5. Copy-on-write: only modified pages duplicated
- Pro: Compact file, fast restarts, good for backups
- Pro: Child process does the I/O — zero impact on latency
- Con: Data loss between snapshots (up to minutes)
- Con: fork() can be slow with large datasets (copies page tables)
AOF (Append-Only File)
Logs every write command. Replay the log to reconstruct state.
# redis.conf
appendonly yes
appendfsync everysec # fsync once per second (default)
# appendfsync always # fsync on every write (safest)
# appendfsync no # let OS decide (fastest)
# AOF file looks like:
*3\r\n$3\r\nSET\r\n$7\r\nuser:42\r\n$5\r\nAlice\r\n
# AOF rewrite (compaction):
BGREWRITEAOF
# Rewrites AOF with minimum commands to
# reconstruct current state. E.g., 1000 INCRs
# become a single SET with final value.
# Redis 7.0+: Multi-part AOF
# base.aof (RDB-format snapshot) +
# incr.aof (incremental commands since snapshot)
- Pro: Minimal data loss (at most ~1 second with everysec)
- Pro: Append-only = no corruption risk from partial writes
- Con: Larger file size than RDB
- Con: Slower restarts (must replay all commands)
Recommended production config: Enable both RDB and AOF. Redis will use AOF for recovery (more complete data) and RDB for backups and faster cold starts. The Redis 7.0 multi-part AOF essentially combines the best of both — an RDB base with incremental AOF on top.
Persistence Decision Matrix
| Requirement | Use | Config |
|---|---|---|
| Pure cache, no durability needed | None | save "", appendonly no |
| Accept a few minutes of data loss | RDB only | save 300 10 |
| Accept at most ~1 second of loss | AOF everysec | appendfsync everysec |
| Zero data loss (rare for Redis) | AOF always | appendfsync always (~10x slower) |
| Best of both worlds (recommended) | RDB + AOF | Both enabled, AOF used for recovery |
Redis Cluster
A single Redis node can hold ~25 GB of data comfortably (limited by RAM). For larger datasets or higher throughput, you need Redis Cluster — Redis's built-in sharding solution that distributes data across multiple nodes.
How Hash Slots Work
Redis Cluster divides the entire keyspace into 16,384 hash slots (numbered 0–16,383). Each key is assigned to a slot using:
slot = CRC16(key) mod 16384
Each master node in the cluster owns a subset of these slots. With 3 masters:
- Node A: slots 0–5460
- Node B: slots 5461–10922
- Node C: slots 10923–16383
Each master has one or more replicas for failover. A typical production cluster runs 6 nodes: 3 masters + 3 replicas.
Hash Tags for Multi-Key Operations
Since multi-key commands (MGET, pipeline transactions) only work when all keys hash to the same slot, Redis supports hash tags: only the substring inside {...} is hashed.
# These all hash to the same slot (hash of "user:42")
SET {user:42}:profile "..."
SET {user:42}:settings "..."
SET {user:42}:cart "..."
# Now MGET works across all three
MGET {user:42}:profile {user:42}:settings {user:42}:cart
# Without hash tags, these could land on different nodes
# → CROSSSLOT error on multi-key commands
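For intuition, the slot computation (including the hash-tag rule) fits in a few lines of Python. The checksum below is CRC-16/XMODEM, the variant Redis Cluster uses; treat this as a sketch for understanding routing, not a drop-in client.
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the checksum used for hash slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash-tag rule: if the key contains a non-empty {...}, only that substring is hashed
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

assert key_slot('{user:42}:profile') == key_slot('{user:42}:cart')  # same slot: both hash "user:42"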
MOVED Redirects
When a client sends a command to the wrong node, the node responds with a MOVED redirect:
# Client sends to Node A:
GET user:500
# Node A calculates: CRC16("user:500") % 16384 = 7365
# 7365 is owned by Node B, not Node A
# Node A responds:
-MOVED 7365 192.168.1.2:6379
# Smart clients (like redis-py-cluster, Lettuce, Jedis)
# cache the slot→node mapping and send future requests
# for slot 7365 directly to Node B.
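In application code you rarely handle MOVED yourself — a cluster-aware client does the slot bookkeeping. A sketch using redis-py's cluster client (the host and port are placeholders for one of your cluster nodes):
from redis.cluster import RedisCluster

# The client fetches the slot→node map on connect, routes each command to the
# owning node, and refreshes its map whenever it sees a MOVED redirect.
rc = RedisCluster(host='192.168.1.1', port=6379, decode_responses=True)

rc.set('{user:42}:profile', '{"name": "Alice"}')
rc.mget('{user:42}:profile', '{user:42}:cart')  # same slot thanks to the hash tag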
Resharding & Rebalancing
Adding a new node to the cluster:
# 1. Add Node D to the cluster
redis-cli --cluster add-node 192.168.1.4:6379 192.168.1.1:6379
# 2. Reshard: move slots from existing nodes to Node D
redis-cli --cluster reshard 192.168.1.1:6379
# → Interactively choose how many slots (e.g., 4096)
# → Choose source nodes (A, B, C) and target (D)
# During resharding:
# - Keys in migrating slots trigger ASK redirect
# - Client follows ASK to new node for that key
# - Once migration is complete, MOVED is returned instead
# - Entire process is online — no downtime
▶ Redis Cluster Hash Slot Routing
Watch how a key is routed to the correct node via CRC16 hashing, including a MOVED redirect.
Redis Sentinel
Redis Sentinel provides high availability for standalone Redis deployments (non-cluster mode). It handles automatic failover, monitoring, and service discovery.
How Sentinel works:
- Monitoring: Sentinel continuously pings the master and replicas. If a master doesn't respond within down-after-milliseconds, the sentinel marks it as subjectively down (SDOWN).
- Quorum: When enough sentinels (a configurable quorum, typically 2 out of 3) agree the master is down, it's marked objectively down (ODOWN).
- Failover: Sentinels elect a leader, which promotes the best replica to master. "Best" = most up-to-date replication offset.
- Notification: Clients subscribed to Sentinel are notified of the new master's address. Smart clients (Lettuce, redis-py with Sentinel support) reconnect automatically.
# sentinel.conf
sentinel monitor mymaster 192.168.1.1 6379 2 # quorum=2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
# Query Sentinel for current master
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
# → "192.168.1.2" "6379" (after failover)
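On the client side, redis-py ships a Sentinel helper that performs the master discovery described above. A sketch (the sentinel addresses and service name are placeholders matching the config above):
from redis.sentinel import Sentinel

# Ask any reachable sentinel for the current topology of "mymaster"
sentinel = Sentinel([('192.168.1.1', 26379), ('192.168.1.2', 26379)], socket_timeout=0.5)

master = sentinel.master_for('mymaster', socket_timeout=0.5)   # read-write connection
replica = sentinel.slave_for('mymaster', socket_timeout=0.5)   # read-only connection

master.set('sess:abc123', '{"user_id": 42}')
replica.get('sess:abc123')
# After a failover, the connection pool re-resolves and reconnects to the new master.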
Pub/Sub & Streams
Pub/Sub (Fire-and-Forget)
Redis Pub/Sub is a simple messaging system where publishers send messages to channels and all subscribed clients receive them in real-time. Messages are not persisted — if no subscriber is listening, the message is lost.
# Publisher
PUBLISH chat:room:42 "Hello from Alice!"
# → (integer) 2 (2 subscribers received it)
# Subscriber (blocking)
SUBSCRIBE chat:room:42
# Blocks and receives messages as they arrive:
# 1) "message"
# 2) "chat:room:42"
# 3) "Hello from Alice!"
# Pattern subscribe (wildcard)
PSUBSCRIBE chat:room:*
# Receives messages from ALL chat rooms
Streams (Persistent, Consumer Groups)
For durable messaging with consumer groups, acknowledgment, and replay — use Redis Streams (added in Redis 5.0). Streams solve every limitation of Pub/Sub:
| Feature | Pub/Sub | Streams |
|---|---|---|
| Persistence | ❌ Fire and forget | ✅ Stored on disk |
| Consumer groups | ❌ All get every msg | ✅ Load-balanced across group |
| Acknowledgment | ❌ No ack | ✅ XACK per message |
| Replay | ❌ Missed = lost | ✅ Read from any ID |
| Backpressure | ❌ Slow client drops | ✅ Pending list tracks unprocessed |
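Putting the stream commands together from Python with redis-py, assuming a local Redis; this mirrors the producer/consumer-group flow shown in the Stream section above (the print is a placeholder for real processing):
import redis

r = redis.Redis(decode_responses=True)

# Producer
r.xadd('orders', {'user_id': '42', 'total': '15990'})

# One-time consumer group setup (ignore the error if the group already exists)
try:
    r.xgroup_create('orders', 'grp1', id='0', mkstream=True)
except redis.ResponseError:
    pass

# Consumer: read up to 10 new messages, blocking up to 5 seconds
for stream, entries in r.xreadgroup('grp1', 'worker-1', {'orders': '>'}, count=10, block=5000):
    for msg_id, fields in entries:
        print(msg_id, fields)            # replace with real processing
        r.xack('orders', 'grp1', msg_id) # acknowledge so it leaves the pending list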
Lua Scripting
Redis executes Lua scripts atomically — the entire script runs without any other command interleaving. This is the key to implementing complex atomic operations that can't be done with a single Redis command.
# Rate limiter using sliding window (Lua script)
# KEYS[1] = rate limit key, ARGV[1] = window (sec),
# ARGV[2] = max requests, ARGV[3] = current timestamp
local key = KEYS[1]
local window = tonumber(ARGV[1])
local max_req = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
-- Count requests in current window
local count = redis.call('ZCARD', key)
if count < max_req then
-- Under limit: add this request
redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
redis.call('EXPIRE', key, window)
return 1 -- allowed
else
return 0 -- rate limited
end
# Execute the script from a client
redis-cli EVAL "$(cat rate_limit.lua)" 1 \
"rl:api:192.168.1.1" 60 100 1714000000
# → (integer) 1 (allowed)
# Or use EVALSHA with cached script hash
redis-cli SCRIPT LOAD "$(cat rate_limit.lua)"
# → "a1b2c3d4e5f6..."
redis-cli EVALSHA "a1b2c3d4e5f6..." 1 \
"rl:api:192.168.1.1" 60 100 1714000000
Other common Lua script use cases:
- Compare-and-swap: Read a value, check a condition, write atomically.
- Distributed lock release: Only delete the lock if you own it (check token, then delete).
- Inventory reservation: Decrement stock only if sufficient quantity exists.
- Leaderboard update: Update score and trim to top-N atomically.
# Safe distributed lock release (Redlock pattern)
# Only release if we still own the lock
if redis.call('GET', KEYS[1]) == ARGV[1] then
return redis.call('DEL', KEYS[1])
else
return 0 -- someone else holds it now
end
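End to end, the acquire/release cycle looks like this in Python, assuming redis-py and the release script above; the random token guarantees a worker can only delete a lock it still owns (not a full multi-node Redlock, just the single-instance pattern):
import uuid
import redis

r = redis.Redis(decode_responses=True)

RELEASE_LOCK = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0
end
"""
release = r.register_script(RELEASE_LOCK)

token = str(uuid.uuid4())
# SET ... NX PX 30000 → acquire only if the lock is free, auto-expire after 30 s
if r.set('lock:order:99', token, nx=True, px=30000):
    try:
        pass  # ... do the work that requires exclusive access ...
    finally:
        release(keys=['lock:order:99'], args=[token])  # no-op if the lock expired and was re-acquired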
DynamoDB Deep Dive
Amazon DynamoDB is a fully managed, serverless key-value and document database that delivers single-digit millisecond performance at any scale. Unlike Redis (which you operate yourself), DynamoDB is a service — AWS handles replication, sharding, patching, and scaling.
Key characteristics:
- Serverless: No servers to provision, patch, or manage. Auto-scales up and down.
- Consistent performance: Single-digit ms latency whether your table has 1 KB or 1 PB of data.
- Multi-region: Global Tables replicate across AWS regions with sub-second replication lag.
- Flexible consistency: Choose per-query: strongly consistent reads or eventually consistent reads (2x cheaper).
- SLA: 99.999% availability for Global Tables, 99.99% for standard tables.
Partition Key & Sort Key
Every DynamoDB table has a primary key that uniquely identifies each item. There are two types:
Simple Primary Key (Partition Key Only)
The partition key alone must be unique. DynamoDB hashes it to determine which physical partition stores the item.
# Table: Users
# PK: user_id (String)
{
"user_id": "u-42", ← partition key
"name": "Alice",
"email": "alice@ex.com",
"plan": "premium"
}
# Access patterns:
# ✅ GetItem(user_id = "u-42") → O(1)
# ❌ Find by email → full table Scan (or add a GSI)
Composite Primary Key (PK + Sort Key)
Partition key + sort key together are unique. Enables range queries within a partition.
# Table: Orders
# PK: user_id (String), SK: order_id (String)
{
"user_id": "u-42", ← partition key
"order_id": "ord-001", ← sort key
"total": 15990,
"status": "shipped"
}
# Access patterns:
# ✅ GetItem(PK="u-42", SK="ord-001")
# ✅ Query(PK="u-42", SK begins_with "ord-")
# ✅ Query(PK="u-42", SK between "A" and "Z")
# ✅ Query(PK="u-42") → all orders for user
Partition key design is everything. A bad partition key causes hot partitions — all traffic concentrated on one physical partition while others sit idle. Good partition keys have high cardinality and uniform access distribution.
| Good PK | Bad PK | Why |
|---|---|---|
| user_id | country | Millions of users vs. ~200 countries → hot partition on "US" |
| device_id | date | All IoT writes go to today's partition |
| order_id (UUID) | status | Only 5 statuses → extreme imbalance |
Global & Local Secondary Indexes
DynamoDB's primary key determines the only efficient access pattern on the base table. For additional access patterns, you need secondary indexes.
Local Secondary Index (LSI)
Same partition key as the base table, different sort key. Must be created at table creation time. Limit: 5 per table.
# Base table: Orders
# PK: user_id, SK: order_id
# LSI: OrdersByDate
# PK: user_id (same), SK: created_at
# Now you can query:
# "All orders for user u-42, sorted by date"
response = table.query(
IndexName='OrdersByDate',
KeyConditionExpression=
Key('user_id').eq('u-42') &
Key('created_at').between(
'2026-01-01', '2026-04-30'
)
)
- Strongly consistent reads ✅
- Shares throughput with base table
- 10 GB limit per partition key value
Global Secondary Index (GSI)
Completely different partition key and/or sort key. Can be created anytime. Limit: 20 per table.
# Base table: Orders
# PK: user_id, SK: order_id
# GSI: OrdersByStatus
# PK: status, SK: created_at
# Now you can query:
# "All pending orders, newest first"
response = table.query(
IndexName='OrdersByStatus',
KeyConditionExpression=
Key('status').eq('pending'),
ScanIndexForward=False # descending
)
# GSI: OrdersByEmail
# PK: email, SK: user_id
# "Find user by email" → efficient lookup
- Eventually consistent reads only ❌
- Has its own provisioned throughput
- Asynchronously replicated from base
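Unlike an LSI, a GSI can be added to a live table. A sketch of creating the OrdersByStatus index with boto3, assuming an on-demand table (provisioned tables also need a ProvisionedThroughput block in the Create request):
import boto3

client = boto3.client('dynamodb')

client.update_table(
    TableName='Orders',
    AttributeDefinitions=[                       # indexed attributes must be declared
        {'AttributeName': 'status', 'AttributeType': 'S'},
        {'AttributeName': 'created_at', 'AttributeType': 'S'},
    ],
    GlobalSecondaryIndexUpdates=[{
        'Create': {
            'IndexName': 'OrdersByStatus',
            'KeySchema': [
                {'AttributeName': 'status', 'KeyType': 'HASH'},       # GSI partition key
                {'AttributeName': 'created_at', 'KeyType': 'RANGE'},  # GSI sort key
            ],
            'Projection': {'ProjectionType': 'ALL'},
        }
    }]
)
# The index backfills asynchronously; query it once its IndexStatus is ACTIVE.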
Single-Table Design Patterns
Single-table design is a DynamoDB modeling technique where all entity types live in one table, using generic key names (PK, SK) with prefixed values to differentiate entities. This is the recommended approach from AWS for most applications.
Why? DynamoDB charges per table (provisioned capacity) and per-request (on-demand). Fewer tables = fewer things to manage, and you can fetch related entities in a single Query operation.
# Single-table design for an e-commerce app
# Table: ECommerce
# PK (String) | SK (String) | Attributes...
# User entity
PK=USER#u-42 SK=PROFILE name=Alice email=alice@ex.com plan=premium
# User's orders (all in one partition → one Query)
PK=USER#u-42 SK=ORDER#2026-04-15#ord-001 total=15990 status=shipped
PK=USER#u-42 SK=ORDER#2026-04-10#ord-002 total=8500 status=delivered
# Order items (nested under order)
PK=ORDER#ord-001 SK=ITEM#sku-101 qty=1 price=9999
PK=ORDER#ord-001 SK=ITEM#sku-205 qty=2 price=2995
# Product entity
PK=PRODUCT#sku-101 SK=METADATA name="Widget" category="electronics"
PK=PRODUCT#sku-101 SK=REVIEW#u-43 rating=5 text="Great product!"
# GSI1: Inverted index (SK → PK)
# GSI1PK=ORDER#2026-04-15#ord-001 GSI1SK=USER#u-42
# → "Which user placed order ord-001?"
# GSI2: By status
# GSI2PK=shipped GSI2SK=2026-04-15#ord-001
# → "All shipped orders, sorted by date"
Query Examples (Python / boto3)
import boto3
from boto3.dynamodb.conditions import Key
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ECommerce')
# 1. Get user profile
profile = table.get_item(
Key={'PK': 'USER#u-42', 'SK': 'PROFILE'}
)['Item']
# 2. Get all orders for a user (sorted by date)
orders = table.query(
KeyConditionExpression=
Key('PK').eq('USER#u-42') &
Key('SK').begins_with('ORDER#')
)['Items']
# 3. Get orders from last 30 days
recent = table.query(
KeyConditionExpression=
Key('PK').eq('USER#u-42') &
Key('SK').between('ORDER#2026-03-16', 'ORDER#2026-04-16')
)['Items']
# 4. Get all items in an order
items = table.query(
KeyConditionExpression=
Key('PK').eq('ORDER#ord-001') &
Key('SK').begins_with('ITEM#')
)['Items']
# 5. Batch get multiple entities (up to 100 items)
result = dynamodb.batch_get_item(
RequestItems={
'ECommerce': {
'Keys': [
{'PK': 'USER#u-42', 'SK': 'PROFILE'},
{'PK': 'USER#u-43', 'SK': 'PROFILE'},
{'PK': 'PRODUCT#sku-101', 'SK': 'METADATA'}
]
}
}
)
# 6. Transactional write (atomic across items)
#    (the low-level client used here expects DynamoDB-typed attribute values)
dynamodb.meta.client.transact_write_items(
    TransactItems=[
        {'Put': {'TableName': 'ECommerce', 'Item': {
            'PK': {'S': 'USER#u-42'},
            'SK': {'S': 'ORDER#2026-04-16#ord-003'},
            'total': {'N': '5000'}, 'status': {'S': 'created'}
        }}},
        {'Update': {'TableName': 'ECommerce',
            'Key': {'PK': {'S': 'PRODUCT#sku-101'}, 'SK': {'S': 'METADATA'}},
            'UpdateExpression': 'SET stock = stock - :qty',
            'ConditionExpression': 'stock >= :qty',
            'ExpressionAttributeValues': {':qty': {'N': '1'}}
        }}
    ]
)
Capacity Planning
DynamoDB has two capacity modes. Understanding the math is critical for cost estimation in system design interviews.
Read Capacity Units (RCU)
- 1 RCU = one strongly consistent read of up to 4 KB
- 1 RCU = two eventually consistent reads of up to 4 KB
- Transactional reads cost 2 RCUs per 4 KB read
Write Capacity Units (WCU)
- 1 WCU = one write of up to 1 KB
- Transactional writes cost 2 WCUs per 1 KB write
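These rules translate directly into a small helper; a sketch in Python that mirrors the worked example below:
import math

def rcus(reads_per_sec, item_kb, strongly_consistent=True):
    # 1 RCU = one strongly consistent read of up to 4 KB; eventually consistent costs half
    per_read = math.ceil(item_kb / 4)
    return reads_per_sec * (per_read if strongly_consistent else per_read / 2)

def wcus(writes_per_sec, item_kb):
    # 1 WCU = one write of up to 1 KB
    return writes_per_sec * math.ceil(item_kb / 1)

# 2 KB items: 320 eventually consistent + 80 strongly consistent reads/sec
print(rcus(320, 2, strongly_consistent=False) + rcus(80, 2))  # 240.0 RCUs
# 1.5 KB items: 100 writes/sec
print(wcus(100, 1.5))                                         # 200 WCUs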
Capacity Planning Example
Scenario: E-commerce platform with 10,000 DAU, peak 500 requests/sec.
# Read calculations:
# - Average item size: 2 KB
# - 80% reads are eventually consistent
# - 20% reads are strongly consistent
# - Peak: 400 reads/sec
Eventually consistent: 400 × 0.8 = 320 reads/sec
Each 2 KB item = ceil(2/4) = 1 RCU per strongly consistent read
Eventually consistent = 0.5 RCU per read
→ 320 × 0.5 = 160 RCUs
Strongly consistent: 400 × 0.2 = 80 reads/sec
→ 80 × 1 = 80 RCUs
Total read capacity: 160 + 80 = 240 RCUs
# Write calculations:
# - Average item size: 1.5 KB → ceil(1.5/1) = 2 WCUs per write
# - Peak: 100 writes/sec
Total write capacity: 100 × 2 = 200 WCUs
# Cost estimate (us-east-1, provisioned):
# RCU: $0.00013 per RCU per hour
# WCU: $0.00065 per WCU per hour
#
# Monthly (730 hours):
# Reads: 240 × $0.00013 × 730 = $22.78/month
# Writes: 200 × $0.00065 × 730 = $94.90/month
# Total: ~$118/month (provisioned)
#
# On-demand pricing:
# $1.25 per million write request units (WRUs)
# $0.25 per million read request units (RRUs)
# Request units follow the same size rules as above:
# 240 RRU/sec and 200 WRU/sec at peak
# Reads: 240 × 86,400 = ~20.7M RRUs/day → ~$5.18/day
# Writes: 200 × 86,400 = ~17.3M WRUs/day → ~$21.60/day
# → ~$26.78/day ≈ ~$800/month
# → Provisioned is cheaper at sustained load!
Adaptive Capacity
DynamoDB automatically handles uneven workloads with adaptive capacity. Even if one partition is receiving disproportionate traffic (hot partition), DynamoDB will:
- Burst capacity: Each partition can burst up to 300 seconds of unused throughput.
- Adaptive capacity: Automatically increases throughput for hot partitions by "borrowing" unused capacity from other partitions, as long as total table throughput isn't exceeded.
- Partition splitting: If a partition is consistently hot, DynamoDB will automatically split it into two partitions. This is transparent and requires no action.
DynamoDB Streams & DAX
DynamoDB Streams
DynamoDB Streams captures a time-ordered sequence of item-level modifications (inserts, updates, deletes) in a DynamoDB table. Each stream record contains the primary key, the old/new image (or both), and a timestamp.
# Enable streams on a table
aws dynamodb update-table \
--table-name ECommerce \
--stream-specification \
StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
# Stream view types:
# KEYS_ONLY → only the key attributes
# NEW_IMAGE → the entire item after modification
# OLD_IMAGE → the entire item before modification
# NEW_AND_OLD_IMAGES → both (most useful)
# Common use cases:
# 1. Trigger Lambda on item change (event-driven)
# 2. Cross-region replication (Global Tables use this)
# 3. Materialized views / denormalization
# 4. Analytics pipeline: Stream → Kinesis → S3 → Athena
# 5. Audit log / change data capture (CDC)
Example: Auto-update a search index when a product changes:
# Lambda function triggered by DynamoDB Stream
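# es_client (an Elasticsearch/OpenSearch client) is assumed to be initialized outside the handler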
def handler(event, context):
for record in event['Records']:
if record['eventName'] in ('INSERT', 'MODIFY'):
new_image = record['dynamodb']['NewImage']
pk = new_image['PK']['S']
if pk.startswith('PRODUCT#'):
# Index in Elasticsearch
es_client.index(
index='products',
id=pk,
body={
'name': new_image['name']['S'],
'category': new_image['category']['S'],
'price': int(new_image['price']['N'])
}
)
elif record['eventName'] == 'REMOVE':
old_image = record['dynamodb']['OldImage']
pk = old_image['PK']['S']
if pk.startswith('PRODUCT#'):
es_client.delete(index='products', id=pk)
DAX (DynamoDB Accelerator)
DAX is a fully managed, in-memory caching layer for DynamoDB that delivers microsecond response times (vs. single-digit millisecond for vanilla DynamoDB). It's API-compatible — just change the endpoint.
# Without DAX (standard DynamoDB)
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ECommerce')
result = table.get_item(Key={'PK': 'USER#u-42', 'SK': 'PROFILE'})
# Latency: ~5 ms
# With DAX (just change the client!)
import amazondax
dax = amazondax.AmazonDaxClient.resource(
endpoint_url='dax://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'
)
table = dax.Table('ECommerce')
result = table.get_item(Key={'PK': 'USER#u-42', 'SK': 'PROFILE'})
# Latency: ~0.2 ms (25x faster)
# DAX caching behavior:
# - Item cache: caches GetItem/BatchGetItem results
# - Query cache: caches Query/Scan results
# - Write-through: writes go to both DAX and DynamoDB
# - TTL: default 5 minutes (configurable)
# - Cache invalidation: automatic on writes
When to use DAX vs. Redis as a DynamoDB cache:
| Criteria | DAX | Redis |
|---|---|---|
| Integration | Drop-in (same API) | Requires cache-aside code |
| Data structures | None (just cached items) | Rich (sorted sets, lists, etc.) |
| Cache invalidation | Automatic write-through | Manual (you manage TTL/eviction) |
| Cost | ~$0.27/hr per node | Varies (ElastiCache pricing) |
| Use case | Read-heavy DynamoDB only | Multi-source caching, pub/sub, etc. |
Redis vs DynamoDB vs Memcached
These three are often confused or conflated in system design discussions. They serve very different purposes:
| Dimension | Redis | DynamoDB | Memcached |
|---|---|---|---|
| Type | In-memory data structure store | Managed NoSQL (SSD-backed) | In-memory cache (strings only) |
| Data model | Strings, Lists, Sets, ZSets, Hashes, Streams, HLL, Bitmaps | Items (JSON-like) with typed attributes | Binary strings (blobs) only |
| Max value size | 512 MB | 400 KB per item | 1 MB (default) |
| Latency | Sub-ms (in-memory) | 1–5 ms (SSD); μs with DAX | Sub-ms (in-memory) |
| Persistence | Optional (RDB + AOF) | Always durable (3-AZ replication) | None (pure cache) |
| Consistency | Strong (single node) / eventual (replicas) | Strong or eventually consistent (per query) | N/A (no replication) |
| Scaling | Redis Cluster (manual sharding) | Automatic (serverless) | Client-side sharding |
| Secondary indexes | Manual (sorted sets as indexes) | GSI + LSI (up to 25 total) | None |
| Transactions | MULTI/EXEC (optimistic locking) | TransactWriteItems / TransactGetItems (ACID, up to 100 items) | CAS (check and set) only |
| Scripting | Lua scripts (atomic) | ConditionExpressions, UpdateExpressions | None |
| Pub/Sub | Yes (Pub/Sub + Streams) | DynamoDB Streams (CDC) | No |
| Multi-threading | Single-threaded cmd processing (I/O threads since 6.0) | Managed (transparent) | Multi-threaded |
| Ops burden | Medium (you manage cluster) | Low (fully managed) | Medium (you manage servers) |
| Pricing | Per node (RAM-based) | Per RCU/WCU or per request | Per node (RAM-based) |
When to Use Each
Choose Redis When…
- Sub-millisecond latency is required (caching, session store, rate limiting)
- You need rich data structures — leaderboards (sorted sets), queues (lists), unique counts (HyperLogLog)
- You want atomic operations on complex data (Lua scripts, MULTI/EXEC)
- Pub/Sub or Streams for real-time messaging within a service
- Data fits in memory (RAM is the limit)
- You can tolerate some data loss (even with AOF everysec, 1 second window)
- Use as a complement to a primary database, not as the primary database itself (unless ephemeral data)
Choose DynamoDB When…
- You need a durable, scalable primary database with consistent single-digit ms latency
- Access patterns are known and limited (design your keys/indexes upfront)
- You want zero operational overhead — no servers, patches, backups to manage
- Auto-scaling for unpredictable traffic (on-demand mode)
- Multi-region with Global Tables (active-active replication)
- You're already in AWS ecosystem (Lambda + DynamoDB Streams + API Gateway)
- Data exceeds what fits in memory but you still need fast key-based access
Choose Memcached When…
- You need a simple, multi-threaded cache for small string/blob values
- Workload is read-heavy and can benefit from multi-core utilization
- You don't need persistence, data structures, or pub/sub
- You want the simplest possible caching layer with minimal features
- Large working set that benefits from Memcached's more efficient memory usage per key (no overhead for data structure metadata)
Common Architecture Patterns
- Cache-aside + DynamoDB: Redis/Memcached fronting DynamoDB for hot reads
- Redis as primary + PostgreSQL as source of truth: Fast reads from Redis, periodic sync to SQL for reporting
- DynamoDB + Streams + Lambda: Event-driven architecture, DynamoDB as event store
- Redis Streams as message broker: Between microservices when Kafka is overkill
- DynamoDB + DAX + CloudFront: Full AWS caching stack for API responses