Consistent Hashing
1 · The Problem with Simple Hashing
The most natural way to distribute keys across N servers is modular hashing:

server = hash(key) % N
For a fixed N this works perfectly: keys are spread roughly uniformly (assuming a decent hash function), and lookups are O(1). The problem explodes the moment N changes.
What Happens When N Changes?
Suppose we have 3 servers and 12 keys distributed as:
| key | hash(key) | hash % 3 | hash % 4 | remapped? |
|---|---|---|---|---|
| k0 | 0 | 0 | 0 | same |
| k1 | 17 | 2 | 1 | ✗ moved |
| k2 | 42 | 0 | 2 | ✗ moved |
| k3 | 3 | 0 | 3 | ✗ moved |
| k4 | 25 | 1 | 1 | same |
| k5 | 11 | 2 | 3 | ✗ moved |
| k6 | 36 | 0 | 0 | same |
| k7 | 19 | 1 | 3 | ✗ moved |
| k8 | 44 | 2 | 0 | ✗ moved |
| k9 | 9 | 0 | 1 | ✗ moved |
| k10 | 31 | 1 | 3 | ✗ moved |
| k11 | 22 | 1 | 2 | ✗ moved |
Adding just one server (3→4) remapped 9 out of 12 keys (75%). This is not a fluke. Mathematically, when we go from N to N+1 servers, a key keeps its assignment only if hash(key) % N == hash(key) % (N+1), which for uniformly distributed hashes happens with probability just 1/(N+1). So roughly N/(N+1) of all keys are remapped: 75% for N=3.
The same problem occurs when removing a server (e.g., server crash). With modular hashing, nearly all keys must be reassigned. We need an approach where adding or removing a server only displaces K/N keys (where K is the total number of keys), not nearly all of them.
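As a quick empirical check, here is a minimal Python sketch (the key names and counts are illustrative) that measures the remap fraction when going from 3 to 4 servers:

```python
# Fraction of keys remapped when the server count goes from 3 to 4
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"key-{i}" for i in range(100_000)]
moved = sum(1 for k in keys if h(k) % 3 != h(k) % 4)
print(f"{moved / len(keys):.1%}")  # ~75.0%, matching N/(N+1) for N=3
```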
2 · Consistent Hashing Concept
Consistent hashing was introduced by Karger et al. at MIT in their 1997 paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web", originally to solve the problem of distributed web caching.
The core idea is elegant: instead of mapping keys to servers using modular arithmetic, we map both keys and servers onto a circular ring (hash space), and each key is served by the first server encountered when walking clockwise from the key's position.
The Hash Ring
Imagine a circle representing the entire hash space, from 0 to 2^32 − 1 (or 2^128 − 1, or 2^256 − 1, depending on the hash function). The circle wraps around: position 2^32 − 1 is adjacent to position 0.
- Place servers on the ring: hash each server identifier (e.g., IP address, hostname) to get a position on the ring: pos(S_i) = hash(server_id_i)
- Place keys on the ring: hash each key to get its position: pos(K_j) = hash(key_j)
- Assignment rule: each key is assigned to the first server encountered clockwise from the key's position on the ring.
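To make the rule concrete, here is a toy example with made-up server positions on a tiny 0–999 ring:

```python
# Toy ring with illustrative positions: the owner of a key is the first
# server position >= the key's position, wrapping past the top.
import bisect

server_positions = sorted([100, 400, 800])  # pos(S_i) = hash(server_id_i)

def owner(key_pos: int) -> int:
    i = bisect.bisect_left(server_positions, key_pos)
    return server_positions[i % len(server_positions)]

print(owner(250))  # 400: first server clockwise from 250
print(owner(950))  # 100: wraps around past position 999
```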
Why This Works
When a new server Snew is added to the ring, it only takes over keys that were previously assigned to its immediate clockwise successor. All other key→server mappings remain unchanged. Similarly, when a server is removed, only its keys are reassigned—to its clockwise successor.
3 · Hash Ring in Detail
▶ Interactive Hash Ring (demo): step through adding and removing servers on a consistent hash ring and watch how only neighboring keys are affected.
Implementation: Sorted Array + Binary Search
The ring is implemented as a sorted array of server hashes. To find which server owns a key:
- Compute h = hash(key)
- Binary search the sorted array for the smallest server hash ≥ h
- If no server hash is ≥ h, wrap around to the first server in the array (index 0)
#include <algorithm>
#include <cstdint>
#include <vector>

// Finding the responsible server: O(log N) with binary search
int findServer(uint32_t keyHash, const std::vector<uint32_t>& ring) {
    // Smallest server hash >= keyHash, i.e., the first server clockwise
    auto it = std::lower_bound(ring.begin(), ring.end(), keyHash);
    if (it == ring.end())
        return 0; // wrap around to the first server
    return static_cast<int>(std::distance(ring.begin(), it));
}
Hash Function Choice
| Hash Function | Output Size | Speed | Distribution | Notes |
|---|---|---|---|---|
| MD5 | 128-bit | Moderate | Excellent | Widely used (Ketama). Cryptographic but slower. |
| SHA-1 | 160-bit | Moderate | Excellent | Used in Chord-style DHTs. Cryptographic but slower. |
| xxHash | 32/64/128-bit | Very fast | Excellent | Non-cryptographic. Great for hash rings. |
| MurmurHash3 | 32/128-bit | Very fast | Excellent | Used in many distributed systems. |
| CRC32 | 32-bit | Very fast | Good | Small output. Faster but less uniform. |
Ring Size Considerations
A 32-bit hash gives 2^32 (≈4.3 billion) positions on the ring, which is more than sufficient for most systems. With virtual nodes (covered next), even a 100-server cluster with 200 virtual nodes each yields only 20,000 positions to store, which is trivially small.
4 · Virtual Nodes
With only a few physical servers, the hash ring can become severely unbalanced. If servers are spaced unevenly on the ring, one server might own 60% of the key space while another owns only 10%.
The Problem: Uneven Distribution
With N servers placed at random on the ring, each server's arc length approximately follows an exponential distribution, and an exponential's standard deviation equals its mean. The per-server load therefore has a relative standard deviation of roughly 100%: σ ≈ μ = K/N.
The Solution: Virtual Nodes (Vnodes)
Instead of placing each physical server at one position, we place it at V positions (virtual nodes). Each physical server S_i is mapped to virtual nodes hash("S_i#0"), hash("S_i#1"), ..., hash("S_i#V-1").
#include <cstdint>
#include <string>
#include <vector>

// Generate virtual node positions for a server.
// hash() stands in for any 32-bit hash function (e.g., MurmurHash3, xxHash).
std::vector<uint32_t> virtualNodes(const std::string& serverId, int V) {
    std::vector<uint32_t> positions;
    positions.reserve(V);
    for (int i = 0; i < V; i++) {
        std::string vnode = serverId + "#" + std::to_string(i);
        positions.push_back(hash(vnode));
    }
    return positions;
}
How Many Virtual Nodes?
| Virtual Nodes (V) | Std Dev of Load | Max/Avg Load Ratio | Memory per Server |
|---|---|---|---|
| 1 (no vnodes) | ~100% | ~3× | 4 bytes |
| 10 | ~31% | ~1.6× | 40 bytes |
| 50 | ~14% | ~1.3× | 200 bytes |
| 100 | ~10% | ~1.2× | 400 bytes |
| 150 | ~8% | ~1.15× | 600 bytes |
| 200 | ~7% | ~1.12× | 800 bytes |
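The figures above can be reproduced with a small simulation. Here is a rough Python sketch (server and key names are made up; exact numbers vary with the hash function and run):

```python
# Relative std dev of per-server load for a given vnode count
import bisect, hashlib, statistics

def h32(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

def load_spread(num_servers: int = 10, vnodes: int = 1, num_keys: int = 100_000) -> float:
    ring = sorted((h32(f"server-{s}#{v}"), s)
                  for s in range(num_servers) for v in range(vnodes))
    hashes = [pos for pos, _ in ring]
    counts = [0] * num_servers
    for i in range(num_keys):
        idx = bisect.bisect_left(hashes, h32(f"key-{i}")) % len(ring)
        counts[ring[idx][1]] += 1
    return statistics.pstdev(counts) / (num_keys / num_servers)

print(f"V=1:   ~{load_spread(vnodes=1):.0%} relative std dev")
print(f"V=150: ~{load_spread(vnodes=150):.0%}")
```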
Benefits Beyond Load Balancing
- Heterogeneous servers: Give more powerful servers more virtual nodes (proportional to capacity). A server with 2× RAM gets 2× vnodes (see the sketch after this list).
- Gradual scaling: When adding a new server, its virtual nodes are scattered across the ring, stealing small portions from many existing servers rather than a large chunk from one.
- Graceful removal: When a server leaves, its load is spread across many successors rather than dumped onto one.
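One way to implement capacity weighting, as a minimal sketch (the function name, baseline, and numbers are illustrative, not from any particular system):

```python
# Capacity-weighted vnodes: a server's vnode count scales with its capacity
def vnode_count(base_vnodes: int, capacity_gb: float, baseline_gb: float = 16.0) -> int:
    return max(1, round(base_vnodes * capacity_gb / baseline_gb))

print(vnode_count(150, 16))  # 150 vnodes for a baseline server
print(vnode_count(150, 32))  # 300 vnodes for a server with 2x the RAM
```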
Trade-offs
- Memory: Ring lookup table grows by V×. For V=200 and N=1000, that is 200K entries (typically <2 MB)—negligible.
- Lookup time: Binary search on a larger ring: O(log(N×V)) instead of O(log N). For 200K entries, that is ~17 comparisons vs ~10. Also negligible.
- Rebalancing complexity: Adding a server requires computing V hash values and inserting V entries into the sorted ring. Still very fast.
5 · Adding and Removing Nodes
Adding a Node
When server S_new joins the ring:
- Compute all V virtual node positions for S_new
- For each virtual node position p:
  - Find the current owner of position p (the next clockwise server)
  - Transfer keys in the range (predecessor(p), p] from the current owner to S_new
- Insert all virtual node positions into the ring
Removing a Node
When server S_old leaves the ring (planned shutdown or crash):
- For each of S_old's V virtual node positions:
  - Find the successor server (next clockwise)
  - Transfer keys owned by this virtual node to the successor
- Remove all V virtual node positions from the ring
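In both procedures, the key range affected by a single vnode follows directly from the clockwise rule. A minimal sketch (the position values are illustrative):

```python
# Keys a new vnode at position p takes over: the range (predecessor(p), p],
# previously owned by p's clockwise successor.
import bisect

def stolen_range(ring_positions: list, p: int) -> tuple:
    ring = sorted(ring_positions)
    i = bisect.bisect_left(ring, p)
    predecessor = ring[i - 1]  # i == 0 wraps to the last position
    return (predecessor, p)    # (exclusive, inclusive]

print(stolen_range([100, 400, 800], 600))  # (400, 600]: taken from 800's owner
```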
Graceful Migration Strategy
In practice, data migration happens in the background while the system continues to serve requests. Common strategies include:
- Double-read: During migration, read from both old and new owner. Return whichever has data.
- Forwarding: Old owner forwards requests for migrated keys to the new owner.
- Hinted handoff: If the new owner is unavailable, a neighbor temporarily accepts writes and delivers them later (used by Dynamo and Cassandra).
Modular vs. Consistent: Side-by-Side
▶ Simple Hash vs. Consistent Hash (demo): when a 4th server is added, modular hashing remaps nearly everything, while consistent hashing remaps only one key.
6 · Bounded Load Consistent Hashing
Standard consistent hashing provides good average distribution but does not give hard guarantees on maximum load. In 2016, Google published "Consistent Hashing with Bounded Loads" (Mirrokni, Thorup, Zadimoghaddam) to address this.
The Idea
Each server has a capacity cap of (1 + ε) × (K/N) where ε is a small constant (e.g., 0.25). When a key hashes to a server that is already at capacity, the key spills to the next server clockwise that has available capacity.
Algorithm
// Pseudocode sketch: Server, Ring, and totalKeys are assumed abstractions
Server findBoundedServer(Key key, Ring ring, double epsilon) {
    double avgLoad = double(totalKeys) / ring.numServers();
    int cap = ceil((1 + epsilon) * avgLoad);
    uint32_t h = hash(key);
    Server s = ring.nextClockwise(h);
    // Spill clockwise past servers already at capacity. With epsilon > 0,
    // at least one server is always below cap, so this terminates.
    while (s.currentLoad() >= cap) {
        s = ring.nextClockwise(s.position());
    }
    return s;
}
Properties
- Hard upper bound: No server ever exceeds (1+ε) × average load.
- Tunable: When ε is large, behavior approaches standard consistent hashing. When ε=0, load is forced to be perfectly even, essentially round-robin.
- Practical ε values: 0.1–0.5 in production. Lower ε gives tighter bounds but more spillover.
- Dynamic: As load changes, key assignments can shift. This means lookups must check capacity in real time.
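A small Python sketch makes the bound concrete (server and key names are illustrative; this is a simplified model of the algorithm, not the paper's exact code):

```python
# Bounded-load sketch: spill clockwise once a server holds ceil((1+eps)*avg) keys
import bisect, hashlib, math

def h32(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

servers = [f"server-{i}" for i in range(5)]
ring = sorted((h32(s), s) for s in servers)
hashes = [pos for pos, _ in ring]

keys = [f"key-{i}" for i in range(10_000)]
eps = 0.25
cap = math.ceil((1 + eps) * len(keys) / len(servers))
load = {s: 0 for s in servers}
for k in keys:
    i = bisect.bisect_left(hashes, h32(k)) % len(ring)
    while load[ring[i][1]] >= cap:   # spill to next clockwise server
        i = (i + 1) % len(ring)
    load[ring[i][1]] += 1

print(max(load.values()) / (len(keys) / len(servers)))  # never exceeds 1 + eps
```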
7 · Jump Consistent Hashing
In 2014, Google engineers Lamping and Veach published "A Fast, Minimal Memory, Consistent Hash Algorithm" describing jump consistent hashing—an algorithm that is:
- O(1) memory (no ring to store)
- O(ln n) time per lookup
- Perfectly uniform distribution
- Minimal disruption: only the minimum number of keys move, K/(N+1) when growing from N to N+1 buckets and K/N when shrinking to N−1
The Algorithm
The entire algorithm fits in about 10 lines:
int32_t jumpConsistentHash(uint64_t key, int32_t numBuckets) {
    int64_t b = -1, j = 0;
    while (j < numBuckets) {
        b = j;
        // Advance a 64-bit linear congruential generator seeded by the key
        key = key * 2862933555777941757ULL + 1;
        // Jump forward; expected jump length grows as bucket indexes rise
        j = (int64_t)((b + 1) * (double(1LL << 31) /
                                 double((key >> 33) + 1)));
    }
    return (int32_t)b;
}
How It Works (Intuition)
Think of it as a sequence of “jumps” through bucket assignments. As the bucket count grows from n to n+1, each key jumps to the new bucket with probability 1/(n+1) and stays put otherwise. This ensures that exactly a 1/(N+1) fraction of keys move when bucket N+1 is added, and that every bucket receives the same expected share.
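A Python port of the algorithm makes the disruption property easy to verify (raw sequential integers serve as keys here; in practice you would hash arbitrary keys to 64 bits first):

```python
def jump_hash(key: int, num_buckets: int) -> int:
    # Same LCG constant as the C++ version; mask keeps key at 64 bits
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

moved = sum(1 for k in range(100_000) if jump_hash(k, 10) != jump_hash(k, 11))
print(f"{moved / 100_000:.1%}")  # ~9.1%, i.e., 1/11 of keys move
```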
Jump Hash vs. Ring Hash
| Property | Ring Hash | Jump Hash |
|---|---|---|
| Memory | O(N × V) | O(1) |
| Lookup Time | O(log(N × V)) | O(ln N) |
| Distribution | Near-uniform (with vnodes) | Perfectly uniform |
| Arbitrary node removal | ✓ Yes | ✗ No (end only) |
| Named nodes | ✓ Yes (IP, hostname) | ✗ No (0..n-1) |
| Dynamic topology | ✓ Flexible | ✗ Linear only |
8 · Rendezvous Hashing (HRW)
Highest Random Weight (HRW) hashing was developed independently by Thaler and Ravishankar in 1998. Unlike consistent hashing, which uses a ring, rendezvous hashing takes a direct approach.
Algorithm
For each key, compute a score for every server and pick the server with the highest score:
// Sketch: Key, Server, and hash() are assumed abstractions.
// The score mixes the key with each server's id.
Server findServer(Key key, const vector<Server>& servers) {
    Server best = servers[0];
    uint64_t bestScore = 0;
    for (const auto& s : servers) {
        uint64_t score = hash(key + s.id());
        if (score > bestScore) {
            bestScore = score;
            best = s;
        }
    }
    return best;
}
Properties
- O(N) per lookup: Must evaluate all servers. This is fine for small N (5–50 servers) but becomes expensive for large N.
- No data structure needed: No ring, no sorted array. Just iterate over the server list.
- Minimal disruption: Removing one server moves only K/N keys (the keys that chose the removed server). Adding one server moves only K/(N+1) keys.
- Simple implementation: Trivially correct. No virtual nodes needed for good distribution.
- Naturally uniform: Each key is equally likely to choose any server (assuming a good hash function).
When to Use Rendezvous vs. Ring
Use Rendezvous When:
- Small number of servers (<50)
- Simplicity matters more than lookup speed
- Need perfect distribution without tuning vnodes
- Need k-of-n server selection (e.g., top-3 servers for replication; see the sketch after these lists)
Use Ring Hash When:
- Large number of servers (100+)
- Need O(log N) lookups
- Topology changes frequently
- Already have vnodes infrastructure
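The k-of-n selection mentioned above falls out naturally: rank all servers by their per-key score and take the top k. A minimal Python sketch (server names are illustrative):

```python
# Rendezvous (HRW): rank servers by a per-(key, server) score, take the top k
import hashlib

def hrw_top_k(key: str, servers: list, k: int = 3) -> list:
    def score(server: str) -> int:
        return int(hashlib.md5(f"{key}:{server}".encode()).hexdigest(), 16)
    return sorted(servers, key=score, reverse=True)[:k]

servers = ["server-A", "server-B", "server-C", "server-D", "server-E"]
print(hrw_top_k("user:1001", servers))  # 3 highest-scoring servers for this key
```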
9 · Real-World Usage
Amazon DynamoDB
DynamoDB (and its predecessor Dynamo, described in the seminal 2007 paper) pioneered consistent hashing in production. Each node is assigned multiple virtual nodes on an MD5-based hash ring. Key features:
- Data is replicated to N successor nodes on the ring (default N=3)
- Uses hinted handoff for temporary failures
- Uses vector clocks for conflict resolution
- Virtual nodes allow heterogeneous hardware (more tokens for bigger machines)
Apache Cassandra
Cassandra uses consistent hashing with 256 virtual nodes (tokens) per physical node by default:
- Murmur3Partitioner: Default partitioner using MurmurHash3 on a −2^63 to 2^63−1 ring
- RandomPartitioner: Alternative using MD5 on a 0 to 2^127−1 ring
- Token ranges are used to determine which node owns which data
- Adding a node automatically streams data from neighboring token ranges
Akamai CDN
Akamai (whose founders include co-authors of the original consistent hashing paper) uses consistent hashing to assign web content to edge servers. When a user requests content:
- The URL is hashed to find the responsible edge server
- Server failures are handled transparently: requests route to the next server on the ring
- New servers are added incrementally, gradually absorbing load without cache stampedes
Discord
Discord uses consistent hashing to route messages to the correct guild (server) shard:
- Guild ID is hashed to determine the owning database shard
- Consistent hashing ensures that resharding (adding shards) only moves a fraction of guilds
- Combined with replication for high availability
Memcached (Ketama)
The Ketama library is the de facto standard for consistent hashing in Memcached clients:
- Uses MD5 to generate 4 virtual nodes per hash (using the 4 × 32-bit segments of the 128-bit MD5 output)
- Default: 160 virtual nodes per server (40 hashes × 4 points each)
- Originally developed at Last.fm, now used in libmemcached and most Memcached clients
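The four-points-per-hash scheme can be sketched in a few lines of Python. This is an illustration of the idea described above, not libketama's exact code; the key format and byte order differ between client libraries:

```python
# Ketama-style points: split one 128-bit MD5 digest into four 32-bit positions
import hashlib

def ketama_points(server: str, hash_index: int) -> list:
    digest = hashlib.md5(f"{server}-{hash_index}".encode()).digest()
    return [int.from_bytes(digest[i * 4:(i + 1) * 4], "little") for i in range(4)]

print(ketama_points("10.0.0.1:11211", 0))  # four ring positions from one hash
```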
| System | Hash Function | Virtual Nodes | Ring Size | Replication |
|---|---|---|---|---|
| DynamoDB | MD5 | Variable | 2^128 | N=3 successors |
| Cassandra | Murmur3 | 256 default | 2^64 | Configurable RF |
| Akamai CDN | Proprietary | Yes | 2^32 | Multi-tier |
| Memcached (Ketama) | MD5 | 160 | 2^32 | None (cache) |
| Redis Cluster | CRC16 | 16384 slots | 16384 | Replica shards |
10 · Implementation
Here is a complete C++ implementation of consistent hashing with virtual nodes:
#include <map>
#include <string>
#include <vector>
#include <functional>
#include <stdexcept>
#include <sstream>
#include <cstdint>
class ConsistentHash {
private:
int numVnodes;
std::map<uint32_t, std::string> ring; // hash → server id
std::hash<std::string> hasher;
uint32_t hashKey(const std::string& key) const {
// In production, use MurmurHash3 or xxHash.
// std::hash is used here for simplicity.
return static_cast<uint32_t>(hasher(key));
}
public:
explicit ConsistentHash(int vnodes = 150)
: numVnodes(vnodes) {}
// Add a physical server with V virtual nodes
void addServer(const std::string& serverId) {
for (int i = 0; i < numVnodes; i++) {
std::string vnode = serverId + "#" + std::to_string(i);
uint32_t h = hashKey(vnode);
ring[h] = serverId;
}
}
// Remove a physical server and all its virtual nodes
void removeServer(const std::string& serverId) {
for (int i = 0; i < numVnodes; i++) {
std::string vnode = serverId + "#" + std::to_string(i);
uint32_t h = hashKey(vnode);
ring.erase(h);
}
}
// Find the server responsible for a key
std::string getServer(const std::string& key) const {
if (ring.empty())
throw std::runtime_error("Empty ring");
uint32_t h = hashKey(key);
// Find the first server position >= h (clockwise search)
auto it = ring.lower_bound(h);
if (it == ring.end())
it = ring.begin(); // wrap around to 0
return it->second;
}
// Get N distinct servers for replication
std::vector<std::string> getServers(
const std::string& key, int count) const {
if (ring.empty())
throw std::runtime_error("Empty ring");
std::vector<std::string> result;
uint32_t h = hashKey(key);
auto it = ring.lower_bound(h);
        // Walk clockwise, visiting each ring entry at most once
        for (size_t steps = 0; steps < ring.size(); ++steps) {
            if (it == ring.end())
                it = ring.begin(); // wrap around
            // Skip virtual nodes of servers we already picked
            bool found = false;
            for (const auto& s : result) {
                if (s == it->second) { found = true; break; }
            }
            if (!found) {
                result.push_back(it->second);
                if (result.size() == static_cast<size_t>(count))
                    break;
            }
            ++it;
        }
return result;
}
size_t ringSize() const { return ring.size(); }
};
Python Implementation
import hashlib
import bisect
class ConsistentHash:
def __init__(self, vnodes=150):
self.vnodes = vnodes
self.ring = {} # hash → server_id
self.sorted_keys = [] # sorted list of hashes
def _hash(self, key: str) -> int:
"""Generate a 32-bit hash for a given key."""
return int(hashlib.md5(
key.encode()).hexdigest(), 16) % (2**32)
def add_server(self, server_id: str):
"""Add a server with V virtual nodes to the ring."""
for i in range(self.vnodes):
vnode_key = f"{server_id}#{i}"
h = self._hash(vnode_key)
self.ring[h] = server_id
bisect.insort(self.sorted_keys, h)
def remove_server(self, server_id: str):
"""Remove a server and all its virtual nodes."""
for i in range(self.vnodes):
vnode_key = f"{server_id}#{i}"
h = self._hash(vnode_key)
if h in self.ring:
del self.ring[h]
self.sorted_keys.remove(h)
def get_server(self, key: str) -> str:
"""Find the responsible server for a key."""
if not self.ring:
raise ValueError("Empty ring")
h = self._hash(key)
# Binary search for next clockwise server
idx = bisect.bisect_left(self.sorted_keys, h)
if idx == len(self.sorted_keys):
idx = 0 # wrap around
return self.ring[self.sorted_keys[idx]]
def get_servers(self, key: str, count: int) -> list:
"""Get 'count' distinct servers for replication."""
if not self.ring:
raise ValueError("Empty ring")
        result = []
        seen = set()
        h = self._hash(key)
        start = bisect.bisect_left(self.sorted_keys, h)
        # Walk clockwise, visiting each ring position at most once
        for step in range(len(self.sorted_keys)):
            idx = (start + step) % len(self.sorted_keys)
            server = self.ring[self.sorted_keys[idx]]
            if server not in seen:
                result.append(server)
                seen.add(server)
                if len(result) == count:
                    break
        return result
# ---- Usage example ----
ch = ConsistentHash(vnodes=150)
for s in ["server-A", "server-B", "server-C"]:
ch.add_server(s)
print(ch.get_server("user:1001")) # → one of A/B/C
print(ch.get_server("session:xyz")) # → one of A/B/C
# Add server D — only ~25% of keys move
ch.add_server("server-D")
# Replication: get top-3 servers for a key
print(ch.get_servers("user:1001", 3)) # → 3 distinct servers
Verifying Minimal Disruption
# Measure how many keys move when adding a server
ch = ConsistentHash(vnodes=150)
servers = [f"server-{i}" for i in range(10)]
for s in servers:
ch.add_server(s)
# Map 10,000 keys
keys = [f"key-{i}" for i in range(10000)]
before = {k: ch.get_server(k) for k in keys}
# Add server-10
ch.add_server("server-10")
after = {k: ch.get_server(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"Keys moved: {moved}/{len(keys)}")
print(f"Percentage: {moved/len(keys)*100:.1f}%")
print(f"Expected: {100/11:.1f}%")
# Output: ~9.1% (close to the theoretical 1/11 = 9.09%)
Summary & Cheat Sheet
| Algorithm | Memory | Lookup | Distribution | Arbitrary Remove | Best For |
|---|---|---|---|---|---|
| Modular Hash (% N) | O(1) | O(1) | Uniform | ✗ (remaps all) | Static N only |
| Ring Hash + Vnodes | O(N×V) | O(log(N×V)) | Near-uniform | ✓ | General purpose |
| Bounded Load Ring | O(N×V) | O(log(N×V)) | Bounded | ✓ | Hot-spot prevention |
| Jump Hash | O(1) | O(ln N) | Perfect | ✗ (end only) | Static partitions |
| Rendezvous (HRW) | O(N) | O(N) | Perfect | ✓ | Small N, replication |