Distributed Locking
Why Distributed Locks?
In a single-process application, a mutex or semaphore is enough to coordinate access to shared memory. But the moment your system spans multiple servers, containers, or even data centers, local mutexes are useless—they cannot cross process boundaries. That is where distributed locks come in: a mechanism that allows multiple independent processes to agree that exactly one of them holds exclusive access to a shared resource at any given time.
Real-World Scenarios
- Prevent double booking: Two users simultaneously try to reserve the last seat on a flight. Without a distributed lock on that seat inventory row, both could succeed, over-selling the flight.
- Single-writer guarantee: A distributed cache rebuilder should only be executed by one node at a time. If two nodes simultaneously rebuild, they might write partial or conflicting data.
- Cron job deduplication: In a Kubernetes deployment with N replicas, a scheduled task (send daily emails, archive old records) must run on exactly one pod. A distributed lock ensures only the winner executes (see the sketch after this list).
- Leader election: In a replicated database or streaming system, nodes compete for leadership. The winner writes; followers replicate. A distributed lock (or lease) is the foundation of this pattern.
- Rate-limited API calls: A third-party API allows one request per second. Multiple workers must coordinate so only one sends a request in each time window.
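To make the cron-deduplication scenario concrete, here is a minimal sketch using a single Redis instance and the SET NX EX pattern covered below. The key name, one-hour TTL, and the send_daily_emails job body are illustrative assumptions, not part of any particular framework:

# Minimal sketch: cron deduplication across N replicas via a Redis lock
import uuid
import redis

r = redis.Redis(host="redis")  # placeholder host

def run_daily_job():
    token = str(uuid.uuid4())
    # nx=True: only one replica can create the key; ex=3600: auto-release
    # after an hour in case the winning pod dies mid-job.
    if not r.set("cron:daily-emails", token, nx=True, ex=3600):
        return  # another replica won; do nothing
    send_daily_emails()  # hypothetical job body

The lock is deliberately left to expire rather than released, so the job cannot run a second time within the same window even if the winner finishes quickly.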
Properties of a Good Distributed Lock
Not all distributed locks are created equal. A robust implementation must satisfy several properties, formalized in the literature as safety and liveness guarantees:
| Property | Type | Description |
|---|---|---|
| Mutual Exclusion | Safety | At most one client can hold the lock at any given time. This is the fundamental guarantee. If this breaks, the lock is useless. |
| Deadlock-Free | Liveness | Even if a client crashes while holding the lock, the lock must eventually be released. Typically enforced via TTL (time-to-live) or session expiration. |
| Fault Tolerance | Availability | The lock service must remain operational even if some nodes fail. A single-node lock server is a single point of failure. |
| Fencing Tokens | Safety (extended) | Each lock acquisition produces a monotonically increasing token. Resource servers reject stale tokens to prevent delayed writes from a previous lock holder. |
The first three are widely agreed upon. The fourth—fencing tokens—was highlighted by Martin Kleppmann as essential for correctness in asynchronous systems. We will explore it in depth in a dedicated section.
Redis-Based Locking
Redis is the most popular choice for distributed locking because of its low latency and simplicity. The basic pattern uses a single Redis instance:
Acquiring the Lock
# SET key value NX EX ttl
# NX = only set if key does NOT exist (mutual exclusion)
# EX = set TTL in seconds (deadlock-freedom)
# value = a unique random string identifying THIS client
SET my_lock "client-a-uuid-123" NX EX 30
# Returns OK if the lock was acquired
# Returns (nil) if someone else already holds it
The NX flag ensures atomicity—only one client wins the race. The EX 30 sets a 30-second TTL so that if the client crashes, the lock auto-releases. The random value identifies the lock owner, preventing accidental release by another client.
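In application code this is a single call. A minimal sketch with the redis-py client, using the same key and TTL as above:

import uuid
import redis

r = redis.Redis()
token = str(uuid.uuid4())  # unique value identifying THIS client

# Equivalent of: SET my_lock <token> NX EX 30
if r.set("my_lock", token, nx=True, ex=30):
    print("lock acquired, owner token:", token)
else:
    print("lock is held by someone else")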
Releasing the Lock
You must not simply DEL my_lock. If the TTL expired and another client acquired the lock, you would delete their lock. Instead, use a Lua script for atomic check-and-delete:
-- Lua script: release only if the value matches our unique ID
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
-- Execute from client:
EVAL "if redis.call('GET',KEYS[1])==ARGV[1] then return redis.call('DEL',KEYS[1]) else return 0 end" 1 my_lock "client-a-uuid-123"
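From Python, the same script can be registered once with redis-py and then invoked atomically. A minimal sketch; the token passed in args must be the one used at acquisition:

import redis

r = redis.Redis()

RELEASE_LUA = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
"""
release = r.register_script(RELEASE_LUA)

# Returns 1 if we deleted our own lock, 0 if it was no longer ours.
released = release(keys=["my_lock"], args=["client-a-uuid-123"])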
Problems with Single-Instance Redis Locks
- Single point of failure (SPOF): If the Redis instance goes down, all locks are lost. If it comes back with a stale RDB snapshot, it might have "forgotten" about active locks, allowing two clients to believe they hold the same lock.
- Replication lag: If using Redis replication, the master might accept a SET NX and crash before replicating to the replica. The replica is promoted, and another client acquires the "same" lock. Now two clients hold the lock.
- Clock drift: TTL-based expiration depends on the server clock. Significant clock skew between Redis and clients can cause premature or delayed expiration.
- Long GC pauses: A client acquires the lock, then its JVM garbage collector pauses for 30 seconds. The TTL expires, another client acquires the lock, and when the original client resumes, it still thinks it holds the lock.
Redlock Algorithm
Redlock, proposed by Salvatore Sanfilippo (Antirez), addresses the single-instance weakness by using N independent Redis instances (typically 5). The algorithm:
Step-by-Step
1. Record the start time (T1) in milliseconds.
2. Sequentially try to acquire the lock on all N Redis instances using the same key, same random value, and the same TTL. Use a small per-instance timeout (e.g., 5–10ms) so that a down instance doesn't slow you down.
3. Calculate elapsed time: elapsed = now() - T1.
4. The lock is acquired if and only if:
   - The client acquired the lock on a majority of instances (at least ⌊N/2⌋ + 1, so 3 out of 5).
   - The total elapsed time is less than the lock TTL. The effective lock validity is TTL - elapsed.
5. If the lock was NOT acquired (fewer than majority or timeout exceeded), release the lock on ALL instances (even the ones where it was acquired) to avoid deadlocks.
# Pseudocode for Redlock acquisition
import time, uuid, redis

REDIS_HOSTS = ["redis1", "redis2", "redis3", "redis4", "redis5"]  # illustrative hosts

LOCK_KEY = "my_resource"
LOCK_TTL = 30000   # 30 seconds in ms
QUORUM = 3         # majority of 5
RETRY_DELAY = 200  # ms between retries

instances = [redis.Redis(host=h) for h in REDIS_HOSTS]  # 5 independent instances

def release_lock(r, value):
    lua = """
    if redis.call("GET", KEYS[1]) == ARGV[1] then
        return redis.call("DEL", KEYS[1])
    else
        return 0
    end
    """
    r.eval(lua, 1, LOCK_KEY, value)

def acquire_lock():
    value = str(uuid.uuid4())
    start = time.monotonic_ns() // 1_000_000  # ms
    acquired = 0
    for r in instances:
        try:
            if r.set(LOCK_KEY, value, nx=True, px=LOCK_TTL):
                acquired += 1
        except redis.ConnectionError:
            pass  # instance down, skip it
    elapsed = (time.monotonic_ns() // 1_000_000) - start
    validity = LOCK_TTL - elapsed
    if acquired >= QUORUM and validity > 0:
        return value, validity  # lock acquired!
    # Failed: release on ALL instances (even those where SET succeeded)
    for r in instances:
        try:
            release_lock(r, value)
        except redis.ConnectionError:
            pass
    return None, 0
[Interactive demo: Redlock Algorithm. Step through lock acquisition across 5 independent Redis nodes; majority (3/5) required.]
The Kleppmann vs Antirez Debate
In 2016, Martin Kleppmann published a widely read analysis titled "How to do distributed locking" that criticized Redlock. Salvatore Sanfilippo (creator of Redis) responded with a rebuttal. This debate remains one of the most important discussions in distributed systems:
| Kleppmann’s Criticisms | Antirez’s Rebuttals |
|---|---|
| GC pauses break safety. A client acquires the lock, then a GC pause of >TTL causes the lock to expire. Another client acquires it. The first client resumes and operates on shared data, believing it still holds the lock. Two clients now have "the lock." | The client checks the remaining lock validity after acquiring it. If the GC pause happened during acquisition, the validity check catches it. However, Kleppmann argues the pause can happen after the check. |
| Clock jumps break safety. Redlock relies on wall-clock time for TTL calculations. If an NTP correction jumps a Redis server’s clock forward, the lock expires prematurely. | Use monotonic clocks where possible. Redis uses gettimeofday(), which can be affected by NTP, but in practice large jumps are rare with well-configured NTP (smearing, not stepping). |
| No fencing tokens. Redlock does not produce monotonically increasing tokens. Without fencing, even correctly implemented locks cannot prevent split-brain writes. | If you need fencing tokens, you need a consensus system anyway. Redlock is designed for a different (valid) use case where strong fencing isn’t required. |
| Use a proper consensus system. If you need a lock for correctness, use ZooKeeper, etcd, or Chubby—systems designed with formal correctness proofs and linearizable semantics. | Redlock provides a practical middle ground. Not everyone needs the operational complexity of ZooKeeper. For many use cases, Redlock’s guarantees are sufficient. |
ZooKeeper Locks
Apache ZooKeeper provides a more reliable locking primitive built on top of its consensus protocol (ZAB — ZooKeeper Atomic Broadcast). Unlike Redis, ZooKeeper was designed for coordination tasks.
How It Works: Ephemeral Sequential Znodes
1. Create an ephemeral sequential znode under a lock path:

   # Client creates an ephemeral sequential node
   create -e -s /locks/my_resource/lock-
   # ZooKeeper returns: /locks/my_resource/lock-0000000001

   -e = ephemeral (auto-deleted when the session ends), -s = sequential (ZooKeeper appends a monotonically increasing counter).
2. List all children of the lock path:

   ls /locks/my_resource
   # Returns: [lock-0000000001, lock-0000000003, lock-0000000005]

3. Check if you have the lowest sequence number:
   - Yes: You hold the lock. Proceed with your critical section.
   - No: Set a watch on the znode with the next-lower sequence number (not the lowest — this avoids the "herd effect" where all waiters wake up simultaneously).
4. When the watched znode is deleted (previous holder releases or crashes), you are notified. Re-check if you now have the lowest number.
5. Release: Simply delete your znode, or let your session expire (ZooKeeper automatically deletes ephemeral nodes).
# Pseudocode: ZooKeeper distributed lock (using Kazoo Python client)
from kazoo.client import KazooClient
from kazoo.recipe.lock import Lock
zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
zk.start()
lock = Lock(zk, "/locks/my_resource")
# Blocking acquire
lock.acquire(timeout=30)
try:
# Critical section: only one client executes this at a time
process_shared_resource()
finally:
lock.release()
# Or use as context manager
with Lock(zk, "/locks/my_resource"):
process_shared_resource()
Advantages Over Redis
- Consensus-backed: ZAB protocol ensures all nodes agree on the state of the lock. No split-brain from replication lag.
- Automatic cleanup: Ephemeral nodes are deleted when the client session dies. No TTL guessing game.
- Monotonic ordering: Sequential znodes provide a natural ordering that can be used as fencing tokens (see the sketch after this list).
- No clock dependency: Lock correctness does not depend on synchronized clocks.
- Watch mechanism: Clients are notified when the lock becomes available, rather than polling.
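A minimal sketch of deriving a fencing token directly from the sequential znode name, using the same Kazoo client and illustrative lock path as the example above:

from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# Create our place in the queue: an ephemeral, sequential znode.
path = zk.create("/locks/my_resource/lock-",
                 ephemeral=True, sequence=True, makepath=True)
# e.g. "/locks/my_resource/lock-0000000007"

# The zero-padded suffix increases monotonically across acquisitions,
# so it can be handed to the resource server as a fencing token.
fencing_token = int(path.rsplit("-", 1)[1])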
Drawbacks
- Operational complexity: Running a ZooKeeper ensemble (3 or 5 nodes) requires significant operational expertise.
- Higher latency: Consensus requires quorum writes, which are slower than a single Redis SET.
- Session management: If a client’s session expires due to a temporary network partition, the lock is released even though the client is still running (and might still be accessing the shared resource). A session listener can at least detect this (see the sketch below).
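One partial mitigation is to listen for session state changes and stop touching the shared resource the moment the session is suspended or lost. A minimal Kazoo sketch; stop_work is an assumed application hook:

from kazoo.client import KazooClient, KazooState

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")

def on_state_change(state):
    if state in (KazooState.LOST, KazooState.SUSPENDED):
        # Our ephemeral lock node may already be gone; stop the critical
        # section rather than keep writing with a lock we may not hold.
        stop_work()  # hypothetical application hook

zk.add_listener(on_state_change)
zk.start()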
etcd Locks
etcd uses the Raft consensus algorithm and provides distributed locking through leases and transactions. It is the coordination backbone of Kubernetes.
Lease-Based Locks
# Grant a lease (TTL = 30 seconds)
etcdctl lease grant 30
# Output: lease 694d81417acd0b17 granted with TTL(30s)
# Put a key with the lease attached
etcdctl put --lease=694d81417acd0b17 /locks/my_resource "client-a"
# The key auto-deletes when the lease expires
# Keep the lease alive with periodic keep-alives:
etcdctl lease keep-alive 694d81417acd0b17
Transactions for Atomic Lock Acquisition
# Atomic compare-and-set: only create the lock if it doesn't exist
etcdctl txn --interactive
# compares:
create("/locks/my_resource") = "0"
# success requests:
put /locks/my_resource "client-a" --lease=694d81417acd0b17
# failure requests:
get /locks/my_resource
The transaction checks that the key’s create_revision is 0 (key doesn’t exist). If true, it creates the key with the lease. If false, it returns the current holder’s value. This is atomic and linearizable.
Using etcd’s Built-in Lock Command
# etcd provides a high-level lock command
etcdctl lock /locks/my_resource
# Blocks until the lock is acquired
# Holds the lock until the process exits or is interrupted (Ctrl+C)
# Execute a command while holding the lock
etcdctl lock /locks/my_resource -- echo "I have the lock"
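The same lock is available programmatically from Python via the third-party python-etcd3 package; a minimal sketch under that package's API, where the endpoint and the do_critical_work function are illustrative assumptions:

import etcd3  # third-party package: python-etcd3

client = etcd3.client(host="etcd1", port=2379)  # placeholder endpoint

# The lock is backed by a lease, so it auto-releases if this
# process dies and the 30-second TTL lapses.
with client.lock("my_resource", ttl=30):
    do_critical_work()  # hypothetical critical section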
Programmatic Example (Go)
import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

cli, _ := clientv3.New(clientv3.Config{
	Endpoints: []string{"etcd1:2379", "etcd2:2379", "etcd3:2379"},
})
defer cli.Close()

// Create a session (lease-based; TTL in seconds)
session, _ := concurrency.NewSession(cli, concurrency.WithTTL(30))
defer session.Close()

// Create a mutex keyed on the lock path
mu := concurrency.NewMutex(session, "/locks/my_resource")

// Acquire (blocks until the lock is held or the context is cancelled)
mu.Lock(context.TODO())
defer mu.Unlock(context.TODO())

// Critical section
processSharedResource()

// The lock's revision can serve as a fencing token:
fmt.Println("Fencing token:", mu.Header().Revision)
The mu.Header().Revision is a monotonically increasing number from etcd’s Raft log. This makes etcd locks compatible with fencing token requirements out of the box.
Fencing Tokens
Fencing tokens are the critical concept that separates a distributed lock that is merely "good enough" from one that is provably correct. This idea, popularized by Martin Kleppmann, addresses a fundamental problem in asynchronous distributed systems.
The Problem: Zombie Lock Holders
Consider this scenario without fencing:
- Client A acquires the lock with TTL=30s.
- Client A begins processing, but hits a long GC pause (or network delay).
- The lock’s TTL expires after 30s. Client A is still paused.
- Client B acquires the lock (legitimately).
- Client B writes to the shared resource.
- Client A’s GC pause ends. It still believes it holds the lock.
- Client A writes to the shared resource, overwriting Client B’s data.
The lock’s mutual exclusion guarantee was violated. Both clients wrote to the resource "at the same time" from the resource’s perspective.
The Solution: Monotonically Increasing Tokens
Each time a lock is acquired, the lock service issues a fencing token—a monotonically increasing number (e.g., ZooKeeper’s zxid, etcd’s revision). The resource server (database, storage, API) validates this token:
# Lock service issues tokens:
# Client A acquires lock → token = 33
# Client B acquires lock → token = 34
# Resource server tracks the highest token it has seen:
class RejectedError(Exception):
    pass

highest_token_seen = 0

def write(data, token):
    global highest_token_seen
    if token <= highest_token_seen:
        raise RejectedError(f"Stale token {token} <= {highest_token_seen}")
    highest_token_seen = token
    actually_write(data)  # the real storage write
# Client A (token=33) writes first → accepted, highest=33
# Client B (token=34) writes → accepted, highest=34
# Client A (delayed, token=33) tries → REJECTED (33 < 34)
[Interactive demo: Distributed Lock with Fencing. Watch how fencing tokens prevent a zombie lock holder from corrupting data after a GC pause.]
Where Fencing Tokens Come From
| Lock Service | Fencing Token Source |
|---|---|
| ZooKeeper | Sequential znode number or zxid (transaction ID) |
| etcd | revision from the Raft log (returned by mu.Header().Revision) |
| Chubby (Google) | Lock sequencer (built-in) |
| Redis / Redlock | ⚠ Not natively supported. Requires external sequencing. |
Enforcing the check is straightforward when the resource is a database (e.g., a conditional UPDATE ... WHERE token > last_seen_token) but harder for arbitrary APIs or third-party services.
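As a sketch of the database case, the token check can ride along atomically in the write itself. This uses Python's built-in sqlite3 so it runs standalone; the resources table and last_token column are illustrative assumptions:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (id INTEGER PRIMARY KEY, data TEXT, last_token INTEGER)"
)
conn.execute("INSERT INTO resources VALUES (1, '', 0)")

def fenced_write(resource_id, data, token):
    # The WHERE clause rejects stale tokens atomically with the write.
    cur = conn.execute(
        "UPDATE resources SET data = ?, last_token = ? "
        "WHERE id = ? AND last_token < ?",
        (data, token, resource_id, token),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise RuntimeError(f"rejected stale fencing token {token}")

fenced_write(1, "written by B", token=34)  # accepted
fenced_write(1, "written by A", token=33)  # raises: 33 <= 34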
Database-Based Locks
If your application already uses a relational database, you can leverage it as a lock service. This avoids adding a new infrastructure dependency, though it comes with trade-offs.
SELECT FOR UPDATE (Row-Level Locking)
-- Create a locks table
CREATE TABLE distributed_locks (
lock_name VARCHAR(255) PRIMARY KEY,
locked_by VARCHAR(255),
locked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP
);
-- Attempt to acquire (within a transaction)
BEGIN;
SELECT * FROM distributed_locks
WHERE lock_name = 'my_resource'
FOR UPDATE SKIP LOCKED;
-- If the row is returned, check expiration and update
UPDATE distributed_locks
SET locked_by = 'client-a',
locked_at = NOW(),
expires_at = NOW() + INTERVAL '30 seconds'
WHERE lock_name = 'my_resource'
AND (expires_at IS NULL OR expires_at < NOW());
COMMIT;
-- Release
UPDATE distributed_locks
SET locked_by = NULL, expires_at = NULL
WHERE lock_name = 'my_resource'
AND locked_by = 'client-a';
PostgreSQL Advisory Locks
PostgreSQL provides dedicated advisory lock functions that don’t require a table:
-- Session-level advisory lock (released when session ends)
SELECT pg_advisory_lock(12345);
-- ... critical section ...
SELECT pg_advisory_unlock(12345);
-- Try without blocking (returns true/false)
SELECT pg_try_advisory_lock(12345);
-- Transaction-level advisory lock (released at COMMIT/ROLLBACK)
SELECT pg_advisory_xact_lock(12345);
-- Using two integers for namespacing
SELECT pg_advisory_lock(1, 42); -- namespace=1, resource=42
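From application code these are ordinary SQL calls; a minimal sketch with psycopg2, where the connection DSN and the lock key 12345 are placeholders:

import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN
cur = conn.cursor()

# Non-blocking session-level lock: held until explicitly unlocked
# or until this database session ends.
cur.execute("SELECT pg_try_advisory_lock(%s)", (12345,))
(got_lock,) = cur.fetchone()
if got_lock:
    try:
        pass  # critical section
    finally:
        cur.execute("SELECT pg_advisory_unlock(%s)", (12345,))
conn.close()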
MySQL GET_LOCK
-- Acquire (timeout 30 seconds)
SELECT GET_LOCK('my_resource', 30);
-- Returns 1 if acquired, 0 if timed out, NULL on error
-- Release
SELECT RELEASE_LOCK('my_resource');
-- Check without acquiring
SELECT IS_FREE_LOCK('my_resource');
Pros and Cons
| Pros | Cons |
|---|---|
| No new infrastructure: reuses the database you already operate, with transactional acquire/release. A version or expires_at column can double as a fencing token. | The database becomes a lock bottleneck and a single point of failure; throughput and latency trail dedicated lock services. Without an expires_at column (or advisory-lock sessions), a dead client holds the lock forever. |
Lock-Free Alternatives
Sometimes the best distributed lock is no lock at all. Several patterns avoid the need for explicit locking:
Optimistic Locking (Version Numbers)
Instead of preventing concurrent access, allow it but detect conflicts at write time:
-- Read the current version
SELECT data, version FROM resources WHERE id = 42;
-- Returns: data='old_value', version=7
-- Update only if the version hasn't changed
UPDATE resources
SET data = 'new_value', version = version + 1
WHERE id = 42 AND version = 7;
-- Check affected rows:
-- 1 row → success, your write went through
-- 0 rows → conflict! Someone else updated first. Retry.
This is one of the most widely used concurrency-control mechanisms in web applications, and it avoids the overhead and complexity of distributed locks entirely for most use cases. In practice the caller wraps the read-compare-update cycle in a small retry loop, as sketched below.
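A minimal sketch of that retry loop, using Python's built-in sqlite3 so it runs standalone; the resources table mirrors the SQL above:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, data TEXT, version INTEGER)")
conn.execute("INSERT INTO resources VALUES (42, 'old_value', 7)")

def update_with_retry(resource_id, new_data, max_attempts=5):
    for _ in range(max_attempts):
        row = conn.execute(
            "SELECT version FROM resources WHERE id = ?", (resource_id,)
        ).fetchone()
        if row is None:
            raise KeyError(resource_id)
        (version,) = row
        cur = conn.execute(
            "UPDATE resources SET data = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_data, resource_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # our write won
        # 0 rows: a concurrent writer bumped the version; re-read and retry
    return False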
Compare-and-Swap (CAS)
CAS is the atomic primitive underlying optimistic locking. Many distributed stores support it natively:
# etcd: atomic CAS using transactions
etcdctl txn --interactive
# compares:
value("/config/setting") = "old_value"
# success:
put /config/setting "new_value"
# failure:
get /config/setting
# DynamoDB: conditional write (boto3; the table name is illustrative)
import boto3

table = boto3.resource("dynamodb").Table("resources")
table.put_item(
    Item={"id": "42", "data": "new_value", "version": 8},
    ConditionExpression="version = :v",
    ExpressionAttributeValues={":v": 7},
)
# Raises a ClientError (ConditionalCheckFailedException) if version != 7
CRDTs (Conflict-Free Replicated Data Types)
CRDTs are data structures that can be independently updated on different nodes and merged without conflicts. They eliminate the need for locking by making all operations commutative:
- G-Counter: Each node maintains its own counter. The global count is the sum. Increments on any node are always safe (see the sketch after this list).
- OR-Set (Observed-Remove Set): Elements can be added/removed concurrently. Conflicts resolve deterministically.
- LWW-Register (Last-Writer-Wins): Each write is timestamped. The latest timestamp wins on merge.
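A minimal G-Counter sketch; node IDs are arbitrary strings, and this is the textbook structure rather than any particular library's API:

class GCounter:
    """Grow-only counter: one slot per node, merge = element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> count

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Commutative, associative, idempotent: safe to apply in any order.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Two nodes increment independently, then exchange state:
a, b = GCounter("a"), GCounter("b")
a.increment(3); b.increment(2)
a.merge(b); b.merge(a)
assert a.value() == b.value() == 5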
When to Avoid Locks
| Use Locks When | Avoid Locks When |
|---|---|
| Side effects must happen at most once (emails, charges, cron jobs); the resource cannot detect or reject conflicting writes itself; you need a single writer or leader for an extended period. | Conflicts can be detected at write time (optimistic locking, CAS, conditional writes); operations are commutative or idempotent (CRDTs, counters); throughput matters more than strict mutual exclusion. |
Common Pitfalls
1. Lock Holder Dies Without Releasing
This is the most common pitfall. A client acquires the lock, crashes (segfault, OOM, power failure), and never releases it.
- Mitigation: Always use TTLs (Redis), leases (etcd), or ephemeral nodes (ZooKeeper). Never use a lock without an automatic expiration mechanism.
- Anti-pattern: Using a database row as a lock without an expires_at column. If the process dies, the lock is held forever until manual intervention.
2. Clock Skew Between Nodes
If your lock’s correctness depends on time (TTL, lease duration), clock differences between machines can cause problems:
- A Redis server’s clock jumps forward by 10s → lock expires 10s early.
- A client’s clock is 5s behind → it calculates lock validity incorrectly.
Mitigation: Use monotonic clocks for duration calculations (clock_gettime(CLOCK_MONOTONIC) or time.monotonic()). Keep NTP well-configured with slew mode (gradual adjustment) rather than step mode (sudden jumps).
3. Network Partitions During Lock Hold
A client acquires the lock, then a network partition isolates it from the lock server:
- The lock server considers the client dead and releases the lock.
- Another client on the other side of the partition acquires the lock.
- The original client is still running and still thinks it holds the lock.
Mitigation: Fencing tokens. The resource server will reject the original client’s stale token.
4. GC Pauses Extending Lock Hold Time
This is particularly insidious in JVM-based systems. A stop-the-world GC pause can last seconds or even minutes in pathological cases:
// Timeline:
t=0s Client A acquires lock (TTL=30s, token=33)
t=1s Client A starts processing
t=2s Client A hits a full GC pause ← FROZEN
t=32s Lock expires (30s TTL)
t=33s Client B acquires lock (token=34)
t=34s Client B writes to resource
t=45s Client A's GC ends ← RESUMES, thinks it has the lock!
t=46s Client A writes to resource with token=33
→ MUST BE REJECTED by resource server
Mitigation: Fencing tokens (again). Also: tune GC to minimize pause times (G1GC, ZGC, Shenandoah), use shorter TTLs with frequent renewals, and always check lock validity before performing the critical operation.
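A minimal sketch of that validity check, measured on Python's monotonic clock so wall-clock jumps cannot skew it; the validity value would come from an acquisition API like the Redlock pseudocode above:

import time

def run_critical_section(lock_validity_ms, do_work):
    # Deadline measured on the monotonic clock, not wall time.
    deadline = time.monotonic() + lock_validity_ms / 1000.0

    # Leave a safety margin: don't start work we can't finish in time.
    margin = 1.0  # seconds; tune to the operation's worst case
    if time.monotonic() + margin >= deadline:
        raise TimeoutError("not enough lock validity left to start")
    do_work()
    if time.monotonic() >= deadline:
        # The lock may have expired mid-operation; treat the result as suspect.
        raise TimeoutError("lock validity expired during the operation")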
5. Forgetting to Release the Lock
# BAD: Exception between acquire and release = lock held until TTL
lock.acquire()
process() # ← if this throws, release() is never called
lock.release()
# GOOD: Use try/finally or context managers
lock.acquire()
try:
process()
finally:
lock.release()
# BEST: Context manager (Python) or defer (Go)
with lock:
process()
6. Retry Storms
If all clients retry immediately after failing to acquire the lock, they create a thundering herd effect that overwhelms the lock server.
Mitigation: Use exponential backoff with jitter:
import random, time
def acquire_with_backoff(lock, max_retries=10):
for attempt in range(max_retries):
if lock.acquire(blocking=False):
return True
# Exponential backoff with jitter
wait = min(2 ** attempt * 0.1, 5.0) # cap at 5s
wait *= (0.5 + random.random()) # add jitter
time.sleep(wait)
return False
Summary: Choosing a Distributed Lock
| Technology | Best For | Fencing? | Complexity |
|---|---|---|---|
| Redis (single) | Efficiency, deduplication | ✘ | Low |
| Redlock | Better efficiency, no SPOF | ✘ | Medium |
| ZooKeeper | Correctness-critical locks | ✔ | High |
| etcd | Kubernetes-native, correctness | ✔ | Medium–High |
| Database | Simple apps, existing DB | ✔ (via version) | Low |
| No lock (CAS/CRDT) | High throughput, idempotent ops | N/A | Varies |