Vector Clocks & Logical Time
The Problem with Physical Clocks
In a single-process system, ordering events is trivial: the CPU executes instructions sequentially, and the system clock provides a monotonically increasing timestamp. But in a distributed system — where processes run on separate machines with separate clocks — determining "which event happened first?" becomes one of the most fundamental and deceptively difficult problems in computer science.
Physical clocks in distributed systems are unreliable for ordering events for several reasons:
- Clock Drift: Quartz oscillators in commodity hardware drift at rates of 10–200 parts per million (ppm). A 100ppm drift means a clock can gain or lose 8.6 seconds per day. Over a week, two machines can disagree by a full minute.
- NTP Synchronization Limits: The Network Time Protocol (NTP) can synchronize clocks to within 1–50ms over the internet (and ~0.5ms on a LAN), but it cannot guarantee perfect synchronization. NTP corrections can cause clocks to jump forward or backward.
- Leap Seconds: UTC occasionally inserts leap seconds to account for Earth's irregular rotation. On June 30, 2012, a leap second caused widespread outages at Reddit, LinkedIn, and Foursquare because Linux kernels handled the extra second incorrectly.
- Clock Skew: At any given instant, the clocks on two different machines show different times. This skew makes it impossible to determine causal ordering from timestamps alone.
- VM and Container Clock Issues: Virtual machines and containers may have their clocks paused during live migration or snapshotting, causing sudden jumps when resumed.
Machine A clock: 10:00:00.000
Machine B clock: 09:59:59.965 (35ms behind)
Event e1 on Machine A at local time 10:00:00.020
Event e2 on Machine B at local time 10:00:00.010
Physical timestamps say: e2 (10:00:00.010) happened before e1 (10:00:00.020)
Reality (after correcting for the 35ms skew):
e2 real time ≈ 10:00:00.045
e1 real time = 10:00:00.020
e2 actually happened AFTER e1!
Physical clocks gave us the WRONG ordering.
This insight led Lamport to develop the first formal framework for reasoning about event ordering without relying on physical clocks — using logical time.
The Happens-Before Relation
Before we discuss any clock mechanism, we need a formal definition of causality in distributed systems. Lamport introduced the happens-before relation (denoted →) as the fundamental ordering primitive.
Definition: Happens-Before (→)
The relation → on the set of events in a distributed system is the smallest relation satisfying three conditions:
- Process Order: If events a and b occur on the same process, and a comes before b in the process's execution, then a → b.
- Message Causality: If a is the sending of a message by one process and b is the receipt of that same message by another process, then a → b.
- Transitivity: If a → b and b → c, then a → c.
Definition: Concurrent Events (∥)
Two distinct events a and b are concurrent (written a ∥ b) if and only if:
¬(a → b) ∧ ¬(b → a)
Neither event causally precedes the other. They are independent — there is no chain of messages linking them. In a concurrent world, both orderings (a before b, or b before a) are equally valid.
This relation defines a partial order on events. Not all events are comparable — concurrent events have no defined ordering, and that's a feature, not a bug. Forcing a total order on concurrent events loses information about the system's actual causal structure.
The key insight: causality, not time, is what matters. If event a could not have influenced event b (no causal chain between them), then the relative ordering of a and b is irrelevant to system correctness.
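Because → is defined as the smallest relation closed under those three rules, it can be computed directly for a small execution as a transitive closure over process-order and message edges. The sketch below is illustrative — the event names and edges are made up (mirroring the three-process example used later), not taken from any particular system:

```python
from itertools import product

# Happens-before as a transitive closure over process order + message edges.
# The events and edges below form an illustrative toy execution.
process_order = [("a1", "a2"), ("a2", "a3"), ("b1", "b2"), ("c1", "c2")]
message_edges = [("a2", "b1"), ("b2", "c1")]   # send → receive edges

edges = set(process_order) | set(message_edges)

# Naive transitive closure — fine for a handful of events.
closure = set(edges)
changed = True
while changed:
    changed = False
    for (x, y), (y2, z) in product(list(closure), list(closure)):
        if y == y2 and (x, z) not in closure:
            closure.add((x, z))
            changed = True

def happens_before(a, b):
    return (a, b) in closure

def concurrent(a, b):
    return a != b and not happens_before(a, b) and not happens_before(b, a)

print(happens_before("a2", "c1"))  # True  — via the message chain a2 → b1 → b2 → c1
print(concurrent("a3", "c2"))      # True  — no message chain links a3 and c2
print(concurrent("a1", "b2"))      # False — a1 → a2 → b1 → b2
```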
Lamport Timestamps
Lamport timestamps (also called Lamport clocks or scalar logical clocks) are the simplest mechanism for capturing causal ordering. Each process maintains a single integer counter.
Algorithm: Lamport Timestamp
Each process Pi maintains a counter Ci, initially 0.
- Internal Event: Before executing an internal event, Pi increments its counter: Ci = Ci + 1
- Send Event: Before sending a message m, Pi increments its counter: Ci = Ci + 1, then attaches Ci to m as the message timestamp tm.
- Receive Event: Upon receiving a message m with timestamp tm, Pi updates: Ci = max(Ci, tm) + 1
```python
# Lamport Clock implementation
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Internal event or send event"""
        self.time += 1
        return self.time

    def receive(self, msg_timestamp):
        """Receive event — merge then tick"""
        self.time = max(self.time, msg_timestamp) + 1
        return self.time


# Example: Three processes
A = LamportClock()  # Process A
B = LamportClock()  # Process B
C = LamportClock()  # Process C

# A does local work
a1 = A.tick()       # A: time=1 (event a1)

# A sends message to B
a2 = A.tick()       # A: time=2 (send event a2)
b1 = B.receive(a2)  # B: time=3 (receive, max(0,2)+1=3)

# B does local work
b2 = B.tick()       # B: time=4 (event b2)

# B sends message to C
b3 = B.tick()       # B: time=5 (send event b3)
c1 = C.receive(b3)  # C: time=6 (receive, max(0,5)+1=6)

# C does local work independently
c2 = C.tick()       # C: time=7 (event c2)

# Meanwhile, A also does work independently
a3 = A.tick()       # A: time=3 (event a3 — concurrent with c2!)
```
Properties of Lamport Timestamps
Lamport timestamps satisfy the clock condition:
Clock Condition: If a → b, then C(a) < C(b)
Where C(e) is the Lamport timestamp of event e.
IMPORTANT: The converse is NOT true!
C(a) < C(b) does NOT imply a → b
Example:
Event a3 on A has timestamp 3
Event c1 on C has timestamp 6
C(a3) = 3 < 6 = C(c1)
But a3 and c1 might be concurrent!
(If a3 happened after a2, and there's no message
chain from a3 to c1, they are concurrent despite
different timestamps.)
This asymmetry is the fundamental limitation of Lamport timestamps. They provide a necessary but insufficient condition for causal ordering:
| Statement | Truth Value | Explanation |
|---|---|---|
| a → b ⟹ C(a) < C(b) | ✓ Always true | If a causally precedes b, a's timestamp is always smaller |
| C(a) < C(b) ⟹ a → b | ✗ Not necessarily | A smaller timestamp does NOT imply causation — events could be concurrent |
| C(a) = C(b) ⟹ a ∥ b | ✗ Not necessarily | Equal timestamps don't guarantee concurrency (different processes may coincidentally assign the same value) |
| a ∥ b ⟹ ??? | ✗ Cannot detect | Lamport timestamps cannot detect concurrency at all |
▶ Lamport Timestamp Animation
Watch how Lamport timestamps increment — and where they fail to detect concurrency.
Establishing Total Order with Lamport Timestamps
Lamport timestamps alone don't produce a unique total order — distinct events can carry the same timestamp. By using process IDs as a tiebreaker, you can create a total order, which is useful for mutual exclusion and total-order broadcast:
Total Order: (C(a), P_a) < (C(b), P_b)
iff C(a) < C(b), or
C(a) = C(b) and P_a < P_b
This gives every event a unique, totally ordered position.
But it still doesn't capture concurrency —
it just arbitrarily breaks ties.
Example:
Event e1 at (timestamp=5, process=A) → rank (5, A)
Event e2 at (timestamp=5, process=B) → rank (5, B)
Total order says e1 < e2 (because A < B),
but e1 and e2 might be causally unrelated.
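The tiebreak rule is one line of code; this small sketch uses made-up event tuples for illustration:

```python
# Total order from Lamport timestamps: break ties with process IDs.
# Each event is a (lamport_timestamp, process_id, name) tuple.
events = [
    (5, "A", "e1"),
    (5, "B", "e2"),   # same timestamp as e1 — possibly concurrent
    (3, "C", "e3"),
]

# Sorting by (timestamp, process_id) yields the same deterministic order
# on every node, even though the tiebreak says nothing about causality.
for ts, pid, name in sorted(events, key=lambda e: (e[0], e[1])):
    print(f"{name}: (C={ts}, P={pid})")
# Output order: e3, e1, e2
```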
Vector Clocks
Vector clocks, introduced independently by Fidge (1988) and Mattern (1989), solve the fundamental limitation of Lamport timestamps. Instead of a single counter, each process maintains a vector of counters — one entry for every process in the system.
Definition: Vector Clock
For a system with N processes {P1, P2, ..., PN}, each process Pi maintains a vector VCi[1..N] where:
- VCi[i] = the number of events that have occurred at Pi (own logical clock)
- VCi[j] = the latest event count of Pj that Pi knows about (last known time at Pj)
Algorithm: Vector Clock Update Rules
- Internal Event at Pi: Increment own entry: VCi[i] = VCi[i] + 1
- Send Event at Pi: Increment own entry: VCi[i] = VCi[i] + 1. Attach entire vector VCi to the message.
- Receive Event at Pi (message from Pj with vector VCmsg):
- Merge: For each k, VCi[k] = max(VCi[k], VCmsg[k])
- Then increment own entry: VCi[i] = VCi[i] + 1
```python
# Vector Clock implementation
class VectorClock:
    def __init__(self, process_id, num_processes):
        self.pid = process_id
        self.clock = [0] * num_processes

    def tick(self):
        """Internal event or before sending"""
        self.clock[self.pid] += 1
        return list(self.clock)  # return a copy

    def send(self):
        """Send event — tick and return vector to attach"""
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, msg_vector):
        """Receive event — merge then tick"""
        for i in range(len(self.clock)):
            self.clock[i] = max(self.clock[i], msg_vector[i])
        self.clock[self.pid] += 1
        return list(self.clock)

    def __repr__(self):
        return f"VC({self.clock})"


# Example: 3-process system
A = VectorClock(0, 3)  # Process A (index 0)
B = VectorClock(1, 3)  # Process B (index 1)
C = VectorClock(2, 3)  # Process C (index 2)

# A does local work
a1 = A.tick()       # A: [1, 0, 0]

# A sends message to B
a2 = A.send()       # A: [2, 0, 0] — attached to message
b1 = B.receive(a2)  # B: [2, 1, 0] — merge [0,0,0] with [2,0,0], then +1 at B

# B sends message to C
b2 = B.send()       # B: [2, 2, 0] — attached to message
c1 = C.receive(b2)  # C: [2, 2, 1] — merge [0,0,0] with [2,2,0], then +1 at C

# Now: A does work independently, C does work independently
a3 = A.tick()       # A: [3, 0, 0]
c2 = C.tick()       # C: [2, 2, 2]

# Can we compare a3 and c2?
#   a3 = [3, 0, 0]    c2 = [2, 2, 2]
#   a3[0]=3 > c2[0]=2   but   a3[1]=0 < c2[1]=2
#   INCOMPARABLE → a3 ∥ c2 (concurrent!)
```
Comparing Vector Clocks
The power of vector clocks comes from their comparison rules, which precisely capture the happens-before relation:
Definition: Vector Clock Ordering
Given two vector timestamps V and W of dimension N:
- V = W iff ∀k: V[k] = W[k] (identical vectors)
- V ≤ W iff ∀k: V[k] ≤ W[k] (every component of V is ≤ corresponding component of W)
- V < W iff V ≤ W ∧ V ≠ W (V is dominated by W — at least one component is strictly less)
- V ∥ W (concurrent) iff ¬(V ≤ W) ∧ ¬(W ≤ V) (neither dominates the other)
```python
# Comparison functions
def vc_leq(v, w):
    """v ≤ w: every component of v is ≤ the corresponding component of w"""
    return all(v[i] <= w[i] for i in range(len(v)))

def vc_lt(v, w):
    """v < w: v ≤ w and at least one component strictly less"""
    return vc_leq(v, w) and v != w

def vc_concurrent(v, w):
    """v ∥ w: neither dominates the other"""
    return not vc_leq(v, w) and not vc_leq(w, v)

def relationship(v, w):
    if v == w:
        return "EQUAL"
    if vc_lt(v, w):
        return f"{v} → {w} (happens-before)"
    if vc_lt(w, v):
        return f"{w} → {v} (happens-before)"
    return f"{v} ∥ {w} (CONCURRENT)"

# Examples:
print(relationship([2, 0, 0], [2, 2, 1]))  # [2,0,0] → [2,2,1]
print(relationship([3, 0, 0], [2, 2, 2]))  # CONCURRENT!
print(relationship([1, 2, 0], [1, 1, 3]))  # CONCURRENT!
print(relationship([1, 2, 3], [1, 3, 3]))  # [1,2,3] → [1,3,3]
```
The Vector Clock Theorem
The crucial theorem that makes vector clocks strictly more powerful than Lamport timestamps:
Theorem (Fidge-Mattern)
For any two events a and b in a distributed computation:
a → b ⟺ VC(a) < VC(b)
This is an if and only if — a biconditional. Compare this with Lamport timestamps, which only provide the forward implication. Vector clocks are a faithful representation of the happens-before partial order.
This means:
- If VC(a) < VC(b), then a causally precedes b — guaranteed.
- If VC(a) ∥ VC(b), then a and b are concurrent — guaranteed.
- There are no false positives and no false negatives in causal ordering detection.
▶ Vector Clock Animation
Step through events across 3 nodes. Watch vectors update and see how concurrent events are detected.
Limitations of Vector Clocks
Vector clocks are theoretically elegant but have practical limitations:
- Space Complexity: Each vector has N entries where N is the number of processes. In a system with thousands of nodes, every message carries a multi-kilobyte vector. For a system with 1000 nodes and 8-byte counters, each vector is 8KB.
- Dynamic Membership: Adding or removing processes requires resizing all vectors across the system. In practice, this means you need a membership protocol coordinating vector size changes.
- Garbage Collection: Entries for departed processes must be retained to correctly compare historical vectors. Pruning requires careful coordination.
Storage overhead comparison:
Lamport Timestamp: O(1) per event, O(1) per message
Vector Clock: O(N) per event, O(N) per message
N = 3 nodes: 3 integers per message → negligible
N = 100 nodes: 100 integers per message → ~800 bytes
N = 10,000 nodes: 10,000 integers → ~80 KB per message!
This is why real systems use optimizations like
version vectors, dotted version vectors, or
interval tree clocks.
Version Vectors for Data Versioning
In practice, distributed databases don't track every event — they track data versions. A version vector is a specialization of vector clocks used specifically for tracking causal histories of data items.
While a vector clock tracks events at a per-process granularity, a version vector tracks updates to a specific data item. Each time a replica updates a data item, it increments its own entry in the version vector associated with that item.
```python
# Version Vector for a replicated key-value store
# Each key has its own version vector
class VersionedStore:
    def __init__(self, replica_id, num_replicas):
        self.rid = replica_id
        self.data = {}  # key → (value, version_vector)

    def put(self, key, value):
        """Local write — increment own version"""
        if key in self.data:
            _, vv = self.data[key]
        else:
            vv = {}
        vv[self.rid] = vv.get(self.rid, 0) + 1
        self.data[key] = (value, dict(vv))
        return dict(vv)

    def receive_update(self, key, value, remote_vv):
        """Receive a replicated write from another replica"""
        if key not in self.data:
            # No local version — accept remote
            self.data[key] = (value, dict(remote_vv))
            return "APPLIED"
        _, local_vv = self.data[key]
        rel = compare_vv(local_vv, remote_vv)
        if rel == "LESS":
            # Remote is strictly newer — overwrite
            self.data[key] = (value, dict(remote_vv))
            return "APPLIED"
        elif rel == "GREATER":
            # Local is strictly newer — discard remote
            return "DISCARDED"
        elif rel == "CONCURRENT":
            # Both versions are valid! → CONFLICT
            return "CONFLICT"
        else:
            # Equal — idempotent duplicate
            return "DUPLICATE"


def compare_vv(a, b):
    """Compare two version vectors (dicts: replica_id → counter)"""
    all_keys = set(a.keys()) | set(b.keys())
    a_leq_b = all(a.get(k, 0) <= b.get(k, 0) for k in all_keys)
    b_leq_a = all(b.get(k, 0) <= a.get(k, 0) for k in all_keys)
    if a_leq_b and b_leq_a:
        return "EQUAL"
    elif a_leq_b:
        return "LESS"
    elif b_leq_a:
        return "GREATER"
    else:
        return "CONCURRENT"
```
Conflict Detection Example
Scenario: Shopping cart in a replicated store (3 replicas: R1, R2, R3)
Initial state at all replicas:
cart = {items: ["milk"]}
version: {R1:1} ← R1 originally wrote this
Client A writes to R1:
cart = {items: ["milk", "eggs"]}
version: {R1:2} ← R1 increments its entry
Client B writes to R2 (hasn't seen A's write yet):
cart = {items: ["milk", "bread"]}
version: {R1:1, R2:1} ← R2 increments its entry, keeps R1:1
Now R1 and R2 try to sync:
R1 version: {R1:2}
R2 version: {R1:1, R2:1}
R1:2 vs R1:1 → R1 has higher R1 count
R2:1 vs R2:0 → R2 has higher R2 count
→ CONCURRENT! Neither dominates.
→ CONFLICT DETECTED!
Resolution options:
1. Return both versions to client (Amazon's shopping cart approach)
2. Merge automatically: {items: ["milk", "eggs", "bread"]}
3. Last-write-wins (lose one update — dangerous!)
4. Application-specific conflict resolution
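To make the scenario concrete, the same sequence can be replayed with the VersionedStore sketch above (the replica IDs and cart values are the ones from the example):

```python
# Replay of the shopping-cart scenario with the VersionedStore sketch above.
r1 = VersionedStore("R1", 3)
r2 = VersionedStore("R2", 3)

# R1 writes the initial cart and replicates it to R2.
vv0 = r1.put("cart", ["milk"])                  # R1: {R1: 1}
r2.receive_update("cart", ["milk"], vv0)        # R2 now has {R1: 1} too

# Client A writes to R1; Client B writes to R2 before any sync.
vv_a = r1.put("cart", ["milk", "eggs"])         # {R1: 2}
vv_b = r2.put("cart", ["milk", "bread"])        # {R1: 1, R2: 1}

# When the replicas exchange their writes, neither version dominates.
print(r1.receive_update("cart", ["milk", "bread"], vv_b))  # CONFLICT
print(r2.receive_update("cart", ["milk", "eggs"], vv_a))   # CONFLICT
```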
Dotted Version Vectors
Standard version vectors have a subtle problem called sibling explosion. When a client reads a conflicting value (with multiple siblings) and then writes back a resolved version, the version vector may not correctly subsume the old siblings, causing them to re-appear. This is known as the false concurrency problem.
Riak (a distributed key-value store) solved this with dotted version vectors (DVVs), introduced by Preguiça et al. (2012).
Problem with plain version vectors:
Client reads key "cart" from R1, sees conflict:
Sibling 1: {items: ["milk", "eggs"]}, vv: {R1:2}
Sibling 2: {items: ["milk", "bread"]}, vv: {R1:1, R2:1}
Client resolves and writes back to R1:
Resolved: {items: ["milk", "eggs", "bread"]}
With plain version vector, R1 increments:
New vv: {R1:3} ← But this doesn't dominate {R1:1, R2:1}!
Compare {R1:3} vs {R1:1, R2:1}:
R1: 3 > 1 ✓
R2: 0 < 1 ✗ → STILL CONCURRENT!
Sibling 2 comes back as a zombie! 💀
Solution: Dotted Version Vector
A DVV = (version_vector, dot)
The "dot" records the exact (replica, counter) pair of the event
that created this version.
When resolving, the DVV tracks which specific versions were
seen by the client, allowing the system to correctly subsume them.
Definition: Dotted Version Vector
A dotted version vector is a pair (V, d) where:
- V is a version vector (the "causal context" — what the client had seen)
- d = (r, n) is a "dot" — a single (replica, counter) pair identifying the exact event that produced this value
When comparing, the dot allows the system to determine that a new write descends from (subsumes) specific siblings, even if the version vector alone doesn't dominate them.
# Riak's approach with DVVs (simplified)
DVV structure: { context: {R1:2, R2:1}, dot: (R1, 3) }
Reading: Client gets ALL siblings + their DVVs
Client's context = merge of all sibling version vectors
Writing: Client sends back:
- New value
- Context (merged VV from what client read)
Server creates new DVV:
- context = client's context
- dot = (coordinating_replica, next_counter)
Any sibling whose dot is dominated by the
context is now correctly pruned.
This eliminates the sibling explosion problem.
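Riak's actual implementation is more involved, but the core idea — every stored sibling carries its own dot, and a write prunes exactly the siblings covered by the context the client read — fits in a short sketch. The class and method names below are illustrative, not Riak's API:

```python
# Minimal dotted-version-vector sketch (illustrative, not Riak's actual API).
class DVVReplica:
    def __init__(self, replica_id):
        self.rid = replica_id
        self.counter = 0      # this replica's event counter
        self.siblings = []    # list of (dot, value); dot = (replica, n)

    def read(self):
        """Return all siblings plus the merged causal context."""
        context = {}
        for (r, n), _ in self.siblings:
            context[r] = max(context.get(r, 0), n)
        return [v for _, v in self.siblings], context

    def write(self, value, client_context):
        """Write a value resolved against the context the client read."""
        self.counter += 1
        dot = (self.rid, self.counter)
        # Keep only siblings NOT covered by the client's context:
        # those are concurrent with this write and must survive.
        survivors = [
            (d, v) for d, v in self.siblings
            if d[1] > client_context.get(d[0], 0)
        ]
        self.siblings = survivors + [(dot, value)]
        return dot


# Replay of the zombie-sibling scenario:
r1 = DVVReplica("R1")
r1.write(["milk", "eggs"], {})                       # sibling with dot (R1, 1)
r1.siblings.append((("R2", 1), ["milk", "bread"]))   # concurrent write arriving from R2

values, ctx = r1.read()                   # client sees both siblings, ctx = {R1:1, R2:1}
r1.write(["milk", "eggs", "bread"], ctx)  # resolved write subsumes both
print(r1.read()[0])                       # [['milk', 'eggs', 'bread']] — no zombies
```

The resolved write survives alone because both old dots, (R1, 1) and (R2, 1), are covered by the context the client supplied.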
DynamoDB & Last-Write-Wins
Amazon's original Dynamo paper (2007) used vector clocks for conflict detection, famously implementing the "shopping cart" example where concurrent writes are returned to the client as siblings for manual resolution. However, this approach had significant operational complexity.
DynamoDB (the managed service, launched 2012) took a radically different approach: last-write-wins (LWW) using physical timestamps.
DynamoDB's LWW approach:
Write A at time T1: {key: "user:1", value: "Alice", ts: 1000}
Write B at time T2: {key: "user:1", value: "Bob", ts: 1003}
If T2 > T1, Write B wins. Write A is silently discarded.
No conflicts. No siblings. No client-side resolution.
Simple. Fast. But potentially lossy.
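An LWW register reduces to a few lines. This sketch illustrates the general technique, not DynamoDB's internals:

```python
# Minimal last-write-wins register (illustrative, not DynamoDB's internals).
class LWWRegister:
    def __init__(self):
        self.value = None
        self.timestamp = -1   # higher timestamp wins; ties ignored here

    def write(self, value, timestamp):
        """Apply a write; silently discard it if an equal-or-newer one exists."""
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp

    def read(self):
        return self.value

reg = LWWRegister()
reg.write("Alice", 1000)
reg.write("Bob", 1003)
print(reg.read())  # Bob — the write at ts=1000 is gone, with no conflict reported
```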
LWW Trade-offs
| Aspect | Vector Clocks (Dynamo-style) | LWW (DynamoDB-style) |
|---|---|---|
| Conflict Detection | Detects all concurrent writes accurately | Cannot detect concurrent writes — one is silently dropped |
| Data Loss | No data loss — all concurrent versions preserved | Potential data loss on every concurrent write |
| Client Complexity | Clients must handle conflict resolution logic | No conflict resolution needed — simple read/write API |
| Metadata Overhead | O(N) vector per data item (N = number of writers) | Single timestamp per item — negligible overhead |
| Read Latency | May return multiple versions; client must reconcile | Always returns single value; fast reads |
| Correctness | Preserves causal consistency guarantees | Relies on physical clock accuracy (NTP) |
| Use Case Fit | Shopping carts, collaborative editing, CRDTs | Session stores, caches, non-critical metadata |
When LWW is Dangerous
Scenario: Bank account balance with LWW
Initial balance: $1000
Client A (at T=100): Deposits $200
Reads balance: $1000
Writes: $1200 with timestamp 100
Client B (at T=101): Withdraws $300
Reads balance: $1000 (hasn't seen A's write)
Writes: $700 with timestamp 101
LWW result: B wins (timestamp 101 > 100)
Final balance: $700
Expected balance: $1000 + $200 - $300 = $900
LOST: $200 deposit from Client A!
This is why financial systems NEVER use LWW.
They use serializable transactions or CRDTs.
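Replaying the scenario with the LWWRegister sketch above makes the lost update explicit:

```python
# The bank scenario with LWW: both clients read $1000, then write blindly.
balance = LWWRegister()
balance.write(1000, 99)         # initial balance

balance.write(1000 + 200, 100)  # Client A deposits $200  → writes 1200 at ts=100
balance.write(1000 - 300, 101)  # Client B withdraws $300 → writes 700  at ts=101

print(balance.read())           # 700 — Client A's deposit has silently vanished
```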
Conflict Resolution Strategies
When vector clocks or version vectors detect concurrent writes, the system must resolve the conflict. There are several strategies, each with different trade-offs:
Strategy Comparison
| Strategy | How It Works | Pros | Cons | Used By |
|---|---|---|---|---|
| Last-Write-Wins | Highest timestamp wins; discard others | Simple, no client logic | Data loss on concurrent writes | DynamoDB, Cassandra (default) |
| Client Resolution | Return all siblings to client; client merges | Application-specific merge logic | Client complexity, sibling explosion risk | Riak, Dynamo (original) |
| CRDTs | Use conflict-free data types that auto-merge | Automatic, mathematically correct resolution | Limited data type support, space overhead | Riak (flags, counters, sets, maps) |
| Application Callbacks | Server invokes app-defined merge function | Custom logic without exposing siblings | Merge function must be deterministic | CouchDB (map/reduce views) |
| Multi-Value Register | Store all concurrent values as a set | No data loss, explicit concurrency | Application must handle multi-value reads | Research systems |
Connection to CRDTs
CRDTs (Conflict-free Replicated Data Types) are the most elegant resolution strategy. They use mathematical properties (commutativity, associativity, idempotency) to guarantee that all replicas converge to the same state regardless of the order updates are applied:
```python
# G-Counter CRDT: A grow-only counter using version vectors
class GCounter:
    """Each replica increments its own entry.
    The counter value is the sum of all entries.
    Merge = element-wise max → always converges."""

    def __init__(self, replica_id):
        self.rid = replica_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.rid] = self.counts.get(self.rid, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other_counts):
        all_keys = set(self.counts.keys()) | set(other_counts.keys())
        for k in all_keys:
            self.counts[k] = max(
                self.counts.get(k, 0),
                other_counts.get(k, 0)
            )

# No conflicts possible! Every merge converges.
# Value only goes up, and max() is idempotent.

# Replica A increments 3 times: {A: 3} → value = 3
# Replica B increments 2 times: {B: 2} → value = 2
# After merge at A: {A: 3, B: 2} → value = 5
# After merge at B: {A: 3, B: 2} → value = 5  ✓ Same!
```
Advanced Topics & Optimizations
Interval Tree Clocks
Interval Tree Clocks (ITCs), introduced by Almeida et al. (2008), solve the dynamic membership problem. Unlike vector clocks, ITCs don't need a fixed process count — they use a binary tree structure where identity "stamps" can be forked (when a process joins) and joined (when a process leaves), automatically managing the size of causal metadata.
ITC operations:
fork(): Split an identity into two halves
(1) → (1,0) for existing + (0,1) for new node
join(): Merge two identities back
(1,0) + (0,1) → (1)
event(): Increment the clock (like vector clock tick)
Advantage: O(log N) space in practice (vs O(N) for VCs)
No need for global knowledge of all participants
Hybrid Logical Clocks (HLCs)
Hybrid Logical Clocks, proposed by Kulkarni et al. (2014), combine physical time and logical time. They track causality like Lamport timestamps while staying close to NTP-synchronized physical time. CockroachDB uses HLCs for its MVCC timestamps; Google Spanner takes a different route, relying on TrueTime's hardware-backed uncertainty intervals instead.
HLC structure: (physical_time, logical_counter)
Rules:
Send/Local event:
l' = max(l, physical_time())
if l' == l:
c = c + 1 # same physical time → increment logical
else:
c = 0 # physical time advanced → reset logical
l = l'
timestamp = (l, c)
Receive (msg has (l_m, c_m)):
l' = max(l, l_m, physical_time())
if l' == l == l_m:
c = max(c, c_m) + 1
elif l' == l:
c = c + 1
elif l' == l_m:
c = c_m + 1
else:
c = 0
l = l'
Advantage: Timestamps are close to real wall-clock time
(useful for snapshot reads, TTL, etc.)
while still guaranteeing causal ordering.
CockroachDB: Uses HLC for MVCC timestamps.
Read snapshots use HLC time, not NTP time.
Google Spanner: Uses GPS + atomic clocks (TrueTime)
with uncertainty intervals [earliest, latest]
to implement linearizable reads.
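Translating those rules into code gives a minimal HLC sketch; the physical-clock source and the class interface below are assumptions made for illustration:

```python
import time

# Minimal Hybrid Logical Clock sketch following the rules above.
# The physical clock source (milliseconds from time.time()) is an assumption.
class HLC:
    def __init__(self, physical_time=lambda: int(time.time() * 1000)):
        self.pt = physical_time
        self.l = 0   # physical component of the last timestamp issued
        self.c = 0   # logical counter

    def now(self):
        """Timestamp for a local or send event."""
        l_new = max(self.l, self.pt())
        self.c = self.c + 1 if l_new == self.l else 0
        self.l = l_new
        return (self.l, self.c)

    def update(self, l_m, c_m):
        """Timestamp for a receive event carrying (l_m, c_m)."""
        l_new = max(self.l, l_m, self.pt())
        if l_new == self.l == l_m:
            self.c = max(self.c, c_m) + 1
        elif l_new == self.l:
            self.c += 1
        elif l_new == l_m:
            self.c = c_m + 1
        else:
            self.c = 0
        self.l = l_new
        return (self.l, self.c)

# HLC timestamps compare lexicographically: (l, c) tuples respect causality
# while the l component stays within clock-skew distance of wall-clock time.
```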
Bloom Clocks
A more recent optimization (Linde et al., 2022): use a Bloom filter instead of a vector to represent causal history. The space is fixed (independent of the number of processes), but comparisons become probabilistic — false positives are possible (reporting happens-before when events are actually concurrent), but false negatives are not.
Bloom Clock: Fixed-size bit array (e.g., 256 bits)
Each event hashes (process_id, counter) into the Bloom filter.
Comparison:
If A's bits are a subset of B's bits → probably A → B
If neither is a subset → definitely concurrent
Space: O(1) regardless of number of processes!
Trade-off: Small false-positive rate (~1-5% with good sizing)
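A toy sketch of the idea follows — the filter size, hash scheme, and comparison below are illustrative choices, not a production design:

```python
import hashlib

# Toy Bloom-clock sketch: fixed-size bit set, probabilistic comparison.
class BloomClock:
    SIZE = 256        # bits in the filter (illustrative)
    NUM_HASHES = 3    # bit positions set per event (illustrative)

    def __init__(self, process_id):
        self.pid = process_id
        self.counter = 0
        self.bits = set()   # indices of set bits

    def _positions(self, pid, counter):
        key = f"{pid}:{counter}".encode()
        return {
            int(hashlib.sha256(key + bytes([i])).hexdigest(), 16) % self.SIZE
            for i in range(self.NUM_HASHES)
        }

    def tick(self):
        """Record a local or send event by hashing (pid, counter) into the filter."""
        self.counter += 1
        self.bits |= self._positions(self.pid, self.counter)
        return set(self.bits)

    def receive(self, msg_bits):
        """Merge the sender's filter, then record the receive event."""
        self.bits |= msg_bits
        return self.tick()

def probably_happens_before(a_bits, b_bits):
    """Subset test: may give false positives, never false negatives."""
    return a_bits <= b_bits and a_bits != b_bits
```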
Real-World Systems
| System | Mechanism | Details |
|---|---|---|
| Amazon Dynamo | Vector Clocks | Original 2007 paper. Clients resolve siblings. Found operationally complex. |
| Amazon DynamoDB | LWW (timestamps) | Simplified. Single-table design avoids most conflicts. |
| Riak | Dotted Version Vectors + CRDTs | Most sophisticated. DVVs avoid sibling explosion. Built-in CRDT data types. |
| Cassandra | LWW (timestamps per cell) | Each column cell has a timestamp. Latest timestamp wins per cell. Uses NTP. |
| CockroachDB | Hybrid Logical Clocks | MVCC with HLC timestamps. Serializable isolation with causal ordering. |
| Google Spanner | TrueTime (GPS + atomic clocks) | Bounded uncertainty interval. Waits out uncertainty for linearizability. |
| Voldemort | Vector Clocks | LinkedIn's Dynamo-inspired store. Client-side conflict resolution. |
| CouchDB | Revision Trees | Git-like revision history. Deterministic winner selection + application merge. |
Summary & Key Takeaways
- Lamport Timestamps: O(1) space. Captures causal ordering in one direction only (a→b ⟹ L(a)<L(b)). Cannot detect concurrency. Good for total ordering (mutex, total-order broadcast).
- Vector Clocks: O(N) space. Captures causal ordering exactly (a→b ⟺ VC(a)<VC(b)). Detects concurrency perfectly. The gold standard for causal tracking.
- Version Vectors: Vector clocks specialized for data items. Used in replicated databases for conflict detection.
- Dotted Version Vectors: Fix sibling explosion in version vectors. Used by Riak.
- Hybrid Logical Clocks: Combine physical + logical time. Stay close to wall-clock time while preserving causality. Used by CockroachDB.
In the next post, we explore Gossip Protocols — the mechanism by which nodes in a distributed system efficiently disseminate information (including vector clocks and version vectors) to each other without centralized coordination.