High Level Design Series · Distributed Systems · Part 2

Vector Clocks & Logical Time

The Problem with Physical Clocks

In a single-process system, ordering events is trivial: the CPU executes instructions sequentially, and the system clock provides a monotonically increasing timestamp. But in a distributed system — where processes run on separate machines with separate clocks — determining "which event happened first?" becomes one of the most fundamental and deceptively difficult problems in computer science.

Physical clocks in distributed systems are unreliable for ordering events: clock skew, NTP drift, and leap seconds all conspire against them. Consider what a modest 35ms of skew does:

Machine A clock:  10:00:00.000
Machine B clock:  09:59:59.965   (35ms behind)

Event e1 on Machine A at local time 10:00:00.020
Event e2 on Machine B at local time 10:00:00.010

Physical timestamps say: e2 (10:00:00.010) happened before e1 (10:00:00.020)
Reality (after correcting for the 35ms skew):
  e2 real time ≈ 10:00:00.045
  e1 real time = 10:00:00.020
  e2 actually happened AFTER e1!

Physical clocks gave us the WRONG ordering.
Leslie Lamport (1978): "The concept of one event happening before another in a distributed system is a subtle one… We cannot in general use physical time to determine the ordering of events."

This insight led Lamport to develop the first formal framework for reasoning about event ordering without relying on physical clocks — using logical time.

The Happens-Before Relation

Before we discuss any clock mechanism, we need a formal definition of causality in distributed systems. Lamport introduced the happens-before relation (denoted →) as the fundamental ordering primitive.

Definition: Happens-Before (→)

The relation → on the set of events in a distributed system is the smallest relation satisfying three conditions:

  1. Process Order: If events a and b occur on the same process, and a comes before b in the process's execution, then a → b.
  2. Message Causality: If a is the sending of a message by one process and b is the receipt of that same message by another process, then a → b.
  3. Transitivity: If a → b and b → c, then a → c.

Definition: Concurrent Events (∥)

Two distinct events a and b are concurrent (written a ∥ b) if and only if:

¬(a → b) ∧ ¬(b → a)

Neither event causally precedes the other. They are independent — there is no chain of messages linking them. In a concurrent world, both orderings (a before b, or b before a) are equally valid.

This relation defines a partial order on events. Not all events are comparable — concurrent events have no defined ordering, and that's a feature, not a bug. Forcing a total order on concurrent events loses information about the system's actual causal structure.

The key insight: causality, not time, is what matters. If event a could not have influenced event b (no causal chain between them), then the relative ordering of a and b is irrelevant to system correctness.
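The three rules are enough to compute the relation mechanically. Below is a minimal sketch (the event names and message topology are made up for illustration): collect the direct edges from rules 1 and 2, close them under transitivity, and test concurrency as mutual unreachability.

```python
from itertools import product

# Events per process, in execution order, plus (send, receive) message pairs.
process_events = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
}
messages = [("a2", "b1")]   # a2 is the send, b1 the matching receive

# Rule 1 (process order) and rule 2 (message causality) give the direct edges.
edges = set(messages)
for events in process_events.values():
    edges |= {(events[i], events[i + 1]) for i in range(len(events) - 1)}

# Rule 3 (transitivity): take the transitive closure, Warshall-style.
all_events = [e for evs in process_events.values() for e in evs]
hb = set(edges)
for k, i, j in product(all_events, repeat=3):
    if (i, k) in hb and (k, j) in hb:
        hb.add((i, j))

def happens_before(a, b):
    return (a, b) in hb

def concurrent(a, b):
    # Concurrent = neither reaches the other through the causal graph.
    return a != b and not happens_before(a, b) and not happens_before(b, a)
```

Here `happens_before("a1", "b2")` holds via the chain a1 → a2 → b1 → b2, while a3 is concurrent with both b1 and b2: no message leaves process A after a2, so no chain connects them in either direction.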

Lamport Timestamps

Lamport timestamps (also called Lamport clocks or scalar logical clocks) are the simplest mechanism for capturing causal ordering. Each process maintains a single integer counter.

Algorithm: Lamport Timestamp

Each process Pi maintains a counter Ci, initially 0.

  1. Internal Event: Before executing an internal event, Pi increments its counter: Ci = Ci + 1
  2. Send Event: Before sending a message m, Pi increments its counter: Ci = Ci + 1, then attaches Ci to m as the message timestamp tm.
  3. Receive Event: Upon receiving a message m with timestamp tm, Pi updates: Ci = max(Ci, tm) + 1
# Lamport Clock implementation
class LamportClock:
    def __init__(self):
        self.time = 0
    
    def tick(self):
        """Internal event or send event"""
        self.time += 1
        return self.time
    
    def receive(self, msg_timestamp):
        """Receive event — merge then tick"""
        self.time = max(self.time, msg_timestamp) + 1
        return self.time

# Example: Three processes
A = LamportClock()  # Process A
B = LamportClock()  # Process B
C = LamportClock()  # Process C

# A does local work
a1 = A.tick()        # A: time=1  (event a1)

# A sends message to B
a2 = A.tick()        # A: time=2  (send event a2)
b1 = B.receive(a2)   # B: time=3  (receive, max(0,2)+1=3)

# B does local work
b2 = B.tick()        # B: time=4  (event b2)

# B sends message to C
b3 = B.tick()        # B: time=5  (send event b3)
c1 = C.receive(b3)   # C: time=6  (receive, max(0,5)+1=6)

# C does local work independently
c2 = C.tick()        # C: time=7  (event c2)

# Meanwhile, A also does work independently
a3 = A.tick()        # A: time=3  (event a3 — concurrent with c2!)

Properties of Lamport Timestamps

Lamport timestamps satisfy the clock condition:

Clock Condition: If a → b, then C(a) < C(b)

Where C(e) is the Lamport timestamp of event e.

IMPORTANT: The converse is NOT true!
  C(a) < C(b) does NOT imply a → b

Example:
  Event a3 on A has timestamp 3
  Event c1 on C has timestamp 6
  C(a3) = 3 < 6 = C(c1)
  
  But a3 and c1 are concurrent!
  (a3 happens after a2 on process A, and no message
   chain connects a3 to c1, so neither causally
   precedes the other despite the timestamp gap.)

This asymmetry is the fundamental limitation of Lamport timestamps. They provide a necessary but insufficient condition for causal ordering:

Statement                | Truth Value       | Explanation
-------------------------|-------------------|-----------------------------------------------------------
a → b ⟹ C(a) < C(b)     | ✓ Always true     | If a causally precedes b, a's timestamp is always smaller
C(a) < C(b) ⟹ a → b     | ✗ Not necessarily | A smaller timestamp does NOT imply causation — events could be concurrent
C(a) = C(b) ⟹ a ∥ b     | ✗ Not necessarily | Equal timestamps don't guarantee concurrency (different processes may coincidentally assign the same value)
a ∥ b ⟹ ???             | ✗ Cannot detect   | Lamport timestamps cannot detect concurrency at all
Why This Matters: If you're building a replicated key-value store and two clients independently write different values to the same key, Lamport timestamps cannot tell you that these writes are concurrent. You'll arbitrarily pick one as "later" based on timestamp, silently losing the other write. This is the exact problem vector clocks solve.
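A minimal sketch of that failure mode (the store layout and helper names are illustrative): two clients write the same key concurrently, and a timestamp comparison silently drops one of the writes.

```python
# Two clients write the same key concurrently; each tags the write with a
# Lamport timestamp. The timestamps order the writes, but nothing marks
# them as concurrent, so one write silently "wins".
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        self.time += 1
        return self.time

store = {}  # key -> (value, lamport_timestamp)

def timestamp_write(key, value, ts):
    # Keep the write with the larger timestamp; the other is lost silently.
    if key not in store or ts > store[key][1]:
        store[key] = (value, ts)

c1, c2 = LamportClock(), LamportClock()
c1.tick(); c1.tick()        # client 1 did unrelated local work first
t1 = c1.tick()              # t1 == 3
t2 = c2.tick()              # t2 == 1  (concurrent write, smaller timestamp)

timestamp_write("cart", ["milk", "eggs"], t1)
timestamp_write("cart", ["milk", "bread"], t2)
# store["cart"] is (["milk", "eggs"], 3); the "bread" write is gone,
# and nothing recorded that the two writes were concurrent.
```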

Establishing Total Order with Lamport Timestamps

While Lamport timestamps only give a partial order, you can create a total order by using process IDs as a tiebreaker. This is useful for mutual exclusion and total-order broadcast:

Total Order: (C(a), P_a) < (C(b), P_b)
  iff C(a) < C(b), or
      C(a) = C(b) and P_a < P_b

This gives every event a unique, totally ordered position.
But it still doesn't capture concurrency —
it just arbitrarily breaks ties.

Example:
  Event e1 at (timestamp=5, process=A) → rank (5, A)
  Event e2 at (timestamp=5, process=B) → rank (5, B)
  
  Total order says e1 < e2 (because A < B),
  but e1 and e2 might be causally unrelated.
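The tiebreak maps directly onto tuple comparison. A small sketch (the event tuples are illustrative):

```python
# Totally order events by (lamport_timestamp, process_id).
# Timestamp ties are broken arbitrarily but deterministically by process id.
def total_order_key(event):
    timestamp, process_id = event
    return (timestamp, process_id)

events = [(5, "B"), (5, "A"), (3, "C"), (7, "A")]
ordered = sorted(events, key=total_order_key)
# ordered == [(3, "C"), (5, "A"), (5, "B"), (7, "A")]
# (5, "A") sorts before (5, "B") by the process-id tiebreak,
# even though the two events may be causally unrelated.
```

Because every process id is unique, no two events share a rank, which is exactly what total-order broadcast and Lamport's mutual-exclusion algorithm need.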

Vector Clocks

Vector clocks, introduced independently by Fidge (1988) and Mattern (1989), solve the fundamental limitation of Lamport timestamps. Instead of a single counter, each process maintains a vector of counters — one entry for every process in the system.

Definition: Vector Clock

For a system with N processes {P1, P2, ..., PN}, each process Pi maintains a vector VCi[1..N] where:

  • VCi[i] = the number of events that have occurred at Pi (own logical clock)
  • VCi[j] = the latest event count of Pj that Pi knows about (last known time at Pj)

Algorithm: Vector Clock Update Rules

  1. Internal Event at Pi: Increment own entry: VCi[i] = VCi[i] + 1
  2. Send Event at Pi: Increment own entry: VCi[i] = VCi[i] + 1. Attach entire vector VCi to the message.
  3. Receive Event at Pi (message from Pj with vector VCmsg):
    • Merge: For each k, VCi[k] = max(VCi[k], VCmsg[k])
    • Then increment own entry: VCi[i] = VCi[i] + 1
# Vector Clock implementation
class VectorClock:
    def __init__(self, process_id, num_processes):
        self.pid = process_id
        self.clock = [0] * num_processes
    
    def tick(self):
        """Internal event or before sending"""
        self.clock[self.pid] += 1
        return list(self.clock)    # return a copy
    
    def send(self):
        """Send event — tick and return vector to attach"""
        self.clock[self.pid] += 1
        return list(self.clock)
    
    def receive(self, msg_vector):
        """Receive event — merge then tick"""
        for i in range(len(self.clock)):
            self.clock[i] = max(self.clock[i], msg_vector[i])
        self.clock[self.pid] += 1
        return list(self.clock)
    
    def __repr__(self):
        return f"VC({self.clock})"

# Example: 3-process system
A = VectorClock(0, 3)  # Process A (index 0)
B = VectorClock(1, 3)  # Process B (index 1)
C = VectorClock(2, 3)  # Process C (index 2)

# A does local work
a1 = A.tick()          # A: [1, 0, 0]

# A sends message to B
a2 = A.send()          # A: [2, 0, 0]  — attached to message
b1 = B.receive(a2)     # B: [2, 1, 0]  — merge [0,0,0] with [2,0,0], then +1 at B

# B sends message to C
b2 = B.send()          # B: [2, 2, 0]  — attached to message
c1 = C.receive(b2)     # C: [2, 2, 1]  — merge [0,0,0] with [2,2,0], then +1 at C

# Now: A does work independently, C does work independently
a3 = A.tick()          # A: [3, 0, 0]
c2 = C.tick()          # C: [2, 2, 2]

# Can we compare a3 and c2?
# a3 = [3, 0, 0]    c2 = [2, 2, 2]
# a3[0]=3 > c2[0]=2  but  a3[1]=0 < c2[1]=2
# INCOMPARABLE → a3 ∥ c2  (concurrent!)

Comparing Vector Clocks

The power of vector clocks comes from their comparison rules, which precisely capture the happens-before relation:

Definition: Vector Clock Ordering

Given two vector timestamps V and W of dimension N:

  • V = W iff ∀k: V[k] = W[k] (identical vectors)
  • V ≤ W iff ∀k: V[k] ≤ W[k] (every component of V is ≤ corresponding component of W)
  • V < W iff V ≤ W ∧ V ≠ W (V is dominated by W — at least one component is strictly less)
  • V ∥ W (concurrent) iff ¬(V ≤ W) ∧ ¬(W ≤ V) (neither dominates the other)
# Comparison functions
def vc_leq(v, w):
    """v ≤ w: every component of v is ≤ w"""
    return all(v[i] <= w[i] for i in range(len(v)))

def vc_lt(v, w):
    """v < w: v ≤ w and at least one component strictly less"""
    return vc_leq(v, w) and v != w

def vc_concurrent(v, w):
    """v ∥ w: neither dominates the other"""
    return not vc_leq(v, w) and not vc_leq(w, v)

def relationship(v, w):
    if v == w:
        return "EQUAL"
    if vc_lt(v, w):
        return f"{v} → {w} (happens-before)"
    if vc_lt(w, v):
        return f"{w} → {v} (happens-before)"
    return f"{v} ∥ {w} (CONCURRENT)"

# Examples:
print(relationship([2, 0, 0], [2, 2, 1]))  # [2,0,0] → [2,2,1]
print(relationship([3, 0, 0], [2, 2, 2]))  # CONCURRENT!
print(relationship([1, 2, 0], [1, 1, 3]))  # CONCURRENT!
print(relationship([1, 2, 3], [1, 3, 3]))  # [1,2,3] → [1,3,3]

The Vector Clock Theorem

The crucial theorem that makes vector clocks strictly more powerful than Lamport timestamps:

Theorem (Fidge-Mattern)

For any two events a and b in a distributed computation:

a → b  ⟺  VC(a) < VC(b)

This is an if and only if — a biconditional. Compare this with Lamport timestamps, which only provide the forward implication. Vector clocks are a faithful representation of the happens-before partial order.

This means the timestamps alone tell you everything: compare VC(a) and VC(b), and you learn whether one event causally precedes the other or whether they are concurrent. Lamport timestamps can never make that second determination.

Limitations of Vector Clocks

Vector clocks are theoretically elegant but have practical limitations:

Storage overhead comparison:
  Lamport Timestamp:  O(1) per event, O(1) per message
  Vector Clock:       O(N) per event, O(N) per message

  N = 3 nodes:     3 integers per message   → negligible
  N = 100 nodes:   100 integers per message  → ~800 bytes
  N = 10,000 nodes: 10,000 integers          → ~80 KB per message!
  
  This is why real systems use optimizations like
  version vectors, dotted version vectors, or
  interval tree clocks.

Version Vectors for Data Versioning

In practice, distributed databases don't track every event — they track data versions. A version vector is a specialization of vector clocks used specifically for tracking causal histories of data items.

While a vector clock tracks events at a per-process granularity, a version vector tracks updates to a specific data item. Each time a replica updates a data item, it increments its own entry in the version vector associated with that item.

# Version Vector for a replicated key-value store
# Each key has its own version vector

class VersionedStore:
    def __init__(self, replica_id):
        self.rid = replica_id
        self.data = {}   # key → (value, version_vector)
    
    def put(self, key, value):
        """Local write — increment own version"""
        if key in self.data:
            _, vv = self.data[key]
        else:
            vv = {}
        vv[self.rid] = vv.get(self.rid, 0) + 1
        self.data[key] = (value, dict(vv))
        return dict(vv)
    
    def receive_update(self, key, value, remote_vv):
        """Receive a replicated write from another replica"""
        if key not in self.data:
            # No local version — accept remote
            self.data[key] = (value, dict(remote_vv))
            return "APPLIED"
        
        _, local_vv = self.data[key]
        rel = compare_vv(local_vv, remote_vv)
        
        if rel == "LESS":
            # Remote is strictly newer — overwrite
            self.data[key] = (value, dict(remote_vv))
            return "APPLIED"
        elif rel == "GREATER":
            # Local is strictly newer — discard remote
            return "DISCARDED"
        elif rel == "CONCURRENT":
            # Both versions are valid! → CONFLICT
            return "CONFLICT"
        else:
            # Equal — idempotent duplicate
            return "DUPLICATE"

def compare_vv(a, b):
    """Compare two version vectors"""
    all_keys = set(a.keys()) | set(b.keys())
    a_leq_b = all(a.get(k, 0) <= b.get(k, 0) for k in all_keys)
    b_leq_a = all(b.get(k, 0) <= a.get(k, 0) for k in all_keys)
    
    if a_leq_b and b_leq_a:
        return "EQUAL"
    elif a_leq_b:
        return "LESS"
    elif b_leq_a:
        return "GREATER"
    else:
        return "CONCURRENT"
Version Vector vs Vector Clock: A common confusion is conflating these. A vector clock tracks the causal history of processes/events. A version vector tracks the causal history of a particular data item. In a key-value store, each key might have its own version vector, but the system doesn't need a global vector clock for all events.

Conflict Detection Example

Scenario: Shopping cart in a replicated store (3 replicas: R1, R2, R3)

Initial state at all replicas:
  cart = {items: ["milk"]}
  version: {R1:1}      ← R1 originally wrote this

Client A writes to R1:
  cart = {items: ["milk", "eggs"]}
  version: {R1:2}      ← R1 increments its entry

Client B writes to R2 (hasn't seen A's write yet):
  cart = {items: ["milk", "bread"]}
  version: {R1:1, R2:1} ← R2 increments its entry, keeps R1:1

Now R1 and R2 try to sync:
  R1 version: {R1:2}
  R2 version: {R1:1, R2:1}
  
  R1:2 vs R1:1 → R1 has higher R1 count
  R2:1 vs R2:0 → R2 has higher R2 count
  
  → CONCURRENT! Neither dominates.
  → CONFLICT DETECTED!
  
  Resolution options:
  1. Return both versions to client (Amazon's shopping cart approach)
  2. Merge automatically: {items: ["milk", "eggs", "bread"]}
  3. Last-write-wins (lose one update — dangerous!)
  4. Application-specific conflict resolution
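The sync step above can be checked directly. A standalone sketch, with a dict-based comparison mirroring the compare_vv function from the VersionedStore example:

```python
# Compare two version vectors represented as {replica: counter} dicts.
def compare_vv(a, b):
    keys = set(a) | set(b)
    a_leq = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_leq = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_leq and b_leq:
        return "EQUAL"
    if a_leq:
        return "LESS"
    if b_leq:
        return "GREATER"
    return "CONCURRENT"

r1 = {"R1": 2}             # Client A's cart version at R1
r2 = {"R1": 1, "R2": 1}    # Client B's cart version at R2
assert compare_vv(r1, r2) == "CONCURRENT"   # conflict detected, as above

# Resolution option 2 (automatic merge): union the items and take the
# component-wise max of the vectors, so the result dominates both parents.
merged_vv = {k: max(r1.get(k, 0), r2.get(k, 0)) for k in set(r1) | set(r2)}
assert compare_vv(r1, merged_vv) == "LESS"  # merged version subsumes r1
```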

Dotted Version Vectors

Standard version vectors have a subtle problem called sibling explosion. When a client reads a conflicting value (with multiple siblings) and then writes back a resolved version, the version vector may not correctly subsume the old siblings, causing them to re-appear. This is known as the false concurrency problem.

Riak (a distributed key-value store) solved this with dotted version vectors (DVVs), introduced by Preguiça et al. (2012).

Problem with plain version vectors:

Client reads key "cart" from R1, sees conflict:
  Sibling 1: {items: ["milk", "eggs"]},  vv: {R1:2}
  Sibling 2: {items: ["milk", "bread"]}, vv: {R1:1, R2:1}

Client resolves and writes back to R1:
  Resolved: {items: ["milk", "eggs", "bread"]}
  
With plain version vector, R1 increments:
  New vv: {R1:3}   ← But this doesn't dominate {R1:1, R2:1}!
  
  Compare {R1:3} vs {R1:1, R2:1}:
  R1: 3 > 1  ✓
  R2: 0 < 1  ✗  → STILL CONCURRENT!
  
  Sibling 2 comes back as a zombie! 💀

Solution: Dotted Version Vector
  A DVV = (version_vector, dot)
  The "dot" records the exact (replica, counter) pair of the event
  that created this version.
  
  When resolving, the DVV tracks which specific versions were
  seen by the client, allowing the system to correctly subsume them.

Definition: Dotted Version Vector

A dotted version vector is a pair (V, d) where:

  • V is a version vector (the "causal context" — what the client had seen)
  • d = (r, n) is a "dot" — a single (replica, counter) pair identifying the exact event that produced this value

When comparing, the dot allows the system to determine that a new write descends from (subsumes) specific siblings, even if the version vector alone doesn't dominate them.

# Riak's approach with DVVs (simplified)

DVV structure: { context: {R1:2, R2:1}, dot: (R1, 3) }

Reading:  Client gets ALL siblings + their DVVs
          Client's context = merge of all sibling version vectors

Writing:  Client sends back:
          - New value
          - Context (merged VV from what client read)
          
          Server creates new DVV:
          - context = client's context
          - dot = (coordinating_replica, next_counter)
          
          Any sibling whose dot is dominated by the
          context is now correctly pruned.

This eliminates the sibling explosion problem.
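The pruning mechanics can be sketched in a few lines. This is a toy model under simplifying assumptions, not Riak's implementation: a single object stands in for the replicated key, and all names (DVVKey, read, write) are invented.

```python
class DVVKey:
    """Toy dotted-version-vector store for one key.
       Siblings are (dot, value) pairs; a dot is (replica, counter)."""
    def __init__(self):
        self.siblings = []    # concurrent versions currently alive
        self.counters = {}    # per-replica counter for minting fresh dots

    def read(self):
        """Return all sibling values plus the merged causal context."""
        context = {}
        for (replica, n), _ in self.siblings:
            context[replica] = max(context.get(replica, 0), n)
        return [value for _, value in self.siblings], context

    def write(self, replica, value, context):
        """Write coordinated by `replica`, carrying the context the client
           read. Every sibling whose dot the context covers is subsumed."""
        self.counters[replica] = self.counters.get(replica, 0) + 1
        dot = (replica, self.counters[replica])
        self.siblings = [(d, v) for (d, v) in self.siblings
                         if d[1] > context.get(d[0], 0)]  # keep true concurrents
        self.siblings.append((dot, value))

key = DVVKey()
key.write("R1", ["milk", "eggs"], {})            # concurrent write at R1
key.write("R2", ["milk", "bread"], {})           # concurrent write at R2
values, ctx = key.read()                         # two siblings, ctx {"R1":1,"R2":1}
key.write("R1", ["milk", "eggs", "bread"], ctx)  # resolved write-back
# Both old siblings are pruned by the context: no zombies.
```

The resolved write carries context {"R1":1, "R2":1}, which covers the dots (R1,1) and (R2,1) of both siblings, so they are dropped even though a plain version vector at R1 would not have dominated them.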

DynamoDB & Last-Write-Wins

Amazon's original Dynamo paper (2007) used vector clocks for conflict detection, famously implementing the "shopping cart" example where concurrent writes are returned to the client as siblings for manual resolution. However, this approach had significant operational complexity.

DynamoDB (the managed service, launched 2012) took a radically different approach: last-write-wins (LWW) using physical timestamps.

DynamoDB's LWW approach:

Write A at time T1: {key: "user:1", value: "Alice", ts: 1000}
Write B at time T2: {key: "user:1", value: "Bob",   ts: 1003}

If T2 > T1, Write B wins. Write A is silently discarded.

No conflicts. No siblings. No client-side resolution.
Simple. Fast. But potentially lossy.

LWW Trade-offs

Aspect             | Vector Clocks (Dynamo-style)                        | LWW (DynamoDB-style)
-------------------|-----------------------------------------------------|----------------------------------------------------------
Conflict Detection | Detects all concurrent writes accurately            | Cannot detect concurrent writes — one is silently dropped
Data Loss          | No data loss — all concurrent versions preserved    | Potential data loss on every concurrent write
Client Complexity  | Clients must handle conflict resolution logic       | No conflict resolution needed — simple read/write API
Metadata Overhead  | O(N) vector per data item (N = number of writers)   | Single timestamp per item — negligible overhead
Read Latency       | May return multiple versions; client must reconcile | Always returns single value; fast reads
Correctness        | Preserves causal consistency guarantees             | Relies on physical clock accuracy (NTP)
Use Case Fit       | Shopping carts, collaborative editing, CRDTs        | Session stores, caches, non-critical metadata
Why DynamoDB chose LWW: In practice, Amazon found that most application developers never implemented proper conflict resolution, leading to subtle bugs. LWW's simplicity means predictable behavior — even if that behavior is "last write wins and other writes are lost." For the vast majority of use cases (single-writer patterns, idempotent writes), LWW works correctly and is dramatically simpler to operate.

When LWW is Dangerous

Scenario: Bank account balance with LWW

Initial balance: $1000

Client A (at T=100): Deposits $200
  Reads balance: $1000
  Writes: $1200 with timestamp 100

Client B (at T=101): Withdraws $300
  Reads balance: $1000 (hasn't seen A's write)
  Writes: $700 with timestamp 101

LWW result: B wins (timestamp 101 > 100)
Final balance: $700

Expected balance: $1000 + $200 - $300 = $900
LOST: $200 deposit from Client A!

This is why financial systems NEVER use LWW.
They use serializable transactions or CRDTs.
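That interleaving can be reproduced with a deliberately naive LWW register (a sketch; the class and timestamps are illustrative):

```python
class LWWRegister:
    """Last-write-wins register: highest timestamp wins, losers vanish."""
    def __init__(self, value, ts):
        self.value, self.ts = value, ts

    def write(self, value, ts):
        # Only a strictly newer timestamp replaces the current value.
        if ts > self.ts:
            self.value, self.ts = value, ts

balance = LWWRegister(1000, ts=0)

# Both clients read $1000 before either write has replicated.
seen_by_a = balance.value    # 1000
seen_by_b = balance.value    # 1000

balance.write(seen_by_a + 200, ts=100)   # A deposits $200, writes 1200
balance.write(seen_by_b - 300, ts=101)   # B withdraws $300, writes 700

# balance.value is now 700: the $200 deposit was silently overwritten.
```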

Conflict Resolution Strategies

When vector clocks or version vectors detect concurrent writes, the system must resolve the conflict. There are several strategies, each with different trade-offs:

Strategy Comparison

Strategy              | How It Works                                 | Pros                                         | Cons                                      | Used By
----------------------|----------------------------------------------|----------------------------------------------|-------------------------------------------|-----------------------------------
Last-Write-Wins       | Highest timestamp wins; discard others       | Simple, no client logic                      | Data loss on concurrent writes            | DynamoDB, Cassandra (default)
Client Resolution     | Return all siblings to client; client merges | Application-specific merge logic             | Client complexity, sibling explosion risk | Riak, Dynamo (original)
CRDTs                 | Use conflict-free data types that auto-merge | Automatic, mathematically correct resolution | Limited data type support, space overhead | Riak (flags, counters, sets, maps)
Application Callbacks | Server invokes app-defined merge function    | Custom logic without exposing siblings       | Merge function must be deterministic      | CouchDB (map/reduce views)
Multi-Value Register  | Store all concurrent values as a set         | No data loss, explicit concurrency           | Application must handle multi-value reads | Research systems

Connection to CRDTs

CRDTs (Conflict-free Replicated Data Types) are the most elegant resolution strategy. They use mathematical properties (commutativity, associativity, idempotency) to guarantee that all replicas converge to the same state regardless of the order updates are applied:

# G-Counter CRDT: A grow-only counter using version vectors
class GCounter:
    """Each replica increments its own entry.
       The counter value is the sum of all entries.
       Merge = element-wise max → always converges."""
    
    def __init__(self, replica_id):
        self.rid = replica_id
        self.counts = {}
    
    def increment(self, amount=1):
        self.counts[self.rid] = self.counts.get(self.rid, 0) + amount
    
    def value(self):
        return sum(self.counts.values())
    
    def merge(self, other_counts):
        all_keys = set(self.counts.keys()) | set(other_counts.keys())
        for k in all_keys:
            self.counts[k] = max(
                self.counts.get(k, 0),
                other_counts.get(k, 0)
            )
    
    # No conflicts possible! Every merge converges.
    # Value only goes up, and max() is idempotent.

# Replica A increments 3 times:  {A: 3}        → value = 3
# Replica B increments 2 times:  {B: 2}        → value = 2
# After merge at A:              {A: 3, B: 2}  → value = 5
# After merge at B:              {A: 3, B: 2}  → value = 5  ✓ Same!

Advanced Topics & Optimizations

Interval Tree Clocks

Interval Tree Clocks (ITCs), introduced by Almeida et al. (2008), solve the dynamic membership problem. Unlike vector clocks, ITCs don't need a fixed process count — they use a binary tree structure where identity "stamps" can be forked (when a process joins) and joined (when a process leaves), automatically managing the size of causal metadata.

ITC operations:
  fork():  Split an identity into two halves
           (1) → (1,0) for existing + (0,1) for new node
  join():  Merge two identities back
           (1,0) + (0,1) → (1)
  event(): Increment the clock (like vector clock tick)
  
Advantage: O(log N) space in practice (vs O(N) for VCs)
           No need for global knowledge of all participants

Hybrid Logical Clocks (HLCs)

Hybrid Logical Clocks, proposed by Kulkarni et al. (2014), combine physical time and logical time. They track causality like Lamport timestamps while staying close to NTP-synchronized physical time. CockroachDB uses HLCs; Google Spanner attacks the same problem differently, with TrueTime's bounded-uncertainty hardware clocks.

HLC structure: (physical_time, logical_counter)

Rules:
  Send/Local event:
    l' = max(l, physical_time())
    if l' == l:
      c = c + 1       # same physical time → increment logical
    else:
      c = 0           # physical time advanced → reset logical
    l = l'
    timestamp = (l, c)
  
  Receive (msg has (l_m, c_m)):
    l' = max(l, l_m, physical_time())
    if l' == l == l_m:
      c = max(c, c_m) + 1
    elif l' == l:
      c = c + 1
    elif l' == l_m:
      c = c_m + 1
    else:
      c = 0
    l = l'

Advantage: Timestamps are close to real wall-clock time
           (useful for snapshot reads, TTL, etc.)
           while still guaranteeing causal ordering.

CockroachDB: Uses HLC for MVCC timestamps.
             Read snapshots use HLC time, not NTP time.
Google Spanner: Uses GPS + atomic clocks (TrueTime)
                with uncertainty intervals [earliest, latest]
                to implement linearizable reads.
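The update rules above translate directly to code. A minimal sketch: the physical clock is passed in as `now` so the example is deterministic; a real implementation would read the system clock.

```python
class HLC:
    """Hybrid Logical Clock: (l, c), where l tracks the highest physical
       time seen and c breaks ties whenever l fails to advance."""
    def __init__(self):
        self.l = 0
        self.c = 0

    def local_or_send(self, now):
        l_new = max(self.l, now)
        if l_new == self.l:
            self.c += 1      # physical time didn't advance: bump counter
        else:
            self.c = 0       # physical time advanced: reset counter
        self.l = l_new
        return (self.l, self.c)

    def receive(self, msg_l, msg_c, now):
        l_new = max(self.l, msg_l, now)
        if l_new == self.l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif l_new == self.l:
            self.c += 1
        elif l_new == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0
        self.l = l_new
        return (self.l, self.c)

# Sender's clock reads 10; receiver's physical clock lags at 9.
a, b = HLC(), HLC()
l, c = a.local_or_send(now=10)     # (10, 0)
stamp = b.receive(l, c, now=9)     # (10, 1): still ordered after the send
```

Even though the receiver's physical clock is behind, (10, 0) < (10, 1) under tuple comparison, so causal ordering is preserved while both components stay close to wall-clock time.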

Bloom Clocks

A more recent optimization, the Bloom clock (Ramabaja, 2019): use a Bloom filter instead of a vector to represent causal history. The space is fixed (independent of the number of processes), but comparisons become probabilistic — false positives are possible (reporting happens-before when events are actually concurrent), but false negatives are not.

Bloom Clock: Fixed-size bit array (e.g., 256 bits)
  
  Each event hashes (process_id, counter) into the Bloom filter.
  
  Comparison:
    If A's bits are a subset of B's bits → probably A → B
    If neither is a subset → definitely concurrent
    
  Space: O(1) regardless of number of processes!
  Trade-off: Small false-positive rate (~1-5% with good sizing)
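A sketch of the idea (the sizes, hash scheme, and class names are illustrative choices, not taken from any specific implementation):

```python
import hashlib

BLOOM_SIZE = 256   # fixed number of bits, independent of process count
NUM_HASHES = 4     # bit positions set per event

def _event_bits(pid, counter):
    """Hash (pid, counter) to NUM_HASHES positions in the bit array."""
    bits = set()
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{pid}:{counter}:{i}".encode()).digest()
        bits.add(int.from_bytes(digest[:4], "big") % BLOOM_SIZE)
    return bits

class BloomClock:
    def __init__(self, pid):
        self.pid = pid
        self.counter = 0
        self.bits = set()         # indices of set bits (at most BLOOM_SIZE)

    def tick(self):
        self.counter += 1
        self.bits |= _event_bits(self.pid, self.counter)
        return set(self.bits)     # snapshot to attach to a message

    def receive(self, msg_bits):
        self.bits |= msg_bits     # merge sender's history, then tick
        return self.tick()

def probably_before(a_bits, b_bits):
    # Subset means probably a → b: false positives possible, never false negatives.
    return a_bits <= b_bits

def definitely_concurrent(a_bits, b_bits):
    return not a_bits <= b_bits and not b_bits <= a_bits

A, B = BloomClock("A"), BloomClock("B")
a1 = A.tick()
b1 = B.receive(a1)   # b1's bits are a superset of a1's by construction,
                     # so probably_before(a1, b1) holds
```

Note the asymmetry: a true causal chain always produces a subset, but two concurrent events can also collide into a subset by hash accident, which is exactly the small false-positive rate the trade-off describes.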

Real-World Systems

System          | Mechanism                      | Details
----------------|--------------------------------|------------------------------------------------------------------------------
Amazon Dynamo   | Vector Clocks                  | Original 2007 paper. Clients resolve siblings. Found operationally complex.
Amazon DynamoDB | LWW (timestamps)               | Simplified. Single-table design avoids most conflicts.
Riak            | Dotted Version Vectors + CRDTs | Most sophisticated. DVVs avoid sibling explosion. Built-in CRDT data types.
Cassandra       | LWW (timestamps per cell)      | Each column cell has a timestamp. Latest timestamp wins per cell. Uses NTP.
CockroachDB     | Hybrid Logical Clocks          | MVCC with HLC timestamps. Serializable isolation with causal ordering.
Google Spanner  | TrueTime (GPS + atomic clocks) | Bounded uncertainty interval. Waits out uncertainty for linearizability.
Voldemort       | Vector Clocks                  | LinkedIn's Dynamo-inspired store. Client-side conflict resolution.
CouchDB         | Revision Trees                 | Git-like revision history. Deterministic winner selection + application merge.

Summary & Key Takeaways

The Core Hierarchy of Logical Time Mechanisms:
  1. Lamport Timestamps: O(1) space. Captures causal ordering in one direction only (a→b ⟹ C(a)<C(b)). Cannot detect concurrency. Good for total ordering (mutex, total-order broadcast).
  2. Vector Clocks: O(N) space. Captures causal ordering exactly (a→b ⟺ VC(a)<VC(b)). Detects concurrency perfectly. The gold standard for causal tracking.
  3. Version Vectors: Vector clocks specialized for data items. Used in replicated databases for conflict detection.
  4. Dotted Version Vectors: Fix sibling explosion in version vectors. Used by Riak.
  5. Hybrid Logical Clocks: Combine physical + logical time. Stay close to wall-clock time while preserving causality. Used by CockroachDB.
Interview Insight: When asked about event ordering in distributed systems, always start by explaining why physical clocks fail (NTP drift, leap seconds, clock skew). Then present the hierarchy: Lamport (partial, one-way) → Vector Clocks (perfect, two-way) → practical systems (LWW for simplicity, DVVs for correctness, HLCs for databases). Know the Dynamo → DynamoDB evolution story — it shows the industry's shift from theoretical correctness to practical simplicity.

In the next post, we explore Gossip Protocols — the mechanism by which nodes in a distributed system efficiently disseminate information (including vector clocks and version vectors) to each other without centralized coordination.