High Level Design Series · Foundations · Part 6

CAP Theorem & PACELC

If you have ever designed or evaluated a distributed data store, you have encountered a fundamental tension: you cannot have everything. The CAP theorem formalizes that tension, and the PACELC extension adds the nuance that real-world engineers need. This post will give you the rigorous background, the intuition, and the interview-ready vocabulary to reason about consistency and availability trade-offs in any distributed system.

The CAP Theorem

In 2000, Eric Brewer presented a conjecture at the ACM Symposium on Principles of Distributed Computing (PODC): it is impossible for a distributed data store to simultaneously provide more than two of the three guarantees of Consistency, Availability, and Partition Tolerance. Two years later, Seth Gilbert and Nancy Lynch of MIT published a formal proof, elevating the conjecture to a theorem.

The Three Guarantees

Consistency (C) — Every read receives the most recent write or an error. All nodes see the same data at the same time. This is linearizability (not to be confused with the "C" in ACID, which refers to database integrity constraints). Formally: for any execution, there exists a total order of operations such that each operation appears to take effect atomically at a single point in time between its invocation and response, and every read of a key returns the value written by the most recent preceding write to that key.
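
To make the definition concrete, here is a toy Python check, limited to histories where operations do not overlap in time. The history and the check are illustrative only, not a general linearizability checker:

```python
# Each op: (kind, value, invoke_time, complete_time); operations do not overlap in this toy history.
history = [
    ("write", 1, 0, 1),
    ("read",  1, 2, 3),   # OK: sees the most recent completed write
    ("write", 2, 4, 5),
    ("read",  1, 6, 7),   # Violation: the write of 2 already completed before this read began
]

def is_linearizable_simple(history):
    current = None
    for kind, value, _, _ in sorted(history, key=lambda op: op[2]):  # walk in real-time order
        if kind == "write":
            current = value
        elif value != current:   # a read must return the latest completed write
            return False
    return True

print(is_linearizable_simple(history))  # False
```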

Availability (A) — Every request received by a non-failing node in the system must result in a response. There is no timeout or error allowed — the node must eventually return a meaningful result. Note the subtlety: availability is defined per non-failing node, not per client. A client whose network cable is cut is not relevant; a healthy server that refuses to answer violates availability.

Partition Tolerance (P) — The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes. A network partition divides the cluster into two (or more) groups of nodes that cannot communicate with each other.

Key insight: In any real distributed system running over a network, partitions will happen — switches fail, cables get cut, cloud regions lose connectivity. Therefore P is not optional. The practical question is always: when a partition occurs, do you sacrifice C or A?

Proof Intuition

Consider the simplest possible distributed system: two nodes, N1 and N2, each holding a copy of variable X (initially X = v0). A network partition separates them — no messages can get through.

  1. A client sends a write to N1: X = v1.
  2. N1 cannot replicate to N2 (partition).
  3. Another client sends a read to N2.

N2 must now choose between two options:

  1. Return the last value it has, v0. The read is answered (availability holds), but it is stale, so consistency is violated.
  2. Refuse to answer (return an error or block) until it can reach N1. Consistency is preserved, but a non-failing node has failed to respond, so availability is violated.

There is no third option. This is the crux of the Gilbert-Lynch proof. It holds regardless of the number of nodes, the replication protocol, or the consensus algorithm. The fundamental impossibility stems from the speed of light: information cannot propagate instantaneously, and partitions can prevent it from propagating at all.
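
The forced choice can be modeled in a few lines of Python. This is a toy model of the two-node scenario above, not a real replication protocol; during the partition, N2 can only answer with stale data or signal unavailability:

```python
class Unavailable(Exception):
    pass

class Node:
    def __init__(self, value):
        self.value = value
        self.partitioned = False   # True -> cannot talk to its peer

    def replicate_from(self, peer):
        if self.partitioned or peer.partitioned:
            raise Unavailable("partition: replication message lost")
        self.value = peer.value

n1, n2 = Node("v0"), Node("v0")
n1.partitioned = n2.partitioned = True   # the network splits

n1.value = "v1"                          # a client writes X = v1 to N1
try:
    n2.replicate_from(n1)                # replication cannot get through
except Unavailable:
    pass

# N2 now serves a read. Its only options:
stale_answer = n2.value                  # option 1: answer "v0" -> gives up consistency
# raise Unavailable("cannot confirm latest value")  # option 2: refuse -> gives up availability
print(stale_answer)                      # "v0"
```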

Formal Statement

Let us state the theorem more precisely. Consider an asynchronous network model (no bound on message delivery time). It is impossible to implement a read/write data object that guarantees all three of the following properties:

  1. Availability: every request received by a non-failing node eventually receives a response.
  2. Atomic consistency (linearizability): operations appear to take effect atomically in an order consistent with real time.
  3. Partition tolerance: the above guarantees hold even when arbitrarily many messages between nodes are lost.

In the partially synchronous model (bounded message delay), you can achieve all three when there is no partition, but you must sacrifice either C or A during a partition.


CP Systems

CP systems choose Consistency over Availability during a network partition. When a partition occurs, nodes that cannot confirm they have the latest data will refuse to serve requests rather than return stale results.

Behavior During a Partition

Imagine a 5-node cluster using Raft consensus. The leader is in partition A (with 2 followers), and 2 followers are isolated in partition B:

  1. Partition A still holds 3 of the 5 nodes, a majority, so the leader keeps committing writes and serving linearizable reads.
  2. Partition B holds only 2 nodes. It cannot elect a leader or reach quorum, so it stops accepting writes (and linearizable reads) until the partition heals.
  3. When connectivity returns, the minority nodes catch up from the leader's log; no acknowledged write is lost.
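
A minimal sketch of the quorum arithmetic behind this behavior (illustrative Python, not any particular Raft implementation):

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a quorum."""
    return cluster_size // 2 + 1

def can_commit(reachable_nodes: int, cluster_size: int) -> bool:
    """A partition side can elect a leader and commit writes only if it holds a quorum."""
    return reachable_nodes >= majority(cluster_size)

CLUSTER_SIZE = 5
print(can_commit(3, CLUSTER_SIZE))  # True  -> partition A (leader + 2 followers) keeps serving
print(can_commit(2, CLUSTER_SIZE))  # False -> partition B (2 followers) becomes unavailable
```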

Examples of CP Systems

| System | Consistency Mechanism | Partition Behavior |
| --- | --- | --- |
| ZooKeeper | ZAB protocol (majority quorum) | Minority partition becomes read-only, then unavailable |
| etcd | Raft consensus | Minority partition cannot serve reads or writes |
| HBase | Single RegionServer per region + ZK | Region unavailable if its server is partitioned |
| MongoDB | Majority read/write concern | Minority partition rejects writes; reads depend on config |
| Redis (Sentinel) | Single master per shard | Old master in minority stops accepting writes after timeout |
| Google Spanner | Paxos + TrueTime | Minority partition unavailable; majority continues |

Use Cases for CP

Choose CP when serving wrong or stale data is worse than serving an error:

  1. Financial ledgers, payments, and account balances.
  2. Inventory, booking, and ticketing systems that must not oversell.
  3. Distributed locks, leader election, and cluster coordination (the role ZooKeeper and etcd typically play).
  4. Configuration and metadata that other systems treat as the source of truth.

AP Systems

AP systems choose Availability over Consistency during a network partition. Every node continues to accept reads and writes, even if it cannot communicate with other nodes. The trade-off: clients may see stale or divergent data.

Behavior During a Partition

Consider a 3-node Cassandra cluster with replication factor 3 and CONSISTENCY ONE:

  1. A partition splits the cluster, say one node on one side and two on the other.
  2. With CONSISTENCY ONE, a read or write needs an acknowledgment from just one replica, so both sides keep accepting requests.
  3. The replicas diverge while the partition lasts; once it heals, hinted handoff, read repair, and anti-entropy repair reconcile them, with last-write-wins timestamps resolving conflicting versions.
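
To make the reconciliation step concrete, here is a tiny self-contained sketch of last-write-wins merging. It shows the idea behind Cassandra's timestamp-based conflict resolution, not its actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    timestamp: int   # e.g., microseconds since epoch, set by the writing client
    value: str

def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-write-wins: the version with the higher timestamp survives."""
    return a if a.timestamp >= b.timestamp else b

# During the partition, the two sides accept divergent writes for the same key.
side_a = {"cart:u1": Versioned(timestamp=1001, value="book")}
side_b = {"cart:u1": Versioned(timestamp=1005, value="laptop")}

# After the partition heals, anti-entropy repair merges the replicas.
merged = {k: lww_merge(side_a[k], side_b[k]) for k in side_a}
print(merged["cart:u1"].value)  # "laptop" -- the later write wins; the earlier one is silently lost
```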

Examples of AP Systems

| System | Conflict Resolution | Consistency Model |
| --- | --- | --- |
| Cassandra | Last-write-wins (LWW) timestamps | Eventual; tunable per query |
| DynamoDB | LWW or application-level | Eventually consistent (default) or strong reads |
| CouchDB | MVCC + revision tree, manual conflict resolution | Eventual consistency |
| DNS | TTL-based expiry | Eventual (propagation delay) |
| Riak | Vector clocks / CRDTs | Eventual consistency |
| Amazon S3 | Internal reconciliation | Historically read-after-write for new PUTs, eventual for overwrites and DELETEs; strongly consistent for all operations since Dec 2020 |

Use Cases for AP

Choose AP when taking the request matters more than being exactly right at that instant:

  1. Shopping carts, product views, and recommendations.
  2. Social feeds, likes, counters, and presence indicators.
  3. Metrics, logs, and analytics ingestion.
  4. DNS-style lookups and caches, where briefly stale answers are acceptable.


Why Not CA?

The CA combination — Consistency + Availability without Partition Tolerance — is not achievable in a real distributed system. Here is why:

Network Partitions Are Inevitable

In any system that spans more than one machine connected by a network, partitions will happen. This is not a theoretical possibility — it is an operational certainty:

  1. Switches, routers, and network cables fail.
  2. Misconfigurations (firewall rules, routing changes) silently cut nodes off from each other.
  3. Long GC pauses or overloaded machines make a node appear unreachable even when the network is healthy.
  4. Entire availability zones or regions lose connectivity to one another.

The Single-Node "Loophole"

A traditional single-node RDBMS (PostgreSQL on one server) appears to be CA: it provides strong consistency and is always available (as long as the single node is up). But this is a trivial case — there is no network between nodes, so the question of partition tolerance does not arise. The moment you replicate that PostgreSQL database to a second server, you re-enter CAP territory and must choose between CP and AP.

CA is a degenerate case: You can have CA if and only if you have a single node (or a perfectly reliable network, which does not exist). In practice, "CA" means "not distributed." Every real distributed system is either CP or AP (or somewhere on the spectrum between them).

Common Misunderstanding

Many people misinterpret CAP as "pick any 2 of 3," as if all three combinations are equally valid. This is misleading. The correct mental model is:

  1. Accept that P is mandatory (partitions happen).
  2. When a partition occurs, choose between C and A.
  3. When there is no partition, you can (and should) have both C and A.

This nuance is exactly what the PACELC theorem captures.

The PACELC Extension

The PACELC theorem, introduced by Daniel Abadi (first described in 2010 and formalized in a 2012 IEEE Computer article), extends CAP by acknowledging that even when there is no partition, there is a trade-off between Latency and Consistency.

The Statement

If there is a Partition, choose between Availability and Consistency. Else (no partition), choose between Latency and Consistency.

This gives four possible combinations:

| Partition → A or C | Else → L or C | Notation | Examples |
| --- | --- | --- | --- |
| Availability (PA) | Latency (EL) | PA/EL | Cassandra, DynamoDB, Riak, CouchDB |
| Availability (PA) | Consistency (EC) | PA/EC | Rare — would mean "sloppy during partition, strict otherwise" |
| Consistency (PC) | Latency (EL) | PC/EL | Yahoo! PNUTS (historical), Cosmos DB (session consistency) |
| Consistency (PC) | Consistency (EC) | PC/EC | HBase, MongoDB (majority), Google Spanner, ZooKeeper, etcd |

Why Latency Matters

Even without a partition, enforcing strong consistency requires coordination:

  1. A strongly consistent write typically must be acknowledged by a quorum of replicas before the client gets a response, adding at least one extra network round trip.
  2. When replicas live in different data centers or regions, each of those round trips costs tens to hundreds of milliseconds.
  3. Linearizable reads often need coordination too (leader leases, read indexes, or quorum reads), so even reads pay a latency tax.

A PA/EL system like Cassandra with CONSISTENCY ONE acknowledges writes as soon as a single local replica persists them, which is extremely fast but sacrifices consistency. Such a system can achieve single-digit-millisecond writes even when its replicas span data centers.
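
A back-of-the-envelope illustration of how the acknowledgment rule drives write latency. The round-trip times below are made-up numbers; the point is that ONE waits for the fastest replica while QUORUM waits for the majority-th fastest:

```python
# Hypothetical round-trip times from the coordinator to 3 replicas
# (local node, nearby data center, remote data center).
replica_rtts_ms = [2, 45, 120]

def write_latency(rtts, acks_needed):
    """The write completes when the slowest of the required acknowledgments arrives."""
    return sorted(rtts)[acks_needed - 1]

print(write_latency(replica_rtts_ms, acks_needed=1))  # ONE    -> 2 ms
print(write_latency(replica_rtts_ms, acks_needed=2))  # QUORUM -> 45 ms
print(write_latency(replica_rtts_ms, acks_needed=3))  # ALL    -> 120 ms
```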

Practical Implications

Most modern databases let you tune the trade-off per operation:

  1. Cassandra: a consistency level (ONE, QUORUM, ALL, LOCAL_QUORUM, ...) can be set per statement.
  2. DynamoDB: reads are eventually consistent by default, but any read can request ConsistentRead=true.
  3. MongoDB: readConcern and writeConcern can be chosen per operation ("majority" for safety, lower for speed).
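
As one possible illustration with the Python drivers (assuming a locally reachable Cassandra node, a keyspace and table named shop/carts, and a DynamoDB table named profiles — all hypothetical names), per-operation tuning looks roughly like this:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
import boto3

session = Cluster(["127.0.0.1"]).connect("shop")  # hypothetical keyspace

# Fast, eventually consistent write: one replica's ack is enough (PA/EL behavior).
fast_write = SimpleStatement(
    "INSERT INTO carts (user_id, item) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(fast_write, ("u1", "book"))

# Stricter read: wait for a majority of replicas (closer to CP behavior).
strict_read = SimpleStatement(
    "SELECT item FROM carts WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(strict_read, ("u1",)).one()

# DynamoDB: opt in to a strongly consistent read for a single request.
dynamodb = boto3.client("dynamodb")
resp = dynamodb.get_item(
    TableName="profiles",            # hypothetical table
    Key={"user_id": {"S": "u1"}},
    ConsistentRead=True,             # default is an eventually consistent read
)
```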


Consistency Models Deep Dive

CAP defines consistency as linearizability, but real systems implement a spectrum of consistency models. Understanding this spectrum is crucial for system design interviews and for making informed database choices.

Strong Consistency (Linearizability)

The gold standard. Every operation appears to take effect atomically at a single point in time. Once a write completes, all subsequent reads (from any node) return that value. Equivalent to having a single copy of the data.

Client A
write(X=1) ──────── ack ✓
Client B
─────── read(X) → 1 ✓ (always sees the write)

Systems: Google Spanner, etcd, ZooKeeper, CockroachDB, single-node RDBMS.

Cost: High latency (coordination required). Low throughput under contention.

Sequential Consistency

All operations appear to execute in some sequential order that is consistent with the program order of each individual client. However, this order need not correspond to real-time ordering. Two clients may disagree about the order of operations from different clients, as long as each client's own operations appear in order.

Difference from linearizability: Linearizability requires the total order to respect real-time ordering. Sequential consistency only requires it to respect per-client ordering.
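
A small hand-constructed history shows the gap between the two models (illustrative only):

```python
# Hand-constructed single-key history: (client, op, value, invoke_time, complete_time)
history = [
    ("A", "write", 1, 1, 2),   # A writes X = 1, acknowledged at t=2
    ("B", "read",  0, 3, 4),   # B then reads X and still sees the old value 0
]

# Linearizability: B's read began (t=3) after A's write completed (t=2),
# so a linearizable register must return 1. Returning 0 is a violation.
write_done = history[0][4]
read_start, read_value = history[1][3], history[1][2]
print("linearizable:", not (read_start > write_done and read_value != 1))  # False

# Sequential consistency: we only need SOME total order consistent with each
# client's program order. Ordering B's read before A's write explains the 0,
# so this history is sequentially consistent even though it is not linearizable.
```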

Systems: ZooKeeper (reads are sequentially consistent by default, not linearizable; writes are linearizable).

Causal Consistency

Operations that are causally related must be seen by all nodes in the same order. Concurrent operations (with no causal relationship) may be seen in different orders by different nodes.

Causality: if operation A could have influenced operation B (e.g., A wrote a value that B read before writing), then A must be ordered before B everywhere.
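
Vector clocks are one common way to track this "could have influenced" relation. Here is a minimal illustrative sketch, not any particular database's implementation:

```python
def happened_before(vc_a, vc_b):
    """True if event A causally precedes event B: A's clock <= B's, with at least one strict <."""
    keys = set(vc_a) | set(vc_b)
    le = all(vc_a.get(k, 0) <= vc_b.get(k, 0) for k in keys)
    lt = any(vc_a.get(k, 0) < vc_b.get(k, 0) for k in keys)
    return le and lt

post  = {"alice": 1}               # Alice posts a message
reply = {"alice": 1, "bob": 1}     # Bob replies after reading Alice's post
other = {"carol": 1}               # Carol posts independently (concurrent)

print(happened_before(post, reply))   # True  -> every node must show the post before the reply
print(happened_before(post, other))   # False -> concurrent; nodes may order these two either way
print(happened_before(other, post))   # False
```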

Systems: MongoDB (with causal consistency sessions), some CRDTs, COPS (research system).

Benefit: Much lower latency than linearizability while preserving intuitive ordering.

Eventual Consistency

If no new updates are made to a data item, eventually all accesses will return the last updated value. There is no bound on how long "eventually" takes, and in the interim, different nodes may return different values.

Systems: Cassandra (default), DynamoDB (default reads), DNS, Amazon S3 (for deletes).

Variants: eventual consistency is usually strengthened with per-client session guarantees; two of the most common, read-your-writes and monotonic reads, are described next.

Read-Your-Writes Consistency

A client will always see its own writes. After a client writes a value, subsequent reads by that same client will return the written value (or a newer one). Other clients may still see stale data.

Systems: DynamoDB (with consistent reads), most web applications (session stickiness).

Implementation: Often achieved by routing reads to the same node that accepted the write, or by including a write timestamp in the read request.
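
A toy sketch of the timestamp-based approach (illustrative only): the client carries the timestamp of its own last write as a session token and skips any replica that has not yet caught up to it.

```python
class Replica:
    def __init__(self):
        self.data = {}        # key -> value
        self.applied_ts = 0   # highest write timestamp this replica has applied

    def apply_write(self, key, value, ts):
        self.data[key] = value
        self.applied_ts = max(self.applied_ts, ts)

    def read(self, key):
        return self.data.get(key), self.applied_ts


class SessionClient:
    """Carries a session token (last write timestamp) to enforce read-your-writes."""

    def __init__(self):
        self.last_write_ts = 0

    def write(self, replica, key, value, ts):
        replica.apply_write(key, value, ts)
        self.last_write_ts = ts          # remember our own write

    def read(self, replicas, key):
        for r in replicas:               # try replicas until one has caught up to our write
            value, ts = r.read(key)
            if ts >= self.last_write_ts:
                return value
        raise RuntimeError("no replica has applied my write yet; retry or route to the writer")


primary, lagging = Replica(), Replica()
client = SessionClient()
client.write(primary, "profile:u1", "new photo", ts=7)
print(client.read([lagging, primary], "profile:u1"))  # skips the stale replica, returns "new photo"
```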

Monotonic Reads

If a client reads a value for a key, any subsequent read of the same key by that client will return the same value or a newer one. Time never "goes backward" for a single client.

Without monotonic reads: Client reads X=5 from Node A, then reads X=3 from Node B (which hasn't received the update yet). Confusing!

With monotonic reads: The system ensures the second read returns at least X=5.
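
In the same illustrative style, a client can enforce monotonic reads on its own by remembering the freshest replica state it has already observed:

```python
class MonotonicReader:
    """A client that refuses to let its view of the data move backward in time."""

    def __init__(self):
        self.highest_ts_seen = 0

    def read(self, replica_answers):
        # replica_answers: list of (value, replica_applied_ts) from the replicas we can reach
        for value, ts in replica_answers:
            if ts >= self.highest_ts_seen:   # never accept a staler view than one already seen
                self.highest_ts_seen = ts
                return value
        raise RuntimeError("every reachable replica is staler than what we already saw; retry")

reader = MonotonicReader()
print(reader.read([("X=5", 10)]))                # first read returns X=5 (version 10)
print(reader.read([("X=3", 7), ("X=5", 10)]))    # the lagging replica is skipped; still X=5
```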

The Consistency Spectrum

From strongest to weakest:

Linearizability
    ↓
Sequential Consistency
    ↓
Causal Consistency
    ↓
Read-Your-Writes + Monotonic Reads
    ↓
Eventual Consistency

Each step down relaxes guarantees but improves latency, throughput, and availability. The art of system design is choosing the weakest model that your application can tolerate.

Real-World CAP Classification

The following table classifies popular databases by their CAP and PACELC behavior. Note that many databases are tunable — they can behave as CP or AP depending on configuration.

| Database | CAP | PACELC | Default Behavior | Notes |
| --- | --- | --- | --- | --- |
| PostgreSQL (single node) | CA* | N/A (single node) | Strong consistency, always available | Not distributed; CA is trivial |
| PostgreSQL (streaming replication) | CP | PC/EC | Sync replication: consistent but follower may be unavailable | Async replication behaves more like AP |
| MySQL (Group Replication) | CP | PC/EC | Majority-based consensus; minority partitions become read-only | Single-primary mode is CP; multi-primary can have conflicts |
| MongoDB | CP | PC/EC (with majority concern) | Majority write concern ensures durability; minority unavailable | Tunable: lower write concern trades consistency for availability |
| Cassandra | AP | PA/EL | Always writable; LWW conflict resolution; eventual consistency | With QUORUM reads/writes, behaves more like CP |
| DynamoDB | AP | PA/EL | Eventually consistent reads by default; always available | Strongly consistent reads available (higher latency) |
| Redis (Cluster) | CP | PC/EL | Single master per slot; async replication can lose recent writes | During partition: minority partition stops serving the slot |
| CouchDB | AP | PA/EL | Multi-master; accepts writes anywhere; conflict detection on sync | Manual conflict resolution via revision trees |
| HBase | CP | PC/EC | One RegionServer per region; strong consistency within regions | Region unavailable during RegionServer failure until reassignment |
| etcd | CP | PC/EC | Raft consensus; linearizable reads and writes | Minority partition completely unavailable |
| Consul | CP | PC/EC | Raft consensus for KV store; strong consistency by default | Stale reads available for AP-like behavior |
| ZooKeeper | CP | PC/EC | ZAB protocol; writes are linearizable; reads are sequentially consistent | Sync command available for linearizable reads |
| Google Spanner | CP | PC/EC | External consistency via TrueTime; Paxos-based replication | Highest consistency guarantee of any globally distributed DB |
| CockroachDB | CP | PC/EC | Serializable isolation; Raft-based replication | Spanner-inspired; uses hybrid logical clocks |

Remember: CAP classification is a simplification. Most modern databases are tunable — you can configure them to behave anywhere on the CP ↔ AP spectrum. The classification above reflects the default or most common configuration.

CAP in System Design Interviews

When to Bring Up CAP

CAP is most relevant when you are:

  1. Choosing a datastore for a specific feature and need to justify the choice.
  2. Designing anything replicated across data centers or regions.
  3. Deciding how a service should behave when a dependency or replica becomes unreachable.
  4. Asked directly about consistency ("what happens if two users update this at the same time?").

Interview tip: Don't just say "I'll use Cassandra because it's AP." Explain why AP is the right choice for this use case, what the implications are (eventual consistency, conflict resolution), and how you'll handle the edge cases. Show you understand the trade-off, not just the label.

Framework for Discussing CAP

  1. Identify the consistency requirement: "Does this feature need strong consistency, or can we tolerate stale data?"
  2. Assess the availability requirement: "What happens if this service returns an error for 30 seconds? Is that acceptable?"
  3. Choose the trade-off: "For this use case, I'll prefer availability and accept eventual consistency because..."
  4. Discuss PACELC: "Even without partitions, I need low latency for reads, so I'll choose PA/EL — Cassandra with CONSISTENCY ONE."
  5. Mention mitigation strategies: How you handle the downside of your choice (conflict resolution, retries, compensating transactions).

Example: Design a Chat System

Consider a WhatsApp-like messaging system. Different components have different CAP requirements:

| Component | CAP Choice | Reasoning |
| --- | --- | --- |
| Message delivery | AP | Users should always be able to send messages. Duplicate delivery is better than message loss. Eventually consistent ordering is acceptable (messages may arrive out of order briefly). |
| Group membership | CP | Adding/removing members must be consistent. Two admins simultaneously removing each other creates paradoxes. Use a CP store (e.g., ZooKeeper or Raft-based service). |
| Read receipts | AP | Showing a read receipt 2 seconds late is fine. Showing "unread" when the message was read is annoying but not catastrophic. |
| User profiles | AP | Profile changes can propagate eventually. Seeing an old profile photo for a few seconds is acceptable. |
| End-to-end encryption keys | CP | Key exchange must be strongly consistent. Using a stale public key means the message is unreadable. |

Common Interview Questions

  1. "Can a distributed system be CA?" (See Why Not CA?: only a single-node system escapes the choice.)
  2. "Is MongoDB CP or AP? Is Cassandra always AP?" (Both are tunable; see the classification table above.)
  3. "What does PACELC add to CAP?" (The latency vs. consistency trade-off that exists even without a partition.)
  4. "What consistency model does this feature actually need?" (Walk the spectrum from linearizability down to eventual consistency.)

Anti-Patterns to Avoid

  1. Treating CAP as "pick any 2 of 3" and labeling a distributed system CA.
  2. Naming a database by its label ("Cassandra because it's AP") without explaining why the trade-off fits the use case.
  3. Having no story for the downside of your choice: no conflict resolution for AP, no degradation path for CP.
  4. Forgetting PACELC and assuming strong consistency is free whenever there is no partition.

Pro tip for interviews: After explaining your CAP choice, always follow up with: "However, we can mitigate the downside by..." This shows maturity. For AP systems, mention conflict resolution, compensating transactions, and client-side caching. For CP systems, mention graceful degradation, circuit breakers, and stale reads from a cache.

The CAP theorem is a starting point, not an endpoint. Real-world system design is about understanding the nuances — tunable consistency, partial failures, and the fact that most of the time, your system is operating normally (no partition), where the latency-consistency trade-off from PACELC is what really matters. In the next post, we will explore ACID vs BASE, two philosophies for data consistency that emerge directly from the CAP trade-offs we discussed here.