High Level Design Series · Distributed Systems · Part 5

Saga Pattern

Microservices are great for team autonomy and independent deployment, but they introduce a hard problem: how do you manage transactions that span multiple services? A monolithic application can wrap everything in a single ACID transaction, but in a distributed system where each service owns its own database, there is no global transaction coordinator. The Two-Phase Commit (2PC) protocol solves this theoretically, but it is blocking, fragile, and doesn't scale well across autonomous services. Enter the Saga pattern — a sequence of local transactions coordinated by events or a central orchestrator, with compensating transactions that undo work if something fails partway through.

Key insight: A Saga trades the all-or-nothing atomicity and isolation of a single ACID transaction for availability. Instead of one big transaction, you get a series of small ones, each locally atomic, connected by a coordination mechanism. If any step fails, previously completed steps are compensated (logically reversed) rather than rolled back.

The Distributed Transaction Problem

Consider an e-commerce order flow. When a customer clicks "Place Order," several things must happen:

  1. Order Service creates the order record.
  2. Payment Service charges the customer's credit card.
  3. Inventory Service reserves the items from stock.
  4. Shipping Service schedules the delivery.

In a monolith, you would wrap all four operations in a single database transaction. If the shipping step fails, the entire transaction rolls back — the payment is never charged, the inventory is never reserved, the order is never created. Simple and safe.

But in a microservices architecture, each service has its own database. There is no shared transaction log, so you cannot issue a BEGIN TRANSACTION that spans four different databases owned by four different services.

Why Not Just Use 2PC?

Two-Phase Commit (covered in the previous post) guarantees atomicity across distributed participants, but it has critical drawbacks in a microservices context:

2PC Drawback | Impact on Microservices
Blocking protocol | All participants hold locks until the coordinator commits. A slow Payment Service blocks the Inventory Service and Shipping Service.
Single point of failure | If the coordinator crashes between prepare and commit, all participants are stuck in an uncertain state, holding locks indefinitely.
Tight coupling | Every participant must implement the 2PC protocol. Adding a new service to the transaction requires coordinating with the transaction manager.
Latency | Two round-trips (prepare + commit) across all participants. Latency = 2 × max(participant response times).
Not supported across heterogeneous systems | If Order uses PostgreSQL and Payment uses Stripe's API, there is no XA-compatible interface for Stripe.

The Saga pattern was introduced by Hector Garcia-Molina and Kenneth Salem in their 1987 paper as an alternative to long-lived transactions. The core idea: break a long-lived transaction into a sequence of shorter transactions, each of which can be compensated if a later step fails.

Saga Pattern: Core Concepts

Definition

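A Saga is a sequence of local transactions T1, T2, ..., Tn where each Ti commits in its own service's database and each step that can be undone has a compensating transaction Ci. Either every Ti completes, or the steps that already completed are compensated in reverse order. For the order flow: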

T1: Create Order → T2: Charge Payment → T3: Reserve Stock → T4: Schedule Shipping → ✓ Complete

If T3 (Reserve Stock) fails:

T1 ✓ → T2 ✓ → T3 ✗ FAIL → C2: Refund Payment → C1: Cancel Order
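
The forward and compensation sequences above can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not a production executor: `runSaga`, the step names, and the `log` array are hypothetical, and every step is a plain synchronous function.

```javascript
// Minimal in-memory saga runner (illustrative; all names are hypothetical).
// Each step has an action Ti and an optional compensation Ci.
function runSaga(steps, log) {
  const completed = [];
  for (const step of steps) {
    try {
      step.action();
      log.push(`${step.name} ok`);
      completed.push(step);
    } catch (err) {
      log.push(`${step.name} failed`);
      // Undo the steps that already committed, most recent first.
      for (const done of completed.reverse()) {
        if (done.compensate) {
          done.compensate();
          log.push(`compensated ${done.name}`);
        }
      }
      return 'COMPENSATED';
    }
  }
  return 'COMPLETED';
}

const log = [];
const status = runSaga([
  { name: 'T1 createOrder',      action: () => {}, compensate: () => {} },
  { name: 'T2 chargePayment',    action: () => {}, compensate: () => {} },
  { name: 'T3 reserveStock',     action: () => { throw new Error('out of stock'); } },
  { name: 'T4 scheduleShipping', action: () => {} },
], log);
console.log(status, log);
```

T4 never runs, and the log shows C2 before C1, matching the failure sequence above.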

Compensating Transactions

A compensating transaction is not the same as a database rollback. It is a new, forward-moving transaction that semantically undoes the effect of a prior transaction. This is a crucial distinction:

Original Transaction (Ti) | Compensating Transaction (Ci) | Why Not a Simple Rollback?
Create order (status: PENDING) | Update order status to CANCELLED | Order may have been visible to user; audit trail needed
Charge credit card $99.99 | Issue refund of $99.99 | Payment already left the system; can't "un-charge"
Reserve 2 units of SKU-1234 | Release 2 units of SKU-1234 | Other orders may have seen the updated stock count
Send shipping notification email | Send cancellation email | Email is already sent; can't unsend

Important: Compensating transactions must be idempotent: if the compensation is retried (for example, after a network failure), it must produce the same result. This typically means using idempotency keys, checking current state before acting, and designing for at-least-once delivery.
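
As a rough sketch of the "check current state before acting" idea, the compensation below releases a reservation only if it has not already been released. The in-memory maps and the `releaseStock` name are assumptions made for illustration; a real service would do the check and the update inside one local database transaction.

```javascript
// Idempotent compensation sketch (in-memory; all names are hypothetical).
const reservations = new Map([
  ['ORD-001', { sku: 'SKU-1234', qty: 2, status: 'RESERVED' }],
]);
const stock = new Map([['SKU-1234', 8]]);

function releaseStock(orderId) {
  const reservation = reservations.get(orderId);
  // State check: nothing to undo, or already undone by an earlier delivery.
  if (!reservation || reservation.status === 'RELEASED') return 'noop';
  stock.set(reservation.sku, stock.get(reservation.sku) + reservation.qty);
  reservation.status = 'RELEASED';
  return 'released';
}

const first = releaseStock('ORD-001');   // applies the compensation
const second = releaseStock('ORD-001');  // duplicate delivery: no further effect
console.log(first, second, stock.get('SKU-1234'));
```

The duplicate call leaves the stock count unchanged, which is what makes retrying the compensation safe.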

Types of Saga Transactions

Not all steps in a saga are created equal. A common classification, used for example in Chris Richardson's Microservices Patterns, divides saga steps into three categories:

Type | Definition | Example
Compensatable | Can be undone by a compensating transaction. These are the normal saga steps. | Reserve inventory (can release), charge payment (can refund)
Pivot | The point of no return. If the pivot succeeds, the saga will run to completion. If it fails, compensation begins. | Charge credit card (if this succeeds, we commit to the order)
Retriable | Steps after the pivot that are guaranteed to eventually succeed (with retries). They never need compensation because they always complete. | Send confirmation email (can always be retried), update analytics

The ordering is always: [Compensatable*] → Pivot → [Retriable*]. You design the saga so that once the pivot transaction succeeds, all remaining steps are retriable — they don't need compensations because they will eventually succeed.
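
The ordering rule lends itself to a quick design-time check. The sketch below assumes each step carries a hypothetical `type` field with one of the three category names, and it requires exactly one pivot; neither the field nor the function name comes from any framework.

```javascript
// Checks that step types follow [compensatable*] -> pivot -> [retriable*].
// The 'type' field and the function name are hypothetical.
function isValidSagaOrder(steps) {
  const rank = { compensatable: 0, pivot: 1, retriable: 2 };
  let last = 0;
  let pivots = 0;
  for (const step of steps) {
    const current = rank[step.type];
    // Unknown type, or a step that appears after a later category.
    if (current === undefined || current < last) return false;
    if (step.type === 'pivot') pivots += 1;
    last = current;
  }
  return pivots === 1;
}

const valid = isValidSagaOrder([
  { name: 'reserveStock',  type: 'compensatable' },
  { name: 'chargePayment', type: 'pivot' },
  { name: 'sendEmail',     type: 'retriable' },
]);
const invalid = isValidSagaOrder([
  { name: 'chargePayment', type: 'pivot' },
  { name: 'reserveStock',  type: 'compensatable' },
]);
console.log(valid, invalid);
```

A check like this could run in a unit test over the saga definition, so a compensatable step placed after the pivot is caught before deployment.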

Choreography-Based Saga

In a choreography-based saga, there is no central coordinator. Each service listens for events, performs its local transaction, and publishes a new event. The saga emerges from the interaction of independently reacting services — like dancers in a ballet who each know their part without a choreographer directing them in real time.

How It Works

  1. Order Service creates an order (status: PENDING) and publishes OrderCreated.
  2. Payment Service listens for OrderCreated, charges the card, publishes PaymentCompleted.
  3. Inventory Service listens for PaymentCompleted, reserves stock, publishes StockReserved.
  4. Shipping Service listens for StockReserved, schedules delivery, publishes ShipmentScheduled.
  5. Order Service listens for ShipmentScheduled, updates order to CONFIRMED.

Failure & Compensation

If the Shipping Service fails (e.g., address is unreachable):

  1. Shipping Service publishes ShippingFailed.
  2. Inventory Service listens for ShippingFailed, releases reserved stock, publishes StockReleased.
  3. Payment Service listens for StockReleased, refunds the charge, publishes PaymentRefunded.
  4. Order Service listens for PaymentRefunded, updates order to CANCELLED.

Code Example: Choreography with Events

// Payment Service: listens for OrderCreated
async function onOrderCreated(event) {
  const { orderId, customerId, amount } = event.payload;

  try {
    // Idempotency: skip if a payment was already recorded for this order.
    const existing = await db.payments.findOne({ orderId });
    if (existing) return;

    // stripe-node takes the idempotency key as a request option (second argument),
    // so a retried call with the same key does not charge the card twice.
    // Stripe expects `amount` in the smallest currency unit (cents for USD).
    const charge = await stripe.charges.create(
      { amount, currency: 'usd', customer: customerId },
      { idempotencyKey: `order-${orderId}` }
    );

    await db.payments.insert({
      orderId, chargeId: charge.id, amount, status: 'COMPLETED'
    });

    // Not atomic with the insert above; see "Saga + Transactional Outbox".
    await eventBus.publish('PaymentCompleted', {
      orderId, chargeId: charge.id, amount
    });
  } catch (err) {
    // Caveat: this also fires if the insert or publish fails after a successful
    // charge; production code should distinguish those cases.
    await eventBus.publish('PaymentFailed', {
      orderId, reason: err.message
    });
  }
}

// Compensating handler: listens for StockReleased (during compensation)
async function onStockReleased(event) {
  const { orderId } = event.payload;
  const payment = await db.payments.findOne({ orderId });
  // Nothing to undo, or already undone by an earlier delivery.
  if (!payment || payment.status === 'REFUNDED') return;

  await stripe.refunds.create(
    { charge: payment.chargeId },
    { idempotencyKey: `refund-${orderId}` }  // safe to retry after a crash
  );
  await db.payments.update(
    { orderId },
    { $set: { status: 'REFUNDED' } }
  );
  await eventBus.publish('PaymentRefunded', { orderId });
}
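
To see the whole event cascade in one place, here is a minimal in-memory sketch that wires the services together with Node's built-in `EventEmitter` standing in for the message broker. The `addressReachable` flag, the `trace` array, and the `orders` map are assumptions made purely for the demo; real services would run in separate processes and persist their state.

```javascript
const { EventEmitter } = require('node:events');

// In-memory choreography sketch: each "service" only subscribes and publishes.
const bus = new EventEmitter();
const trace = [];
const publish = (type, payload) => { trace.push(type); bus.emit(type, payload); };

// Happy-path listeners
bus.on('OrderCreated',     (e) => publish('PaymentCompleted', e));   // Payment Service
bus.on('PaymentCompleted', (e) => publish('StockReserved', e));      // Inventory Service
bus.on('StockReserved',    (e) =>                                    // Shipping Service
  publish(e.addressReachable ? 'ShipmentScheduled' : 'ShippingFailed', e));

// Compensation listeners, in reverse order of the happy path
bus.on('ShippingFailed', (e) => publish('StockReleased', e));        // Inventory Service
bus.on('StockReleased',  (e) => publish('PaymentRefunded', e));      // Payment Service

// Order Service closes the saga either way
const orders = new Map();
bus.on('ShipmentScheduled', (e) => orders.set(e.orderId, 'CONFIRMED'));
bus.on('PaymentRefunded',   (e) => orders.set(e.orderId, 'CANCELLED'));

orders.set('ORD-001', 'PENDING');
publish('OrderCreated', { orderId: 'ORD-001', addressReachable: false });
console.log(trace, orders.get('ORD-001'));
```

No listener knows about any other service, only about event names, which is both the appeal of choreography and the reason the full flow is hard to see without a trace like this one.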

Choreography: Pros & Cons

Pros | Cons
Simple: no central coordinator to build/maintain | Hard to understand the full saga flow (spread across services)
Loosely coupled: services only know about events | Cyclic dependencies possible if services listen to each other
Easy to add new services that react to events | No single place to see saga status, so debugging is difficult
No single point of failure (no coordinator) | Risk of "event spaghetti" as sagas grow in complexity
Good for simple, linear sagas (3-4 steps) | Difficult to implement complex business rules and branching

Orchestration-Based Saga

In an orchestration-based saga, a central Saga Orchestrator (sometimes called the Saga Execution Coordinator or SEC) directs the saga. It tells each participant what to do, waits for the response, and decides the next step. Think of it as a conductor directing an orchestra — each musician (service) plays their part when told.

How It Works

  1. The Orchestrator sends a CreateOrder command to the Order Service.
  2. Order Service creates the order, replies with success.
  3. Orchestrator sends ChargePayment command to the Payment Service.
  4. Payment Service charges the card, replies with success.
  5. Orchestrator sends ReserveStock command to the Inventory Service.
  6. Inventory Service reserves stock, replies with success.
  7. Orchestrator sends ScheduleShipment command to the Shipping Service.
  8. Shipping Service schedules delivery, replies with success.
  9. Orchestrator marks the saga as COMPLETED.

Failure & Compensation

If the Shipping Service fails:

  1. Shipping Service replies with failure.
  2. Orchestrator sends ReleaseStock to Inventory Service.
  3. Orchestrator sends RefundPayment to Payment Service.
  4. Orchestrator sends CancelOrder to Order Service.
  5. Orchestrator marks the saga as COMPENSATED.

Saga Execution Coordinator (SEC)

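The SEC is the heart of an orchestration-based saga. It is a stateful component that persists each saga's progress, sends commands to participants, waits for their replies, and triggers compensations in reverse order when a step fails: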

// Saga Orchestrator — state machine definition
const orderSagaDefinition = {
  name: 'OrderSaga',
  steps: [
    {
      name: 'createOrder',
      action:      { service: 'order',     command: 'CreateOrder' },
      compensation: { service: 'order',     command: 'CancelOrder' },
    },
    {
      name: 'chargePayment',
      action:      { service: 'payment',   command: 'ChargePayment' },
      compensation: { service: 'payment',   command: 'RefundPayment' },
    },
    {
      name: 'reserveStock',
      action:      { service: 'inventory', command: 'ReserveStock' },
      compensation: { service: 'inventory', command: 'ReleaseStock' },
    },
    {
      name: 'scheduleShipment',
      action:      { service: 'shipping',  command: 'ScheduleShipment' },
      // No compensation — last step; if it fails, we compensate prior steps
    },
  ],
};

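const { randomUUID } = require('node:crypto');

class SagaOrchestrator {
  // store, sendCommand, and sendCommandWithRetry are assumed interfaces supplied
  // by the application: a durable saga store and two transport functions.
  constructor({ store, sendCommand, sendCommandWithRetry }) {
    this.store = store;
    this.sendCommand = sendCommand;
    this.sendCommandWithRetry = sendCommandWithRetry;
  }

  async execute(sagaDef, payload) {
    const sagaId = randomUUID();
    const saga = await this.store.create({
      id: sagaId, definition: sagaDef.name,
      currentStep: 0, status: 'RUNNING', payload,
      completedSteps: [],
    });

    for (let i = 0; i < sagaDef.steps.length; i++) {
      const step = sagaDef.steps[i];
      let result;
      try {
        result = await this.sendCommand(
          step.action.service, step.action.command, { sagaId, ...payload }
        );
      } catch (err) {
        // Step i failed: record that, then compensate steps i-1 down to 0.
        await this.store.update(sagaId, { status: 'COMPENSATING', error: err.message });
        await this.compensate(sagaDef, saga, i - 1);
        return;
      }
      saga.completedSteps.push({ step: i, result });
      // Persist progress after every step so a restarted orchestrator can resume.
      await this.store.update(sagaId, { currentStep: i + 1, completedSteps: saga.completedSteps });
    }
    await this.store.update(sagaId, { status: 'COMPLETED' });
  }

  // Runs compensations from fromStep down to step 0, skipping steps without one.
  async compensate(sagaDef, saga, fromStep) {
    for (let i = fromStep; i >= 0; i--) {
      const step = sagaDef.steps[i];
      if (step.compensation) {
        await this.sendCommandWithRetry(
          step.compensation.service, step.compensation.command,
          { sagaId: saga.id, ...saga.payload }
        );
      }
    }
    await this.store.update(saga.id, { status: 'COMPENSATED' });
  }
}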
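
Because the orchestrator persists `status` and `currentStep`, a restarted instance can decide what to do with each unfinished saga from the stored record alone. The function below is a hedged sketch of that decision: `nextAction` and its return shapes are hypothetical, and the `COMPENSATING` status is assumed to be written to the store when a step fails. It also assumes participants are idempotent, since the command that was in flight at crash time may be sent again.

```javascript
// Decide what a restarted orchestrator should do with a persisted saga record.
// currentStep is the index of the next step to run (steps before it completed).
function nextAction(stepNames, saga) {
  switch (saga.status) {
    case 'RUNNING':
      return saga.currentStep < stepNames.length
        ? { type: 'execute', step: stepNames[saga.currentStep] }
        : { type: 'markCompleted' };
    case 'COMPENSATING':
      // Restart compensation from the last completed step; safe if idempotent.
      return { type: 'compensateFrom', step: stepNames[saga.currentStep - 1] ?? null };
    default:
      return { type: 'none' }; // COMPLETED or COMPENSATED: nothing left to do
  }
}

const names = ['createOrder', 'chargePayment', 'reserveStock', 'scheduleShipment'];
const resume = nextAction(names, { status: 'RUNNING', currentStep: 2 });
const undo = nextAction(names, { status: 'COMPENSATING', currentStep: 3 });
console.log(resume, undo);
```

Keeping this decision a pure function of the stored record makes it easy to unit test without a broker or database.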

Orchestration: Pros & Cons

Pros | Cons
Easy to understand: saga flow is explicit in the orchestrator | Central coordinator is a potential single point of failure
No cyclic dependencies between services | More coupling: orchestrator must know about all participants
Easy to implement complex business rules and branching | Risk of centralizing too much logic in the orchestrator
Single place to observe saga status and debug | Orchestrator code can become a "god class" if not carefully designed
Good for complex sagas with many steps and branches | Additional infrastructure (saga store, message broker) required

Choreography vs Orchestration

Aspect | Choreography | Orchestration
Coordination | Decentralized, event-driven | Centralized, command-driven
Coupling | Loose (services know events, not each other) | Tighter (orchestrator knows all participants)
Visibility | Hard to trace: saga spread across services | Easy: orchestrator has full saga state
Complexity | Simple for linear sagas; messy for branching | Handles complex branching well
Failure point | No SPOF (no coordinator) | Orchestrator is a SPOF (mitigated by HA deployment)
Testing | Integration tests across services | Unit test the orchestrator's state machine
Adding steps | Add a new listener, minimal changes | Modify orchestrator definition
Best for | 2-4 step linear sagas | 5+ step sagas with branching/conditions

Interview tip: When asked "choreography or orchestration?", don't pick one blindly. Say: "For simple, linear workflows (e.g., 3 services), choreography keeps things decoupled. For complex workflows with branching, conditional steps, or many participants, orchestration provides clarity and easier debugging. In practice, many systems use both: orchestration within a bounded context, choreography across bounded contexts."

E-Commerce Saga: Complete Example

Let's walk through a complete e-commerce order saga with all the details — the happy path, the failure path, the state transitions, and the edge cases.

Services Involved

Service | Local Transaction (Ti) | Compensating Transaction (Ci) | Events Published
Order | Create order (PENDING) | Set order to CANCELLED | OrderCreated / OrderCancelled
Payment | Charge credit card | Refund credit card | PaymentCompleted / PaymentRefunded
Inventory | Reserve stock (decrement available) | Release stock (increment available) | StockReserved / StockReleased
Shipping | Schedule shipment | Cancel shipment (if not yet dispatched) | ShipmentScheduled / ShippingFailed

Happy Path: State Transitions

/* Saga State Machine — Happy Path */

Order:     PENDING ──────────────────────────────────────────── → CONFIRMED
Payment:            PENDING → CHARGED ───────────────────────────────────
Inventory:                              AVAILABLE → RESERVED ────────────
Shipping:                                                    → SCHEDULED

Timeline:  ─────T1──────────T2──────────T3──────────T4───────→ DONE
                 │           │           │           │
           OrderCreated  PaymentDone  StockReserved  ShipScheduled

Failure Path: Shipping Fails

/* Saga State Machine — Failure at T4 (Shipping) */

Order:     PENDING ──────────────────────────────────── → CANCELLED
Payment:            PENDING → CHARGED ──────── → REFUNDED
Inventory:                              RESERVED → RELEASED
Shipping:                                        ✗ FAILED

Timeline:  ─────T1──────T2──────T3──────T4 FAIL──C3──────C2──────C1───→ COMPENSATED
                 │       │       │       │        │       │       │
           Created   Charged  Reserved  Fail  Released  Refunded Cancelled

Edge Cases to Handle

Isolation Challenges

One of the biggest trade-offs of the Saga pattern is the lack of isolation. In a traditional ACID transaction, the "I" (Isolation) guarantees that concurrent transactions don't interfere with each other. A saga has no such guarantee because intermediate states are visible to other transactions.

Anomalies Without Isolation

Anomaly | Description | E-Commerce Example
Lost updates | One saga overwrites the update of another without seeing it. | Two sagas both read inventory = 10 and both reserve 8 items. Each writes 10 − 8 = 2, so the final inventory is 2 even though 16 units were promised. Applying both reservations correctly would give −6, so the second one should have been rejected (oversold).
Dirty reads | A saga reads data that is later compensated (rolled back). | Saga B reads order as CONFIRMED (during Saga A's execution). Saga A later fails and compensates to CANCELLED. Saga B acted on state that was never final.
Fuzzy / non-repeatable reads | A saga reads the same data twice and gets different values because another saga modified it. | Inventory Service reads stock = 10, then moments later reads stock = 3 because another saga reserved 7 units in between.
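
The lost-update row is easy to reproduce with plain arithmetic. The sketch below first replays the read-then-write interleaving from the example, then shows a guarded decrement in which the check and the update happen in one step. The variable and function names are hypothetical, and the single-threaded code only simulates the interleaving.

```javascript
// Lost update: both sagas read 10, then each writes its own absolute result.
let stockLevel = 10;
const readA = stockLevel;
const readB = stockLevel;
stockLevel = readA - 8;  // saga A writes 2
stockLevel = readB - 8;  // saga B overwrites with 2; A's reservation is lost
const afterLostUpdate = stockLevel;

// Guarded decrement: the availability check and the update are one operation.
function reserve(inventory, qty) {
  if (inventory.available < qty) return false;  // reject instead of overselling
  inventory.available -= qty;
  return true;
}

const inventory = { available: 10 };
const outcomes = [reserve(inventory, 8), reserve(inventory, 8)];
console.log(afterLostUpdate, outcomes, inventory.available);
```

In a relational database the guarded version usually takes the form of a conditional update, for example `UPDATE inventory SET available = available - 8 WHERE sku = ? AND available >= 8` (table and column names assumed), with the affected-row count telling the saga whether the reservation succeeded.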

Countermeasures for Isolation

Since sagas cannot provide ACID isolation, you apply countermeasures — design techniques that mitigate the anomalies:

Countermeasure | How It Works | Example
Semantic locking | Use application-level flags to indicate a resource is being processed by a saga. Other sagas see the flag and wait or skip. | Order status = APPROVAL_PENDING instead of PENDING. Other sagas that need to modify this order see the lock flag and defer.
Commutative updates | Design operations so the order of execution doesn't matter. Use increments/decrements instead of absolute values. | Instead of SET stock = 8, use stock = stock - 2. Two concurrent reservations of 2 each produce the same result regardless of order.
Pessimistic view | Reorder saga steps so that risky operations (those prone to failure) happen early, before committing external side effects. | Validate payment before reserving inventory. If payment is likely to fail, you avoid the need to compensate inventory.
Reread value | Before committing, reread the value to check if it has been modified by another saga since the original read. | Before reserving stock, reread the current stock level. If it changed, re-evaluate whether the reservation is still possible.
Version file | Record the operations on a record so that they can be reordered if they arrive out of order. | Attach a version number to each inventory update. If an update with version 3 arrives before version 2, buffer it and apply in order.
By value (risk-based) | Use saga for low-risk transactions; use 2PC or manual review for high-value ones. | Orders under $100 use saga. Orders over $10,000 use 2PC or require manager approval.

Real-world wisdom: In practice, many e-commerce systems accept the reduced isolation of sagas. The window of vulnerability (between a local commit and the next step) is usually short, so two sagas rarely conflict on the same resource at the same moment. When they do, the countermeasures above and the compensating transactions limit the damage. For most orders, a rare erroneous charge followed by an automatic refund costs the business less than running a blocking 2PC protocol.
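
One way to implement the "reread value" countermeasure is an optimistic version check: a write succeeds only if the record still has the version the saga originally read. The sketch below is in-memory and single-threaded, and `read`, `writeIfUnchanged`, and the `version` field are hypothetical names; a database would express the same check as a conditional update on a version column.

```javascript
// "Reread value" as an optimistic version check (in-memory sketch).
const record = { sku: 'SKU-1234', available: 10, version: 1 };

function read() {
  return { ...record };  // snapshot of the current state
}

// Succeeds only if nobody changed the record since `snapshot` was read.
function writeIfUnchanged(snapshot, newAvailable) {
  if (record.version !== snapshot.version) return false;
  record.available = newAvailable;
  record.version += 1;
  return true;
}

const sagaA = read();
const sagaB = read();
const aOk = writeIfUnchanged(sagaA, sagaA.available - 7);  // first writer wins
const bOk = writeIfUnchanged(sagaB, sagaB.available - 7);  // stale snapshot: rejected

// Saga B rereads and re-evaluates: only 3 units remain, so it cannot reserve 7.
const retry = read();
const bRetryOk = retry.available >= 7 && writeIfUnchanged(retry, retry.available - 7);
console.log(aOk, bOk, bRetryOk, record.available);
```

The rejected write is the signal for saga B to reread and decide again, rather than silently overwriting saga A's reservation.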

Implementation Patterns

Saga + Event Sourcing

Event sourcing pairs naturally with sagas. Each service stores its state as a sequence of events. The saga events become first-class citizens in the event store:

// Event store for Order Service (illustrative; '...' marks elided values)
[
  { type: 'OrderCreated',     orderId: 'ORD-001', timestamp: '...', data: { items: [...], total: 99.99 } },
  { type: 'PaymentCompleted', orderId: 'ORD-001', timestamp: '...', data: { chargeId: 'ch_xxx' } },
  { type: 'StockReserved',    orderId: 'ORD-001', timestamp: '...', data: { warehouse: 'WH-East' } },
  { type: 'ShippingFailed',   orderId: 'ORD-001', timestamp: '...', data: { reason: 'address unreachable' } },
  { type: 'StockReleased',    orderId: 'ORD-001', timestamp: '...', data: {} },
  { type: 'PaymentRefunded',  orderId: 'ORD-001', timestamp: '...', data: { refundId: 're_xxx' } },
  { type: 'OrderCancelled',   orderId: 'ORD-001', timestamp: '...', data: { reason: 'shipping failed' } },
]

// Rebuild current state by replaying events in order
function rebuildOrderState(events) {
  let state = { status: 'UNKNOWN' };
  for (const event of events) {
    switch (event.type) {
      case 'OrderCreated':     state = { ...state, ...event.data, status: 'PENDING' }; break;
      case 'PaymentCompleted': state = { ...state, status: 'PAYMENT_DONE' }; break;
      case 'StockReserved':    state = { ...state, status: 'STOCK_RESERVED' }; break;
      case 'OrderCancelled':   state = { ...state, status: 'CANCELLED' }; break;
      default: break;  // other events do not change the order status in this sketch
    }
  }
  return state;
}

Saga + Transactional Outbox

A critical implementation detail: how do you atomically update the local database and publish an event? If the service crashes after the DB commit but before publishing the event, the saga stalls. The Transactional Outbox pattern solves this:

  1. Within the same local transaction, write the business data and an event record to an outbox table.
  2. A separate relay process (or CDC — Change Data Capture) reads unpublished events from the outbox table and publishes them to the message broker.
  3. Once published, the relay marks the outbox record as sent.

-- Single local transaction in the Payment Service
BEGIN;
  INSERT INTO payments (order_id, charge_id, amount, status)
    VALUES ('ORD-001', 'ch_xxx', 99.99, 'COMPLETED');

  INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload)
    VALUES (
      uuid(),  -- UUID generator; e.g. UUID() in MySQL, gen_random_uuid() in PostgreSQL
      'Payment',
      'ORD-001',
      'PaymentCompleted',
      '{"orderId":"ORD-001","chargeId":"ch_xxx","amount":99.99}'
    );
COMMIT;

-- Relay process (Debezium CDC or polling) picks up the outbox row
-- and publishes it to Kafka / RabbitMQ / SNS
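
The polling variant of the relay can be sketched as a loop over unsent rows. This is an in-memory illustration under stated assumptions: the `outbox` array stands in for the table, `sentAt` is a hypothetical column, and `publish` is whatever broker client the service uses. It stops at the first failure so events are not published out of order.

```javascript
// Polling relay sketch over an in-memory outbox (all names are hypothetical).
const outbox = [
  { id: 1, eventType: 'PaymentCompleted', payload: { orderId: 'ORD-001' }, sentAt: null },
  { id: 2, eventType: 'PaymentRefunded',  payload: { orderId: 'ORD-002' }, sentAt: null },
];

// One polling pass. Returns the number of rows published in this pass.
function relayOnce(rows, publish, now = () => Date.now()) {
  let sent = 0;
  for (const row of rows) {
    if (row.sentAt !== null) continue;  // already published
    try {
      publish(row.eventType, row.payload);
    } catch (err) {
      break;  // leave the row unsent and stop, so ordering is preserved
    }
    // A crash between publish and this line means the row is published again
    // on the next pass: delivery is at-least-once, so consumers must be idempotent.
    row.sentAt = now();
    sent += 1;
  }
  return sent;
}

const published = [];
const sentCount = relayOnce(outbox, (type) => published.push(type));
console.log(sentCount, published);
```

A CDC tool replaces this loop by tailing the database log instead of polling, but the delivery guarantee the consumer sees is the same.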

Ensuring Idempotency

Every saga participant must handle duplicate messages gracefully. Common approaches:

// Pattern 1: Idempotency key in the database
async function handleReserveStock(command) {
  const existing = await db.reservations.findOne({
    sagaId: command.sagaId,
    step: 'reserve-stock',
  });
  if (existing) {
    // Already processed — return same result (idempotent)
    return existing.result;
  }

  const result = await doReserveStock(command.items);

  await db.reservations.insert({
    sagaId: command.sagaId,
    step: 'reserve-stock',
    result,
    processedAt: new Date(),
  });

  return result;
}

// Pattern 2: Database unique constraint
// CREATE UNIQUE INDEX idx_reservation_saga ON reservations(saga_id, step);
// INSERT will fail on duplicate — catch the error and return success

Saga vs 2PC vs Outbox Pattern

Aspect | Saga | 2PC (Two-Phase Commit) | Transactional Outbox
Purpose | Coordinate distributed transactions across services | Ensure atomicity across distributed databases | Reliably publish events after local DB commit
Scope | Cross-service business workflows | Cross-database atomic commits | Single-service: DB write + event publish
Isolation | None: intermediate states visible | Full ACID: locked until commit | Local only: single DB transaction
Consistency | Eventual (via compensations) | Strong (immediate) | At-least-once delivery (idempotent consumers)
Blocking? | No | Yes: participants hold locks | No
Scalability | High: async, non-blocking | Low: synchronous, lock-heavy | High: local transaction only
Failure handling | Compensating transactions | Coordinator recovery log | Retry relay until published
Complexity | High (compensation logic, idempotency) | Medium (protocol is well-defined) | Low-Medium (outbox table + relay)
Use case | Microservice workflows (e.g., order processing) | Homogeneous databases, short transactions | Reliable event publishing within a single service
Often combined? | Yes: saga + outbox pattern is the standard approach | Rarely combined with saga (different philosophies) | Yes: outbox is a building block inside saga participants

Key relationship: The Outbox pattern is not an alternative to sagas; it is a building block used within each saga participant. Each service uses the outbox pattern to atomically update its local DB and publish events. The saga pattern then coordinates the sequence of these local transactions. Think of outbox as "how a single service reliably publishes events" and saga as "how multiple services coordinate a workflow."

Real-World Saga Implementations

Frameworks & Libraries

Framework | Language | Type | Notable Features
Temporal | Go, Java, Python, TS | Orchestration | Durable workflow engine; built-in retries and timeouts; saga is modeled as a workflow with compensations
Axon Framework | Java/Kotlin | Both | Event sourcing + saga; SagaManager coordinates; integrates with Axon Server
MassTransit | C# (.NET) | Both | State machine-based sagas; supports RabbitMQ, Azure Service Bus, Amazon SQS
Eventuate Tram | Java | Both | By Chris Richardson (author of Microservices Patterns); transactional outbox built-in
NServiceBus | C# (.NET) | Orchestration | Enterprise-grade; saga persistence; built-in message retry and dead-letter
AWS Step Functions | Any (via Lambda) | Orchestration | Serverless saga orchestrator; visual workflow designer; built-in error handling

Example: Saga with Temporal

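// Temporal workflow (Go): saga with explicit compensation on each failure.
// NewSaga and its AddCompensation / Compensate methods are assumed to be a small
// application-defined helper (not shown). Activity options such as
// StartToCloseTimeout (set via workflow.WithActivityOptions) are omitted for brevity.
func OrderSagaWorkflow(ctx workflow.Context, order Order) error {
    saga := NewSaga()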

    // Step 1: Create Order
    err := workflow.ExecuteActivity(ctx, CreateOrder, order).Get(ctx, nil)
    if err != nil { return err }
    saga.AddCompensation(ctx, CancelOrder, order.ID)

    // Step 2: Charge Payment
    var chargeResult ChargeResult
    err = workflow.ExecuteActivity(ctx, ChargePayment, order).Get(ctx, &chargeResult)
    if err != nil {
        saga.Compensate(ctx)  // Runs CancelOrder
        return err
    }
    saga.AddCompensation(ctx, RefundPayment, chargeResult.ChargeID)

    // Step 3: Reserve Inventory
    err = workflow.ExecuteActivity(ctx, ReserveStock, order.Items).Get(ctx, nil)
    if err != nil {
        saga.Compensate(ctx)  // Runs RefundPayment, then CancelOrder
        return err
    }
    saga.AddCompensation(ctx, ReleaseStock, order.Items)

    // Step 4: Schedule Shipping
    err = workflow.ExecuteActivity(ctx, ScheduleShipping, order).Get(ctx, nil)
    if err != nil {
        saga.Compensate(ctx)  // Runs ReleaseStock, RefundPayment, CancelOrder
        return err
    }

    return nil  // Saga completed successfully
}

Best Practices

  1. Make all participants idempotent. With at-least-once message delivery, every handler will receive duplicates. Use idempotency keys, unique constraints, or state checks.
  2. Use the transactional outbox pattern. Never rely on "commit DB then publish event" — the process can crash between the two. Use an outbox table with CDC or polling relay.
  3. Design compensations carefully. A compensation is not always a simple "undo." Think about what side effects have occurred (emails sent, webhooks fired, external APIs called) and handle each one.
  4. Keep sagas short. The longer a saga runs, the larger the window for isolation anomalies and the more complex the compensation logic. Aim for 3–5 steps.
  5. Order steps by failure probability. Put the most likely-to-fail step early in the saga. If payment validation often fails, do it before inventory reservation — fewer compensations needed.
  6. Implement timeouts. If a participant doesn't respond within a deadline, treat it as a failure and begin compensation. Don't let sagas hang indefinitely.
  7. Use semantic locking. Flag resources being processed by a saga (e.g., order status = PROCESSING) so other sagas don't interfere.
  8. Monitor and alert. Track saga durations, failure rates, compensation rates, and stuck sagas. A saga that has been in COMPENSATING state for 10 minutes needs attention.
  9. Dead letter queue for failed compensations. If a compensating transaction fails after all retries, don't silently swallow the failure. Send it to a dead letter queue for manual intervention.
  10. Use saga IDs for correlation. Every message and log entry should include the saga ID so you can trace the entire flow across services.
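
Practices 6 and 9 combine naturally into one small helper: bound every call with a timeout, retry a failing compensation a fixed number of times, and park it in a dead letter queue when the retries run out. The sketch below is a hedged illustration; `withTimeout`, `compensateWithRetry`, and the option names are hypothetical, the dead letter queue is a plain array, and a real implementation would add backoff between attempts.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
// Note: the underlying call is not cancelled, only abandoned.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Retry a compensation; after maxAttempts, record it for manual intervention.
async function compensateWithRetry(compensate, { sagaId, maxAttempts, timeoutMs, deadLetters }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await withTimeout(compensate(), timeoutMs);
    } catch (err) {
      if (attempt === maxAttempts) {
        deadLetters.push({ sagaId, error: err.message, attempts: attempt });
        return null;
      }
    }
  }
  return null;
}

// Usage: a refund that fails twice, then succeeds on the third attempt.
const deadLetters = [];
let refundCalls = 0;
const flakyRefund = async () => {
  refundCalls += 1;
  if (refundCalls < 3) throw new Error('gateway unavailable');
  return 'refunded';
};
const refundOutcome = compensateWithRetry(flakyRefund, {
  sagaId: 'saga-1', maxAttempts: 3, timeoutMs: 50, deadLetters,
});
refundOutcome.then((result) => console.log(result, deadLetters.length));
```

Because a timed-out call may still complete on the participant's side, this only works safely when the compensation itself is idempotent.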

Saga Pattern in System Design Interviews

When to Bring Up Sagas

Sagas are relevant whenever you're designing a microservices system with operations that span multiple services.

Interview tip: When the interviewer asks "how do you handle distributed transactions?", start with: "In microservices, we avoid 2PC because it's blocking and doesn't scale. Instead, we use the Saga pattern — a sequence of local transactions with compensating transactions for rollback. I'd choose between choreography and orchestration based on the complexity of the workflow."

Framework for Answering

  1. Identify the distributed transaction: "Placing an order involves Order, Payment, Inventory, and Shipping services."
  2. Explain why 2PC won't work: "2PC is blocking and doesn't scale for microservices with heterogeneous databases."
  3. Choose saga type: "I'd use orchestration because we have 4+ steps with potential branching."
  4. Define steps and compensations: List each Ti and Ci.
  5. Address isolation: "We use semantic locking and commutative updates to mitigate isolation anomalies."
  6. Address reliability: "Each service uses the transactional outbox pattern for atomic DB + event publishing. All handlers are idempotent."
  7. Address failure: "If a step fails, the orchestrator runs compensations in reverse. Failed compensations go to a dead letter queue."

Common Interview Questions

Summary

Concept | Key Takeaway
Saga | Sequence of local transactions + compensating transactions for distributed workflows
Choreography | Event-driven, decentralized; best for simple, linear sagas
Orchestration | Central coordinator directs steps; best for complex, branching sagas
Compensation | Semantic undo (not rollback); must be idempotent
Isolation | No ACID isolation; use semantic locking, commutative updates, pessimistic view
Outbox | Building block for reliable event publishing within each saga participant
SEC | Saga Execution Coordinator: stateful state machine managing the saga lifecycle

The Saga pattern is the de facto standard for managing distributed transactions in microservices. It trades the strong guarantees of ACID for the availability, scalability, and loose coupling that microservices demand. The key to a successful saga implementation is careful compensation design, idempotent participants, and reliable event delivery via the transactional outbox pattern. In the next post, we will explore the Circuit Breaker pattern — another essential tool for building resilient distributed systems.