High Level Design Series · Real-World Designs · Post 59

Design: Ticketmaster (Booking System)

🎟️ The Core Challenge: When 100,000 fans rush to buy tickets for a Taylor Swift concert the moment they go on sale, how do you ensure every seat is sold exactly once, nobody gets a double-booked ticket, and the system doesn't collapse under load? This post designs a Ticketmaster-scale booking platform from scratch.

Requirements & Scale

Functional Requirements

| Feature | Description |
| --- | --- |
| Browse Events | Users browse concerts, sports, and theater by city, date, genre |
| Search | Full-text search by artist, venue, event name with filters |
| View Seat Map | Interactive venue map showing sections, rows, and individual seat availability in real time |
| Reserve Seat | Temporarily hold a seat while the user completes checkout (10-minute TTL) |
| Pay & Confirm | Secure payment processing, confirmation email, digital ticket generation |
| Cancellation | Cancel booking, trigger refund, release seat back to inventory |

Non-Functional Requirements

| Requirement | Target |
| --- | --- |
| Concurrent users per popular event | 100,000+ at on-sale moment |
| Seat reservation latency | < 200ms p99 |
| Double-booking rate | 0% (absolute correctness) |
| System availability | 99.99% during on-sale windows |
| Throughput | 10,000+ seat reservations/sec peak |
| Payment timeout | 10 minutes TTL, auto-release on expiry |

Back-of-Envelope Estimation

Consider a stadium with 80,000 seats going on sale simultaneously:
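The arithmetic is simple enough to sketch in a few lines. Every input below is an illustrative assumption (attempts per user, refreshes per user, bytes per row), not a measurement:

```javascript
// Back-of-envelope with illustrative assumptions, not measurements:
// ~100K fans hit an 80K-seat event inside a ~30-second window.
const seats = 80_000;
const users = 100_000;
const windowSec = 30;
const attemptsPerUser = 3;  // assume a few contested seat clicks each

// Reservation attempts per second across the window
const attemptsPerSec = (users * attemptsPerUser) / windowSec;  // 10,000/sec

// Seat-map reads dwarf writes: assume ~10 map refreshes per user
const readsPerSec = (users * 10) / windowSec;  // ~33,000/sec

// Raw inventory is tiny: ~200 bytes/row * 80K seats
const inventoryMB = (seats * 200) / 1e6;  // 16 MB, fits entirely in Redis
```

The 10,000 attempts/sec figure is exactly what drives the throughput target in the non-functional requirements, and the tiny inventory size is why an all-in-Redis cache of seat state is feasible.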

⚠️ The Thundering Herd: Unlike most e-commerce sites where load is distributed over time, ticket sales create an extreme flash crowd — nearly all demand arrives in a 30-second window. This fundamentally shapes every design decision.

Seat Inventory Model

Seat State Machine

Every seat in the system follows a strict state machine with four states:

AVAILABLE — open for selection
    │ user selects seat
    ▼
RESERVED — held with 10-min TTL
    │ payment succeeds            │ TTL expires
    ▼                             ▼
BOOKED — permanently sold     RELEASED — back to AVAILABLE
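The state machine above can be encoded as a small transition table; in this sketch RELEASED is modeled as the transition that returns a seat to AVAILABLE, and the `refund` action reflects the cancellation requirement (action names are illustrative):

```javascript
// Minimal sketch of the seat state machine as a transition table.
// Any (state, action) pair not listed here is illegal and is rejected.
const TRANSITIONS = {
    AVAILABLE: { select: 'RESERVED' },
    RESERVED:  { pay:    'BOOKED',
                 expire: 'AVAILABLE',    // TTL expiry releases the seat
                 cancel: 'AVAILABLE' },
    BOOKED:    { refund: 'AVAILABLE' },  // cancellation re-opens inventory
};

function transition(state, action) {
    const next = TRANSITIONS[state]?.[action];
    if (!next) throw new Error(`illegal transition: ${state} --${action}`);
    return next;
}
```

Centralizing the legal transitions in one table means a buggy caller can never, say, move a seat straight from AVAILABLE to BOOKED without passing through a reservation.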

Database Schema

-- Venue & event metadata (PostgreSQL)
CREATE TABLE venues (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name        VARCHAR(255) NOT NULL,
    city        VARCHAR(100),
    capacity    INT NOT NULL,
    seat_map    JSONB NOT NULL  -- section/row/seat layout
);

CREATE TABLE events (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    venue_id        UUID REFERENCES venues(id),
    name            VARCHAR(500) NOT NULL,
    artist          VARCHAR(255),
    category        VARCHAR(50),  -- concert, sports, theater
    event_date      TIMESTAMPTZ NOT NULL,
    on_sale_date    TIMESTAMPTZ NOT NULL,
    status          VARCHAR(20) DEFAULT 'upcoming',
    created_at      TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_events_date ON events(event_date);
CREATE INDEX idx_events_category ON events(category);
CREATE INDEX idx_events_onsale ON events(on_sale_date);

-- Seat inventory (PostgreSQL - source of truth)
CREATE TABLE seats (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id    UUID REFERENCES events(id),
    section     VARCHAR(20) NOT NULL,
    "row"       VARCHAR(10) NOT NULL,  -- quoted: ROW is reserved in PostgreSQL
    seat_number INT NOT NULL,
    price_tier  VARCHAR(20),
    price_cents INT NOT NULL,
    status      VARCHAR(20) DEFAULT 'available',
    version     INT DEFAULT 0,  -- optimistic lock version
    reserved_by UUID,           -- user_id holding reservation
    reserved_at TIMESTAMPTZ,    -- reservation timestamp
    booked_by   UUID,
    booked_at   TIMESTAMPTZ,
    UNIQUE(event_id, section, "row", seat_number)
);

CREATE INDEX idx_seats_event_status ON seats(event_id, status);
CREATE INDEX idx_seats_reserved_at ON seats(reserved_at)
    WHERE status = 'reserved';  -- partial index for TTL cleanup

-- Booking records (PostgreSQL)
CREATE TABLE bookings (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL,
    event_id        UUID REFERENCES events(id),
    seat_ids        UUID[] NOT NULL,
    total_cents     INT NOT NULL,
    payment_intent  VARCHAR(255),   -- Stripe/payment provider ID
    status          VARCHAR(20) DEFAULT 'pending',
    created_at      TIMESTAMPTZ DEFAULT now(),
    confirmed_at    TIMESTAMPTZ,
    cancelled_at    TIMESTAMPTZ
);

CREATE INDEX idx_bookings_user ON bookings(user_id);
CREATE INDEX idx_bookings_event ON bookings(event_id);

Why a version Column?

The version column is the heart of our concurrency control. Every time a seat's state changes, the version increments. When two users try to reserve the same seat simultaneously, only one can match the current version — the other's UPDATE affects zero rows and must retry.

💡 Design Choice — PostgreSQL vs Redis for Seats: We use PostgreSQL as the source of truth for seat ownership (it provides ACID guarantees), but front it with Redis for fast availability checks and temporary reservation locks. The dual-layer approach gives us both speed and correctness.
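That dual-layer read/write path can be sketched with Maps standing in for Redis and PostgreSQL (names and layout here are illustrative, not the real client APIs):

```javascript
// Dual-layer sketch: Map stand-ins for Redis (cache) and PostgreSQL
// (source of truth). Reads prefer the cache; writes hit Postgres first.
const cache = new Map();          // stands in for Redis
const sourceOfTruth = new Map();  // stands in for PostgreSQL

function getSeat(seatId) {
    const cached = cache.get(seatId);
    if (cached) return { ...cached, from: 'redis' };

    const row = sourceOfTruth.get(seatId);  // authoritative read
    if (!row) return null;
    cache.set(seatId, row);                 // backfill for the next reader
    return { ...row, from: 'postgres' };
}

function setSeatStatus(seatId, status) {
    const row = sourceOfTruth.get(seatId);
    sourceOfTruth.set(seatId, { ...row, status, version: row.version + 1 });
    cache.delete(seatId);  // invalidate; the next read repopulates
}
```

The invalidate-on-write pattern keeps the cache from ever serving a seat state newer than what PostgreSQL committed; a stale read is possible for a moment, but the optimistic lock in the next section catches it.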

Distributed Locking & Double-Booking Prevention

The absolute worst outcome for a ticketing system is a double booking: two people both believe they own the same seat. We employ multiple layers of defense:

Layer 1: Optimistic Locking (Primary Defense)

Optimistic concurrency control lets concurrent requests proceed without blocking, but only one can commit successfully:

-- Reserve a seat (atomic operation)
-- Returns 1 row affected on success, 0 on conflict
UPDATE seats
SET    status      = 'reserved',
       version     = version + 1,
       reserved_by = :user_id,
       reserved_at = now()
WHERE  id      = :seat_id
  AND  status  = 'available'
  AND  version = :expected_version;

-- Application checks: if rows_affected == 0 → conflict!
-- User gets: "Sorry, this seat was just taken. Please select another."

This is a compare-and-swap (CAS) operation. The three conditions in the WHERE clause ensure:

  1. id = :seat_id — correct seat
  2. status = 'available' — seat hasn't been taken
  3. version = :expected_version — no concurrent modification since we last read it

⚠️ Why not just check status? Checking status = 'available' alone isn't sufficient. Consider: User A reserves seat → payment fails → seat released back to "available" → User B reads seat (version=2) → User A retries with stale version=1. Without version check, A's stale write could overwrite B's valid state.
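The semantics of that conditional UPDATE are easy to reproduce in memory. This is a sketch of the compare-and-swap, not a database client:

```javascript
// In-memory model of the optimistic-locking UPDATE: it "affects a row"
// only when both status and version still match what the caller read.
const seat = { status: 'available', version: 2, reservedBy: null };

function reserve(userId, expectedVersion) {
    if (seat.status !== 'available' || seat.version !== expectedVersion) {
        return false;  // 0 rows affected -> conflict, caller must re-read
    }
    seat.status = 'reserved';
    seat.version += 1;
    seat.reservedBy = userId;
    return true;       // 1 row affected
}

// The seat was released back to 'available' at version 2. User A retries
// with the stale version 1 it read earlier; User B read the fresh state.
console.log(reserve('userA', 1));  // false: a status-only check would have let this through
console.log(reserve('userB', 2));  // true: B holds the current version
```

Note that A's attempt is rejected even though the seat's status really is 'available' at that moment; that is precisely the protection the version column adds.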

Layer 2: Redis Distributed Lock (Fast Path)

Before hitting PostgreSQL, we acquire a short-lived Redis lock per seat to serialize reservation attempts:

# Redis lock acquisition (using SET NX EX)
SET seat_lock:{event_id}:{seat_id} {user_id} NX EX 5

# NX = only set if key doesn't exist (atomic)
# EX 5 = auto-expire in 5 seconds
# Returns OK if lock acquired, nil if seat already locked

# After the PostgreSQL update succeeds or fails:
DEL seat_lock:{event_id}:{seat_id}  # release lock

This Redis lock serves as a fast-rejection filter. If 500 users click the same seat simultaneously, only 1 acquires the Redis lock; the other 499 get instant rejection without hitting the database at all.

async function reserveSeat(userId, seatId, eventId, expectedVersion) {
    const lockKey = `seat_lock:${eventId}:${seatId}`;

    // Layer 2: Redis fast lock
    const locked = await redis.set(lockKey, userId, 'NX', 'EX', 5);
    if (!locked) {
        return { success: false, error: 'SEAT_LOCKED_BY_ANOTHER_USER' };
    }

    try {
        // Layer 1: Optimistic lock in PostgreSQL
        const result = await db.query(`
            UPDATE seats
            SET status = 'reserved', version = version + 1,
                reserved_by = $1, reserved_at = now()
            WHERE id = $2 AND status = 'available' AND version = $3
        `, [userId, seatId, expectedVersion]);

        if (result.rowCount === 0) {
            return { success: false, error: 'SEAT_ALREADY_TAKEN' };
        }

        // Set reservation TTL in Redis
        await redis.set(
            `reservation:${eventId}:${seatId}`,
            JSON.stringify({ userId, bookingDeadline: Date.now() + 600000 }),
            'EX', 600  // 10 minutes
        );

        return { success: true, expiresIn: 600 };
    } finally {
        // Always release the lock. (A stricter variant checks that the
        // stored value is still our userId before DEL, so we never delete
        // a lock another user acquired after ours expired.)
        await redis.del(lockKey);
    }
}

Layer 3: Database Constraints (Safety Net)

Even if application logic has bugs, the database schema itself prevents double booking:

-- Exclusion constraint prevents overlapping reservations of the same seat
-- (requires the btree_gist extension for the = operator on id)
CREATE EXTENSION IF NOT EXISTS btree_gist;
ALTER TABLE seats ADD CONSTRAINT unique_booking
    EXCLUDE USING gist (
        id WITH =,
        tstzrange(reserved_at, reserved_at + interval '10 minutes') WITH &&
    ) WHERE (status = 'reserved');

-- Or simpler: application-level check within a transaction
BEGIN;
    SELECT status, version FROM seats WHERE id = :seat_id FOR UPDATE;
    -- Row-level lock acquired, no other transaction can modify this seat
    -- Verify status is still 'available', then update
    UPDATE seats SET status = 'reserved', ...;
COMMIT;

Comparison of Locking Strategies

| Strategy | Throughput | Latency | Correctness | Complexity |
| --- | --- | --- | --- | --- |
| Pessimistic (SELECT FOR UPDATE) | Low — serialized | High — lock waits | ✅ Perfect | Low |
| Optimistic (version CAS) | High — no blocking | Low — fail-fast | ✅ Perfect | Medium |
| Redis + Optimistic | Very High — DB shielded | Very Low | ✅ Perfect | High |
| Queue-based serial | Predictable | Medium — queued | ✅ Perfect | High |
💡 Our Approach: Redis distributed lock + PostgreSQL optimistic locking. Redis handles the stampede (fast rejection), PostgreSQL ensures absolute correctness (ACID guarantees). This combo handles 10K+ reservations/sec with zero double bookings.

Animation: Seat Reservation Race Condition

🏎️ Two users try to book Seat A-12 simultaneously. See what happens without locking (double booking!) vs. with optimistic locking (safe).

Reservation with TTL

The 10-Minute Timer

When a user reserves a seat, they get a 10-minute window to complete payment. This prevents seat hoarding while giving legitimate buyers enough time to check out.

1. User selects Seat A-12 → POST /api/reserve
2. Seat status → RESERVED with reserved_at = now()
3. Redis TTL key set: reservation:{event}:{seat} EX 600
4. Client shows countdown: 9:59... 9:58... 9:57...
5a. Payment succeeds → Seat status → BOOKED
5b. TTL expires / user abandons → Seat status → AVAILABLE ♻️

TTL Enforcement: Three Mechanisms

We don't rely on a single mechanism for releasing expired reservations — we use three independent approaches for reliability:

Mechanism 1: Redis Keyspace Notifications

// Subscribe to Redis key expiration events on a dedicated connection
// (a client in subscribe mode can't issue other commands)
redis.config('SET', 'notify-keyspace-events', 'Ex');

const subscriber = redis.duplicate();
subscriber.subscribe('__keyevent@0__:expired');

subscriber.on('message', async (channel, expiredKey) => {
    // Key format: reservation:{eventId}:{seatId}
    const [prefix, eventId, seatId] = expiredKey.split(':');
    if (prefix !== 'reservation') return;  // ignore other expired keys

    // Release seat in PostgreSQL
    await db.query(`
        UPDATE seats
        SET status = 'available', version = version + 1,
            reserved_by = NULL, reserved_at = NULL
        WHERE id = $1 AND status = 'reserved'
    `, [seatId]);

    logger.info(`Seat ${seatId} released after TTL expiry`);
});

Mechanism 2: Scheduled Cleanup Job (Cron)

-- Run every 60 seconds: release seats reserved > 10 minutes ago
UPDATE seats
SET    status = 'available',
       version = version + 1,
       reserved_by = NULL,
       reserved_at = NULL
WHERE  status = 'reserved'
  AND  reserved_at < now() - interval '10 minutes'
RETURNING id, event_id;

-- Log released seats for monitoring
-- This catches any seats that Redis notification missed

Mechanism 3: Lazy Check on Read

async function getSeatAvailability(eventId, seatId) {
    const { rows } = await db.query(
        'SELECT * FROM seats WHERE id = $1', [seatId]
    );
    const seat = rows[0];

    // Lazy TTL check: if reserved but past deadline, treat as available
    if (seat.status === 'reserved' &&
        Date.now() - new Date(seat.reserved_at).getTime() > 600000) {
        await releaseSeat(seatId);  // async cleanup
        seat.status = 'available';
    }

    return seat;
}
💡 Belt, Suspenders, and Duct Tape: Redis notifications provide near-instant release (~1s). The cron job catches anything Redis missed (network partitions, crashes). Lazy checks on read ensure the user always sees correct state even if both background mechanisms failed. Triple redundancy for a critical business requirement.

Payment Timeout Handling — Detailed Flow

async function processPayment(bookingId, paymentDetails) {
    const booking = await getBooking(bookingId);
    const seats = await getSeats(booking.seatIds);

    // Verify all seats still reserved by this user
    for (const seat of seats) {
        if (seat.status !== 'reserved' || seat.reserved_by !== booking.userId) {
            throw new Error('RESERVATION_EXPIRED');
        }
        // Double-check TTL
        const elapsed = Date.now() - new Date(seat.reserved_at).getTime();
        if (elapsed > 600000) {
            throw new Error('RESERVATION_EXPIRED');
        }
    }

    // Begin atomic payment + booking confirmation
    let charge;  // declared here so the catch block can refund on failure
    const tx = await db.beginTransaction();
    try {
        // Charge payment. The idempotency key (= bookingId) is passed as a
        // request option so Stripe deduplicates retried charges.
        charge = await stripe.paymentIntents.create({
            amount: booking.totalCents,
            currency: 'usd',
            payment_method: paymentDetails.paymentMethodId,  // from client
            confirm: true,  // charge now so charge.status reflects the result
            metadata: { bookingId, eventId: booking.eventId }
        }, { idempotencyKey: bookingId });

        if (charge.status !== 'succeeded') {
            throw new Error('PAYMENT_FAILED');
        }

        // Confirm all seats atomically
        await tx.query(`
            UPDATE seats
            SET status = 'booked', version = version + 1,
                booked_by = $1, booked_at = now()
            WHERE id = ANY($2) AND status = 'reserved'
                AND reserved_by = $1
        `, [booking.userId, booking.seatIds]);

        // Confirm booking
        await tx.query(`
            UPDATE bookings
            SET status = 'confirmed', payment_intent = $1,
                confirmed_at = now()
            WHERE id = $2
        `, [charge.id, bookingId]);

        await tx.commit();

        // Clean up Redis reservation keys
        for (const seatId of booking.seatIds) {
            await redis.del(`reservation:${booking.eventId}:${seatId}`);
        }

        // Send confirmation email & generate ticket
        await notificationService.sendConfirmation(booking);
        await ticketService.generateDigitalTicket(booking);

        return { success: true, charge };
    } catch (err) {
        await tx.rollback();

        // If payment succeeded but DB failed, refund immediately
        if (err.message !== 'PAYMENT_FAILED' && charge?.id) {
            await stripe.refunds.create({ payment_intent: charge.id });
        }

        // Release seats
        await releaseSeats(booking.seatIds);
        throw err;
    }
}
⚠️ The Payment-DB Consistency Problem: What if payment succeeds but the database update fails? We handle this with: (1) idempotency keys on payments so retries are safe, (2) immediate refund if DB commit fails, (3) a reconciliation job that compares Stripe charges against booking records every hour.
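The hourly reconciliation job in (3) amounts to a set comparison between charge records and booking records. A sketch over in-memory arrays follows; a real job would page through the payment provider's API and the bookings table:

```javascript
// Reconciliation sketch: find money taken with no confirmed booking
// (refund candidates) and confirmed bookings with no charge (alerts).
function reconcile(charges, bookings) {
    const confirmedIntents = new Set(
        bookings.filter(b => b.status === 'confirmed')
                .map(b => b.paymentIntent)
    );
    const chargedIds = new Set(charges.map(c => c.id));

    return {
        // Charged the card, but no confirmed booking references it
        refundCandidates: charges.filter(c => !confirmedIntents.has(c.id)),
        // Confirmed booking, but the provider has no matching charge
        missingCharges: bookings.filter(
            b => b.status === 'confirmed' && !chargedIds.has(b.paymentIntent)
        ),
    };
}
```

Refund candidates are safe to refund automatically (idempotency keys make a duplicate refund a no-op); missing charges should page a human, since a confirmed booking with no money behind it usually means a webhook or commit was lost.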

Virtual Waiting Room

When 100K users arrive simultaneously for a popular event, even our optimized booking system would buckle. The solution: a virtual waiting room that queues users before they can access the booking page.

How It Works

1. Arrival: User hits event page → redirected to waiting room with a unique queue token
2. Random Fair Queue: Position assigned randomly (not FIFO) to prevent bot advantage from speed
3. Batch Admission: System admits batches of ~1,000 users at configurable intervals
4. Booking Token: Admitted users receive a signed JWT valid for 15 minutes
5. Active Session Cap: Max 5,000 concurrent users in the booking flow at any time

Queue Implementation

// Queue Service (Redis Sorted Set)
class VirtualQueueService {
    constructor(redis) { this.redis = redis; }

    // Enqueue user with random score (fair lottery)
    async enqueue(eventId, userId) {
        const queueKey = `queue:${eventId}`;
        const score = Math.random(); // random position, not timestamp

        await this.redis.zadd(queueKey, score, userId);
        const position = await this.redis.zrank(queueKey, userId);
        const total = await this.redis.zcard(queueKey);

        return {
            token: generateQueueToken(eventId, userId),
            position: position + 1,
            totalInQueue: total,
            estimatedWait: Math.ceil((position + 1) / 1000) * 30 // ~30s per batch
        };
    }

    // Admit next batch of users
    async admitBatch(eventId, batchSize = 1000) {
        const queueKey = `queue:${eventId}`;
        const activeKey = `active:${eventId}`;

        // Check current active session count
        const activeCount = await this.redis.scard(activeKey);
        const maxActive = 5000;

        if (activeCount >= maxActive) {
            return { admitted: 0, reason: 'MAX_ACTIVE_REACHED' };
        }

        const slotsAvailable = Math.min(batchSize, maxActive - activeCount);

        // Pop lowest-scored users (random order = fair)
        const users = await this.redis.zpopmin(queueKey, slotsAvailable);

        const admittedUsers = [];
        for (let i = 0; i < users.length; i += 2) {
            const userId = users[i];
            // Move to the active set (the 15-min expiry is enforced by the
            // JWT below, not by SADD, which has no per-member TTL)
            await this.redis.sadd(activeKey, userId);
            // Generate booking JWT
            const bookingToken = jwt.sign(
                { userId, eventId, type: 'booking_access' },
                SECRET, { expiresIn: '15m' }
            );
            admittedUsers.push({ userId, bookingToken });
        }

        return { admitted: admittedUsers.length, users: admittedUsers };
    }

    // Check user's position in queue
    async getPosition(eventId, userId) {
        const queueKey = `queue:${eventId}`;
        const rank = await this.redis.zrank(queueKey, userId);

        if (rank === null) return { inQueue: false };

        return {
            inQueue: true,
            position: rank + 1,
            total: await this.redis.zcard(queueKey),
            estimatedWait: Math.ceil((rank + 1) / 1000) * 30
        };
    }
}

// Batch admission scheduler (runs every 30 seconds while an event is
// on sale; eventId comes from the enclosing on-sale orchestrator)
setInterval(async () => {
    const result = await queueService.admitBatch(eventId, 1000);
    metrics.gauge('queue.admitted_batch', result.admitted);
    metrics.gauge('queue.remaining', await redis.zcard(`queue:${eventId}`));
}, 30000);

Why Random Order, Not FIFO?

💡 Anti-Bot Fairness: If the queue were FIFO (first-come-first-served), bots with lower network latency would always get front positions. By assigning random positions, a human on a phone in rural Kansas has the same probability of getting in as a bot farm on AWS. Ticketmaster calls this "Verified Fan" — we enforce it architecturally.
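The fairness claim is easy to check empirically: with uniform random scores, a batch popped from the low end of the sorted set is a uniform sample of the queue, so arrival order confers no advantage. A small simulation (illustrative, using plain arrays in place of a Redis sorted set):

```javascript
// Fairness sketch: each user gets a uniform random score, and admission
// pops the lowest scores first (the ZPOPMIN behavior), so the first
// batch is a uniform sample of the whole queue.
function admissionOrder(userIds) {
    return userIds
        .map(id => ({ id, score: Math.random() }))
        .sort((a, b) => a.score - b.score)  // ZPOPMIN pops lowest scores
        .map(u => u.id);
}

// Rough check: over many trials, the first arriver and the last arriver
// land in the front half of a 4-person queue about equally often.
let earlyWins = 0, lateWins = 0;
for (let i = 0; i < 10_000; i++) {
    const order = admissionOrder(['early', 'u2', 'u3', 'late']);
    if (order.indexOf('early') < 2) earlyWins++;
    if (order.indexOf('late') < 2) lateWins++;
}
console.log(earlyWins, lateWins);  // both ≈ 5,000
```

Under FIFO, earlyWins would be 10,000 and lateWins 0; the lottery flattens that to a coin flip for everyone.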

Waiting Room UI (Client-Side Polling)

// Client polls every 5 seconds for queue status
async function pollQueueStatus() {
    const response = await fetch(`/api/queue/status?token=${queueToken}`);
    const data = await response.json();

    if (data.admitted) {
        // User admitted! Redirect to booking page with JWT
        window.location.href =
            `/event/${eventId}/book?token=${data.bookingToken}`;
        return;
    }

    // Update waiting room UI
    document.getElementById('position').textContent =
        `Position: ${data.position} of ${data.total}`;
    document.getElementById('eta').textContent =
        `Estimated wait: ${data.estimatedWait}s`;

    // Show fun animation to keep users engaged
    updateProgressAnimation(data.position, data.total);

    // Poll again
    setTimeout(pollQueueStatus, 5000);
}

Animation: Virtual Waiting Room

🚪 Batch admission: 100K users arrive at once → placed in virtual queue → admitted in batches of 1,000 → booking system handles manageable load.

System Architecture

Service Overview

🎭 Event Service

CRUD for events and venues. Serves event catalog, schedules, and metadata. Read-heavy, cached aggressively with CDN + Redis.

🔍 Search Service

Elasticsearch-backed full-text search. Filters by date, city, genre, artist. Near-real-time index updates via CDC from PostgreSQL.

💺 Inventory Service

Core seat management. Owns the seat state machine. Handles reservation, release, and availability queries. Redis + PostgreSQL dual layer.

📋 Booking Service

Orchestrates the booking flow: validate → reserve → pay → confirm. Manages booking records and coordinates between Inventory and Payment.

💳 Payment Service

Integrates with Stripe/PayPal. Handles charges, refunds, webhooks. Idempotent operations with deduplication keys.

🚪 Queue Service

Virtual waiting room for high-demand events. Redis sorted sets for O(log N) operations. JWT token generation for admitted users.

📧 Notification Service

Sends confirmation emails, digital tickets, and queue status updates. Async via message queue (Kafka/SQS).

📊 Analytics Service

Tracks conversion funnels, popular events, seat demand heatmaps. Feeds into pricing and capacity planning.

Architecture Diagram

                        ┌──────────────────┐
                        │   CDN / CloudFront│
                        └────────┬─────────┘
                                 │
                        ┌────────▼─────────┐
                        │   Load Balancer   │
                        │   (ALB / Nginx)   │
                        └────────┬─────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                   │
     ┌────────▼──────┐  ┌───────▼────────┐  ┌──────▼───────┐
     │ API Gateway   │  │ Queue Service  │  │ WebSocket    │
     │ (Rate Limit)  │  │ (Waiting Room) │  │ Server       │
     └───────┬───────┘  └───────┬────────┘  │ (seat updates)│
             │                  │            └──────┬───────┘
    ┌────────┼────────┐         │                   │
    │        │        │         │                   │
┌───▼──┐ ┌──▼───┐ ┌──▼────┐   │            ┌──────▼───────┐
│Event │ │Search│ │Booking│◄──┘            │  Redis Pub/Sub│
│Svc   │ │Svc   │ │Svc    │               └──────┬───────┘
└──┬───┘ └──┬───┘ └──┬────┘                      │
   │        │        │                            │
   │   ┌────▼────┐   │    ┌──────────┐    ┌──────▼───────┐
   │   │Elastic- │   ├───►│Payment   │    │  Redis        │
   │   │search   │   │    │Service   │    │  (Locks,TTL,  │
   │   └─────────┘   │    └────┬─────┘    │   Inventory)  │
   │                  │         │          └──────┬───────┘
   │           ┌──────▼─────┐   │                 │
   │           │ Inventory  │◄──┼─────────────────┘
   │           │ Service    │   │
   │           └──────┬─────┘   │
   │                  │         │
   └──────────┬───────┴─────────┘
              │
     ┌────────▼─────────┐         ┌─────────────┐
     │   PostgreSQL      │         │   Kafka      │
     │   (Source of Truth)│────────►│   (Events)   │
     └──────────────────┘         └──────┬──────┘
                                         │
                                  ┌──────▼──────┐
                                  │ Notification │
                                  │ Service      │
                                  └─────────────┘

Data Flow for Seat Reservation

sequenceDiagram:
  User → API Gateway     : POST /api/reserve {seatId, eventId}
  API Gateway → Queue Svc: Check booking token validity
  Queue Svc → API Gateway: ✅ Token valid, user admitted
  API Gateway → Booking   : Forward reservation request
  Booking → Redis         : SET seat_lock:{id} NX EX 5
  Redis → Booking         : OK (lock acquired)
  Booking → Redis         : GET seat:{eventId}:{seatId}
  Redis → Booking         : {status: "available", version: 7}
  Booking → Inventory     : Reserve seat (version=7)
  Inventory → PostgreSQL  : UPDATE seats SET status='reserved'
                            WHERE version=7 AND status='available'
  PostgreSQL → Inventory  : 1 row affected ✅
  Inventory → Redis       : SET seat status=reserved, SET TTL 600s
  Inventory → Booking     : Reserved, expires in 600s
  Booking → Redis         : DEL seat_lock:{id}
  Booking → User          : {reserved: true, expiresIn: 600}
  Booking → WebSocket     : Broadcast: seat A-12 now reserved

Database Design Deep Dive

PostgreSQL — Source of Truth

| Table | Purpose | Reads/sec | Writes/sec |
| --- | --- | --- | --- |
| events | Event catalog | High (cached) | Low (admin only) |
| venues | Venue & seat map layout | Medium | Rare |
| seats | Per-seat inventory & status | Very High | Very High (during sale) |
| bookings | Confirmed booking records | Low | High (during sale) |

Partitioning Strategy — The seats table is partitioned by event_id:

-- Partition seats by event for isolation and performance
CREATE TABLE seats (
    id UUID, event_id UUID, section VARCHAR(20),
    "row" VARCHAR(10), seat_number INT,
    status VARCHAR(20), version INT, ...
) PARTITION BY HASH (event_id);

-- Create 16 partitions (adjust based on concurrent events)
CREATE TABLE seats_p0 PARTITION OF seats FOR VALUES WITH (MODULUS 16, REMAINDER 0);
CREATE TABLE seats_p1 PARTITION OF seats FOR VALUES WITH (MODULUS 16, REMAINDER 1);
-- ... seats_p2 through seats_p15

-- Benefits:
-- 1. Hot event queries only scan 1 partition (1/16th of data)
-- 2. Vacuum runs independently per partition
-- 3. Can move cold event data to cheaper storage

Redis — Fast Cache & Lock Layer

// Redis data structures per event

// 1. Seat availability bitmap (O(1) per seat)
SETBIT seat_avail:{eventId} {seatIndex} 1    // available
SETBIT seat_avail:{eventId} {seatIndex} 0    // taken
BITCOUNT seat_avail:{eventId}                 // count available

// 2. Seat detail hash
HSET seat:{eventId}:{seatId} status "available" version 7 price 15000

// 3. Reservation TTL keys (auto-expire)
SET reservation:{eventId}:{seatId} {userId} EX 600

// 4. Distributed locks (short-lived)
SET seat_lock:{eventId}:{seatId} {userId} NX EX 5

// 5. Event-level counters
INCR event_views:{eventId}
INCR event_reservations:{eventId}
DECR event_available:{eventId}

// 6. Queue sorted set
ZADD queue:{eventId} {randomScore} {userId}
ZPOPMIN queue:{eventId} 1000

// 7. Active session set
SADD active:{eventId} {userId}
SCARD active:{eventId}

Elasticsearch — Search Index

// Event search index mapping
PUT /events
{
    "mappings": {
        "properties": {
            "name":     { "type": "text", "analyzer": "standard" },
            "artist":   { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
            "category": { "type": "keyword" },
            "city":     { "type": "keyword" },
            "venue":    { "type": "text" },
            "date":     { "type": "date" },
            "on_sale":  { "type": "date" },
            "price_range": { "type": "integer_range" },
            "available_seats": { "type": "integer" },
            "location": { "type": "geo_point" },
            "tags":     { "type": "keyword" }
        }
    }
}

// Example search: "Taylor Swift concerts in NYC this summer"
GET /events/_search
{
    "query": {
        "bool": {
            "must": [
                { "match": { "artist": "Taylor Swift" }},
                { "term":  { "city": "New York" }},
                { "range": { "date": { "gte": "2026-06-01", "lte": "2026-08-31" }}}
            ],
            "filter": [
                { "range": { "available_seats": { "gt": 0 }}}
            ]
        }
    },
    "sort": [{ "date": "asc" }]
}

Handling Flash Sales

A "flash sale" is the most extreme scenario: an event announced to go on sale at a specific time, with demand massively exceeding supply. Here's our multi-layered defense:

1. Pre-Warming

// 5 minutes before on-sale time:
async function prewarmEvent(eventId) {
    // Load all seat data into Redis
    const seats = await db.query(
        'SELECT * FROM seats WHERE event_id = $1', [eventId]
    );

    const pipeline = redis.pipeline();
    for (const seat of seats) {
        pipeline.hset(
            `seat:${eventId}:${seat.id}`,
            'status', seat.status,
            'version', seat.version,
            'price', seat.price_cents,
            'section', seat.section,
            'row', seat.row,
            'number', seat.seat_number
        );
        // Set availability bitmap
        const seatIndex = computeSeatIndex(seat);
        pipeline.setbit(`seat_avail:${eventId}`, seatIndex, 1);
    }
    await pipeline.exec();

    // Pre-scale infrastructure
    await k8s.scaleDeployment('booking-service', { replicas: 20 });
    await k8s.scaleDeployment('inventory-service', { replicas: 15 });

    // Warm database connection pools
    await db.warmPool(50);

    logger.info(`Pre-warmed event ${eventId}: ${seats.length} seats cached`);
}
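The prewarm code calls a computeSeatIndex helper that is never defined above. One simple scheme (an assumption for illustration, with made-up layout constants) derives a dense 0-based bit offset from section, row, and seat number so the availability bitmap stays compact:

```javascript
// Hypothetical computeSeatIndex: map (section, row, seat_number) to a
// dense 0-based bit offset, given per-venue layout constants.
const ROWS_PER_SECTION = 30;  // illustrative venue constants
const SEATS_PER_ROW = 40;

function computeSeatIndex(seat, sectionOrder = ['A', 'B', 'C']) {
    const sectionIdx = sectionOrder.indexOf(seat.section);
    const rowIdx = seat.row.charCodeAt(0) - 'A'.charCodeAt(0); // row "A" -> 0
    return (sectionIdx * ROWS_PER_SECTION + rowIdx) * SEATS_PER_ROW
         + (seat.seat_number - 1);
}

// Section A, row A, seat 1 occupies bit 0; section B starts at bit 1200.
console.log(computeSeatIndex({ section: 'A', row: 'A', seat_number: 1 }));  // 0
console.log(computeSeatIndex({ section: 'B', row: 'A', seat_number: 1 }));  // 1200
```

In practice the mapping would come from the venue's seat_map JSONB rather than fixed constants, but the key property is the same: the index is stable and dense, so SETBIT/BITCOUNT stay O(1) and the bitmap for an 80K-seat venue fits in ~10 KB.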

2. Rate Limiting (Per-User and Global)

// API Gateway rate limiting configuration
const rateLimits = {
    // Per-user limits (by JWT / IP)
    'POST /api/reserve': {
        window: '10s',
        max: 3,           // max 3 reservation attempts per 10 seconds
        strategy: 'sliding_window'
    },
    'GET /api/seats': {
        window: '1s',
        max: 10,          // max 10 seat map refreshes per second
        strategy: 'token_bucket'
    },

    // Global limits (protect backends)
    'global:reservation': {
        window: '1s',
        max: 5000,        // max 5000 reservations/sec system-wide
        strategy: 'fixed_window',
        overflow: 'queue'  // excess goes to waiting room
    }
};

// Redis-based sliding window rate limiter
async function checkRateLimit(key, window, max) {
    const now = Date.now();
    const windowStart = now - (window * 1000);

    const pipeline = redis.pipeline();
    pipeline.zremrangebyscore(key, 0, windowStart);  // cleanup old
    pipeline.zadd(key, now, `${now}:${Math.random()}`);
    pipeline.zcard(key);
    pipeline.expire(key, window);

    const results = await pipeline.exec();
    const count = results[2][1];

    return count <= max;
}

3. Horizontal Scaling Strategy

| Component | Normal | Flash Sale | Scale Factor |
| --- | --- | --- | --- |
| API Gateway | 4 pods | 20 pods | 5x |
| Booking Service | 4 pods | 20 pods | 5x |
| Inventory Service | 3 pods | 15 pods | 5x |
| Queue Service | 2 pods | 8 pods | 4x |
| Redis Cluster | 3 shards | 6 shards | 2x |
| PostgreSQL read replicas | 2 | 5 | 2.5x |

4. Graceful Degradation

// Circuit breaker configuration per service
const circuitBreakers = {
    'payment-service': {
        failureThreshold: 5,     // 5 failures in window
        resetTimeout: 30000,     // try again after 30s
        fallback: async (req) => {
            // Queue payment for retry, keep reservation alive
            await paymentQueue.push(req);
            return { status: 'PAYMENT_QUEUED', retryIn: 30 };
        }
    },
    'notification-service': {
        failureThreshold: 10,
        resetTimeout: 60000,
        fallback: async (req) => {
            // Booking still valid, notification sent later
            await notificationQueue.push(req);
            return { status: 'NOTIFICATION_DELAYED' };
        }
    }
};

// Seat map degradation: if real-time WebSocket overloaded,
// fall back to polling every 5 seconds
if (wsConnectionCount > 50000) {
    return { mode: 'polling', interval: 5000 };
}

Real-Time Seat Map Updates

WebSocket Architecture

Users viewing the seat map need to see seats disappear in real time as others book them. We use WebSockets backed by Redis Pub/Sub:

// Server: Publish seat status changes
async function onSeatStatusChange(eventId, seatId, newStatus) {
    await redis.publish(`seat_updates:${eventId}`, JSON.stringify({
        seatId,
        status: newStatus,
        timestamp: Date.now()
    }));
}

// WebSocket server: Subscribe and broadcast
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws, req) => {
    const eventId = parseEventId(req.url);
    const subscriber = redis.duplicate();

    subscriber.subscribe(`seat_updates:${eventId}`);
    subscriber.on('message', (channel, message) => {
        if (ws.readyState === WebSocket.OPEN) {
            ws.send(message);
        }
    });

    ws.on('close', () => {
        subscriber.unsubscribe();
        subscriber.quit();
    });
});

// Client: Update seat map in real time
const ws = new WebSocket(`wss://api.example.com/seats/${eventId}`);
ws.onmessage = (event) => {
    const update = JSON.parse(event.data);
    const seatEl = document.querySelector(`[data-seat="${update.seatId}"]`);
    if (!seatEl) return; // seat not rendered in this client's current view

    if (update.status === 'reserved' || update.status === 'booked') {
        seatEl.classList.remove('available');
        seatEl.classList.add('taken');
        seatEl.setAttribute('disabled', 'true');
    } else if (update.status === 'available') {
        seatEl.classList.remove('taken');
        seatEl.classList.add('available');
        seatEl.removeAttribute('disabled');
    }
};
💡 Scalability Note: Each WebSocket server can handle ~50K concurrent connections. With Redis Pub/Sub, we can horizontally scale WebSocket servers — they all subscribe to the same Redis channel and broadcast to their connected clients. For a 100K-user event, 2–3 WebSocket servers suffice.

Anti-Fraud & Bot Prevention

| Defense Layer | Technique | Blocks |
|---|---|---|
| Network | IP rate limiting, geo-fencing, ASN reputation | Basic bots, data-center traffic |
| Application | CAPTCHA at queue entry, browser fingerprinting | Automated scripts |
| Queue | Random ordering (not FIFO), Verified Fan registration | Speed-advantage bots |
| Booking | 1 booking per user per event, phone verification | Scalper bulk buying |
| Payment | Card velocity checks, address verification | Stolen cards, fraud rings |
| Post-Sale | Transfer restrictions, identity-linked tickets | Secondary-market scalping |
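The "random ordering" queue defense deserves a closer look: admitting the waiting room FIFO rewards whoever connects fastest, which is exactly what bots optimize for. Shuffled batch admission removes that speed advantage. A minimal in-memory sketch, assuming `admitRandomBatch` as a hypothetical helper (production would keep the queue in Redis):

```javascript
// Random-batch admission: a Fisher-Yates shuffle means connecting first
// buys a bot no better odds than any human who joined the waiting room.
function admitRandomBatch(waitingUsers, batchSize) {
    const pool = [...waitingUsers];
    for (let i = pool.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [pool[i], pool[j]] = [pool[j], pool[i]]; // swap
    }
    const admitted = pool.slice(0, batchSize);
    const admittedSet = new Set(admitted);
    const remaining = waitingUsers.filter((u) => !admittedSet.has(u));
    return { admitted, remaining };
}
```

Each admission tick, the queue service admits one batch sized to what the booking tier can absorb, and everyone else keeps waiting with unchanged odds.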
// Bot detection middleware
async function botDetection(req, res, next) {
    const signals = {
        // Timing analysis
        timeToInteract: req.headers['x-time-to-interact'],
        // Browser fingerprint
        fingerprint: req.headers['x-fp'],
        // Mouse movement entropy (sent by client JS)
        mouseEntropy: req.body?.mouseEntropy,
        // Request patterns
        requestInterval: getRequestInterval(req.ip),
    };

    const score = await mlModel.predict(signals);

    if (score > 0.9) {
        // Definitely a bot → block
        return res.status(429).json({ error: 'Suspicious activity detected' });
    } else if (score > 0.7) {
        // Possibly a bot → challenge with CAPTCHA
        return res.status(403).json({ challenge: 'captcha', token: generateCaptchaToken() });
    }

    // Likely human → allow
    next();
}
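The `getRequestInterval` signal used above can be derived from the gap between a client's consecutive requests; humans browsing a seat map pause, scripts do not. A sketch of one way to track it (in-memory and per-IP here as an assumption; production would likely use Redis so all gateway pods share state):

```javascript
// Track the interval between consecutive requests per IP.
// A stream of sub-100ms intervals is a strong bot signal.
function makeIntervalTracker() {
    const lastSeen = new Map(); // ip → timestamp of the previous request
    return function getRequestInterval(ip, now = Date.now()) {
        const prev = lastSeen.get(ip);
        lastSeen.set(ip, now);
        // First request from this IP: no interval yet
        return prev === undefined ? Infinity : now - prev;
    };
}
```

The returned interval feeds the ML scoring model alongside the fingerprint and mouse-entropy signals.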

Monitoring & Observability

Key Metrics Dashboard

| Metric | Alert Threshold | Why It Matters |
|---|---|---|
| Reservation p99 latency | > 500ms | Users abandon if seat selection feels slow |
| Double-booking count | > 0 | Absolute correctness requirement; page the team |
| Reservation timeout rate | > 30% | TTL too short or payment flow too complex |
| Queue wait time p95 | > 10 min | Users leave if the wait feels hopeless |
| Payment failure rate | > 5% | Provider issue or card-decline spike |
| Redis lock contention | > 80% rejection | Need more seat inventory or better distribution |
| Active WebSocket connections | > 90% capacity | Scale up WebSocket servers |
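Thresholds like these are easiest to keep honest when they live as data and get checked on every metrics scrape. A sketch of that pattern; the metric names and the `page` flag are illustrative, not taken from a real dashboard config:

```javascript
// Alert thresholds as data: each entry mirrors a row of the table above.
// `page: true` means wake someone up; false means a dashboard warning.
const thresholds = [
    { metric: 'reservation_p99_ms',      max: 500, page: false },
    { metric: 'double_booking_count',    max: 0,   page: true  },
    { metric: 'reservation_timeout_pct', max: 30,  page: false },
    { metric: 'payment_failure_pct',     max: 5,   page: false },
];

// Return every threshold breached by the current metrics snapshot.
function evaluateAlerts(snapshot) {
    return thresholds
        .filter((t) => (snapshot[t.metric] ?? 0) > t.max)
        .map((t) => ({ metric: t.metric, page: t.page }));
}
```

Note the double-booking threshold is zero: a single occurrence is a breach, matching the absolute-correctness requirement.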
// Distributed tracing for the booking flow
// Every request gets a trace ID that follows it through all services
const trace = {
    traceId: 'abc-123-def-456',
    spans: [
        { service: 'api-gateway',      duration: '5ms',  status: 'ok' },
        { service: 'queue-service',     duration: '2ms',  status: 'ok' },
        { service: 'booking-service',   duration: '12ms', status: 'ok' },
        { service: 'redis-lock',        duration: '1ms',  status: 'ok' },
        { service: 'inventory-service', duration: '8ms',  status: 'ok' },
        { service: 'postgresql',        duration: '15ms', status: 'ok' },
        { service: 'redis-cache',       duration: '1ms',  status: 'ok' },
        // Total: ~44ms end-to-end for reservation
    ]
};

Failure Scenarios & Recovery

Scenario 1: Redis Goes Down

// Fallback: bypass Redis lock, rely on PostgreSQL optimistic lock only
async function reserveSeatWithFallback(userId, seatId, version) {
    const lockKey = `seat_lock:${seatId}`;
    try {
        // Try the Redis lock first
        const locked = await redis.set(lockKey, userId, 'NX', 'EX', 5);
        if (!locked) return { error: 'SEAT_LOCKED' };
    } catch (redisErr) {
        // Redis down → fall through to the PostgreSQL-only path
        logger.warn('Redis unavailable, falling back to PG-only locking');
        metrics.increment('redis_fallback');
    }

    // The PostgreSQL optimistic lock always runs regardless
    const result = await db.query(`
        UPDATE seats
        SET status = 'reserved', version = version + 1,
            reserved_by = $3, reserved_at = now()
        WHERE id = $1 AND status = 'available' AND version = $2
    `, [seatId, version, userId]);

    return result.rowCount > 0
        ? { success: true }
        : { error: 'SEAT_TAKEN' };
}

Scenario 2: Payment Service Times Out

// Payment with retry and idempotency
async function chargeWithRetry(bookingId, amount, maxRetries = 3) {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            // Stripe's Node SDK takes the idempotency key as a request
            // option, not a body field; same key = same charge on retry
            const charge = await stripe.paymentIntents.create(
                { amount, currency: 'usd' },
                { idempotencyKey: `${bookingId}_v1` }
            );
            return charge;
        } catch (err) {
            if (err.type === 'StripeConnectionError' && attempt < maxRetries) {
                await sleep(1000 * 2 ** (attempt - 1)); // exponential backoff: 1s, 2s, 4s
                continue;
            }
            throw err;
        }
    }
}

// If all retries fail: extend reservation by 5 minutes, notify user
async function handlePaymentTimeout(bookingId) {
    const booking = await getBooking(bookingId);

    // Extend reservation TTL
    for (const seatId of booking.seatIds) {
        await redis.expire(`reservation:${booking.eventId}:${seatId}`, 300);
        await db.query(`
            UPDATE seats SET reserved_at = now()
            WHERE id = $1 AND reserved_by = $2
        `, [seatId, booking.userId]);
    }

    // Notify user
    await notifyUser(booking.userId, {
        type: 'PAYMENT_RETRY',
        message: 'Payment processing delayed. Your seats are still held.',
        retryUrl: `/checkout/${bookingId}`
    });
}

Scenario 3: Split-Brain (Network Partition)

⚠️ PostgreSQL as Arbiter: During a network partition, Redis replicas might have stale data. We always treat PostgreSQL as the source of truth. If Redis says "available" but PostgreSQL says "reserved," we trust PostgreSQL. The version field in PostgreSQL is the final authority on seat state.
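That arbiter rule can be enforced mechanically by a reconciliation pass once the partition heals: walk the affected seats, compare the cached view against PostgreSQL, and overwrite the cache wherever they disagree. A sketch with stubbed async stores; `redisStore` and `pgStore` stand in for real clients and are assumptions of this example:

```javascript
// Post-partition reconciliation: PostgreSQL is the source of truth,
// so any Redis entry that disagrees with it gets overwritten.
async function reconcileSeats(seatIds, redisStore, pgStore) {
    const corrected = [];
    for (const seatId of seatIds) {
        const cached = await redisStore.get(seatId); // may be stale or missing
        const truth = await pgStore.get(seatId);     // e.g. { status, version }
        if (!cached || cached.status !== truth.status) {
            await redisStore.set(seatId, truth);     // PostgreSQL wins, always
            corrected.push(seatId);
        }
    }
    return corrected; // seats whose cached state was stale
}
```

Running this on a schedule (and immediately after any Redis failover) bounds how long a stale "available" badge can survive in the cache.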

Recap & Key Takeaways

| Challenge | Solution |
|---|---|
| Double booking | Optimistic locking (version CAS) + Redis distributed lock + DB constraints |
| Seat hoarding | 10-minute reservation TTL with triple-redundant cleanup |
| Flash crowd / thundering herd | Virtual waiting room with random ordering + batch admission |
| Real-time seat updates | WebSocket + Redis Pub/Sub broadcasting |
| Payment consistency | Idempotency keys + atomic DB transactions + reconciliation |
| Bot prevention | CAPTCHA + random queue + Verified Fan + ML scoring |
| Extreme scale (100K concurrent) | Pre-warming + horizontal scaling + graceful degradation |
| System failure | Redis fallback to PG-only locking, payment retry with idempotency, PostgreSQL as split-brain arbiter |
"The art of ticketing system design is not in handling the happy path — it's in ensuring that under the worst possible conditions (100K users, 1 seat left, payment service flaking), exactly one person gets that seat, and everyone else gets a clear, honest 'sold out' message."