Design: Ticketmaster (Booking System)
Requirements & Scale
Functional Requirements
| Feature | Description |
|---|---|
| Browse Events | Users browse concerts, sports, theater by city, date, genre |
| Search | Full-text search by artist, venue, event name with filters |
| View Seat Map | Interactive venue map showing sections, rows, individual seat availability in real time |
| Reserve Seat | Temporarily hold a seat while user completes checkout (10-minute TTL) |
| Pay & Confirm | Secure payment processing, confirmation email, digital ticket generation |
| Cancellation | Cancel booking, trigger refund, release seat back to inventory |
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Concurrent users per popular event | 100,000+ at on-sale moment |
| Seat reservation latency | < 200ms p99 |
| Double-booking rate | 0% (absolute correctness) |
| System availability | 99.99% during on-sale windows |
| Throughput | 10,000+ seat reservations/sec peak |
| Payment timeout | 10 minutes TTL, auto-release on expiry |
Back-of-Envelope Estimation
Consider a stadium with 80,000 seats going on sale simultaneously:
- 100K users hit the site in the first 30 seconds
- Each user generates ~5 requests: page load, seat map, select seat, reserve, pay
- Peak request rate: 100,000 × 5 / 30s ≈ 16,700 req/sec
- Seat reservation writes: 80,000 / 60s ≈ 1,333 writes/sec (best case: all seats sold in 1 minute)
- With retries and contention: expect 5,000–10,000 reservation attempts/sec
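These estimates are quick to sanity-check in code:

```javascript
// Back-of-envelope sanity check for the peak-load estimates above.
const users = 100_000;
const requestsPerUser = 5;
const windowSeconds = 30;
const peakRps = Math.round((users * requestsPerUser) / windowSeconds);
console.log(peakRps); // 16667 ≈ 16,700 req/sec

const seats = 80_000;
const sellOutSeconds = 60;
const writeRps = Math.round(seats / sellOutSeconds);
console.log(writeRps); // 1333 writes/sec in the best case
```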
Seat Inventory Model
Seat State Machine
Every seat in the system follows a strict state machine: `available` → `reserved` (on selection, with a 10-minute TTL) → `booked` (on payment). A reserved seat reverts to `available` when the TTL expires or payment fails, and a booked seat returns to `available` on cancellation. No other transitions are permitted.
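A minimal sketch of this state machine as a transition table (the status values come from the schema below; the helper function is ours):

```javascript
// Allowed seat state transitions (illustrative sketch).
const TRANSITIONS = {
  available: ['reserved'],            // user selects the seat
  reserved:  ['booked', 'available'], // payment succeeds, or TTL expires / payment fails
  booked:    ['available']            // cancellation releases the seat
};

function canTransition(from, to) {
  return (TRANSITIONS[from] || []).includes(to);
}

console.log(canTransition('available', 'reserved')); // true
console.log(canTransition('available', 'booked'));   // false: must reserve first
```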
Database Schema
-- Venue & event metadata (PostgreSQL)
CREATE TABLE venues (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
city VARCHAR(100),
capacity INT NOT NULL,
seat_map JSONB NOT NULL -- section/row/seat layout
);
CREATE TABLE events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
venue_id UUID REFERENCES venues(id),
name VARCHAR(500) NOT NULL,
artist VARCHAR(255),
category VARCHAR(50), -- concert, sports, theater
event_date TIMESTAMPTZ NOT NULL,
on_sale_date TIMESTAMPTZ NOT NULL,
status VARCHAR(20) DEFAULT 'upcoming',
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_events_date ON events(event_date);
CREATE INDEX idx_events_category ON events(category);
CREATE INDEX idx_events_onsale ON events(on_sale_date);
-- Seat inventory (PostgreSQL - source of truth)
CREATE TABLE seats (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
event_id UUID REFERENCES events(id),
section VARCHAR(20) NOT NULL,
"row" VARCHAR(10) NOT NULL, -- "row" is a reserved word in PostgreSQL, so quote it
seat_number INT NOT NULL,
price_tier VARCHAR(20),
price_cents INT NOT NULL,
status VARCHAR(20) DEFAULT 'available',
version INT DEFAULT 0, -- optimistic lock version
reserved_by UUID, -- user_id holding reservation
reserved_at TIMESTAMPTZ, -- reservation timestamp
booked_by UUID,
booked_at TIMESTAMPTZ,
UNIQUE(event_id, section, "row", seat_number)
);
CREATE INDEX idx_seats_event_status ON seats(event_id, status);
CREATE INDEX idx_seats_reserved_at ON seats(reserved_at)
WHERE status = 'reserved'; -- partial index for TTL cleanup
-- Booking records (PostgreSQL)
CREATE TABLE bookings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
event_id UUID REFERENCES events(id),
seat_ids UUID[] NOT NULL,
total_cents INT NOT NULL,
payment_intent VARCHAR(255), -- Stripe/payment provider ID
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMPTZ DEFAULT now(),
confirmed_at TIMESTAMPTZ,
cancelled_at TIMESTAMPTZ
);
CREATE INDEX idx_bookings_user ON bookings(user_id);
CREATE INDEX idx_bookings_event ON bookings(event_id);
Why a version Column?
The version column is the heart of our concurrency control. Every time a seat's state changes, the version increments. When two users try to reserve the same seat simultaneously, only one can match the current version — the other's UPDATE affects zero rows and must retry.
Distributed Locking & Double-Booking Prevention
The absolute worst outcome for a ticketing system is a double booking: two people both believe they own the same seat. We employ multiple layers of defense:
Layer 1: Optimistic Locking (Primary Defense)
Optimistic concurrency control lets concurrent requests proceed without blocking, but only one can commit successfully:
-- Reserve a seat (atomic operation)
-- Returns 1 row affected on success, 0 on conflict
UPDATE seats
SET status = 'reserved',
version = version + 1,
reserved_by = :user_id,
reserved_at = now()
WHERE id = :seat_id
AND status = 'available'
AND version = :expected_version;
-- Application checks: if rows_affected == 0 → conflict!
-- User gets: "Sorry, this seat was just taken. Please select another."
This is a compare-and-swap (CAS) operation. The three conditions in the WHERE clause ensure:
- id = :seat_id — correct seat
- status = 'available' — seat hasn't been taken
- version = :expected_version — no concurrent modification since we last read it
status = 'available' alone isn't sufficient. Consider: User A reserves seat → payment fails → seat released back to "available" → User B reads seat (version=2) → User A retries with stale version=1. Without version check, A's stale write could overwrite B's valid state.
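That failure sequence is easy to reproduce with a toy in-memory model of the versioned compare-and-swap (this stands in for the SQL UPDATE, not the real database path):

```javascript
// Toy in-memory model of the versioned CAS performed by the UPDATE above.
function makeSeat() {
  return { status: 'available', version: 1, reservedBy: null };
}

// Mirrors: UPDATE ... WHERE status = 'available' AND version = :expected
function tryReserve(seat, userId, expectedVersion) {
  if (seat.status !== 'available' || seat.version !== expectedVersion) {
    return false; // 0 rows affected: caller must re-read and retry
  }
  seat.status = 'reserved';
  seat.version += 1;
  seat.reservedBy = userId;
  return true;
}

function release(seat) { // TTL expiry or payment failure
  seat.status = 'available';
  seat.version += 1;
  seat.reservedBy = null;
}

const seat = makeSeat();               // version 1
console.log(tryReserve(seat, 'A', 1)); // true: A reserves, version → 2
release(seat);                         // A's payment fails, version → 3
console.log(tryReserve(seat, 'A', 1)); // false: A's retry carries a stale version
console.log(tryReserve(seat, 'B', 3)); // true: B read the current version
```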
Layer 2: Redis Distributed Lock (Fast Path)
Before hitting PostgreSQL, we acquire a short-lived Redis lock per seat to serialize reservation attempts:
-- Redis lock acquisition (using SET NX EX)
SET seat_lock:{event_id}:{seat_id} {user_id} NX EX 5
-- NX = only set if key doesn't exist (atomic)
-- EX 5 = auto-expire in 5 seconds
-- Returns OK if lock acquired, nil if seat already locked
-- After PostgreSQL update succeeds or fails:
DEL seat_lock:{event_id}:{seat_id} -- release lock
This Redis lock serves as a fast-rejection filter. If 500 users click the same seat simultaneously, only 1 acquires the Redis lock; the other 499 get instant rejection without hitting the database at all.
async function reserveSeat(userId, seatId, eventId, expectedVersion) {
const lockKey = `seat_lock:${eventId}:${seatId}`;
// Layer 2: Redis fast lock
const locked = await redis.set(lockKey, userId, 'NX', 'EX', 5);
if (!locked) {
return { success: false, error: 'SEAT_LOCKED_BY_ANOTHER_USER' };
}
try {
// Layer 1: Optimistic lock in PostgreSQL
const result = await db.query(`
UPDATE seats
SET status = 'reserved', version = version + 1,
reserved_by = $1, reserved_at = now()
WHERE id = $2 AND status = 'available' AND version = $3
`, [userId, seatId, expectedVersion]);
if (result.rowCount === 0) {
return { success: false, error: 'SEAT_ALREADY_TAKEN' };
}
// Set reservation TTL in Redis
await redis.set(
`reservation:${eventId}:${seatId}`,
JSON.stringify({ userId, bookingDeadline: Date.now() + 600000 }),
'EX', 600 // 10 minutes
);
return { success: true, expiresIn: 600 };
} finally {
await redis.del(lockKey); // Always release lock
}
}
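One refinement worth noting: the unconditional DEL in the finally block can delete a lock we no longer own, if our 5-second lock expired and another user re-acquired it. The standard remedy is a check-and-delete Lua script that removes the key only when its value is still our user ID. A sketch, with the semantics modeled against a plain Map for illustration (in production the script would run atomically via EVAL):

```javascript
// Lua script: delete the lock only if its value is still our user ID.
const RELEASE_SCRIPT = `
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
end
return 0
`;
// In production: await redis.eval(RELEASE_SCRIPT, 1, lockKey, userId);

// The same semantics, modeled against a plain Map:
function safeRelease(store, lockKey, userId) {
  if (store.get(lockKey) === userId) {
    store.delete(lockKey);
    return 1; // released our own lock
  }
  return 0;   // our lock expired and someone else holds the key now
}

const store = new Map();
store.set('seat_lock:e1:s1', 'userB');                       // our lock expired; B re-acquired
console.log(safeRelease(store, 'seat_lock:e1:s1', 'userA')); // 0: B's lock is untouched
console.log(safeRelease(store, 'seat_lock:e1:s1', 'userB')); // 1: the owner can release
```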
Layer 3: Database Constraints (Safety Net)
Even if application logic has bugs, the database schema itself prevents double booking:
-- Exclusion constraint prevents two overlapping reservations for the same seat
-- (requires the btree_gist extension for the `id WITH =` operator)
CREATE EXTENSION IF NOT EXISTS btree_gist;
ALTER TABLE seats ADD CONSTRAINT unique_booking
EXCLUDE USING gist (
id WITH =,
tstzrange(reserved_at, reserved_at + interval '10 minutes') WITH &&
) WHERE (status = 'reserved');
-- Or simpler: application-level check within a transaction
BEGIN;
SELECT status, version FROM seats WHERE id = :seat_id FOR UPDATE;
-- Row-level lock acquired, no other transaction can modify this seat
-- Verify status is still 'available', then update
UPDATE seats SET status = 'reserved', ...;
COMMIT;
Comparison of Locking Strategies
| Strategy | Throughput | Latency | Correctness | Complexity |
|---|---|---|---|---|
| Pessimistic (SELECT FOR UPDATE) | Low — serialized | High — lock waits | ✅ Perfect | Low |
| Optimistic (version CAS) | High — no blocking | Low — fail-fast | ✅ Perfect | Medium |
| Redis + Optimistic | Very High — DB shielded | Very Low | ✅ Perfect | High |
| Queue-based serial | Predictable | Medium — queued | ✅ Perfect | High |
Animation: Seat Reservation Race Condition
🏎️ Seat Reservation Race Condition
Two users try to book Seat A-12 simultaneously. See what happens without locking (double booking!) vs. with optimistic locking (safe).
Reservation with TTL
The 10-Minute Timer
When a user reserves a seat, they get a 10-minute window to complete payment. This prevents seat hoarding while giving legitimate buyers enough time to check out.
(Flow: POST /api/reserve sets reserved_at = now() and writes a reservation:{event}:{seat} key with EX 600; the client displays a live countdown: 9:59... 9:58... 9:57...)
TTL Enforcement: Three Mechanisms
We don't rely on a single mechanism for releasing expired reservations — we use three independent approaches for reliability:
Mechanism 1: Redis Keyspace Notifications
// Subscribe to Redis key expiration events
// (requires notify-keyspace-events to include expired-key events)
await redis.config('SET', 'notify-keyspace-events', 'Ex');
const subscriber = redis.duplicate();
await subscriber.subscribe('__keyevent@0__:expired');
subscriber.on('message', async (channel, expiredKey) => {
// Key format: reservation:{eventId}:{seatId}
if (!expiredKey.startsWith('reservation:')) return;
const [, eventId, seatId] = expiredKey.split(':');
// Release seat in PostgreSQL
await db.query(`
UPDATE seats
SET status = 'available', version = version + 1,
reserved_by = NULL, reserved_at = NULL
WHERE id = $1 AND status = 'reserved'
`, [seatId]);
logger.info(`Seat ${seatId} released after TTL expiry`);
});
Mechanism 2: Scheduled Cleanup Job (Cron)
-- Run every 60 seconds: release seats reserved > 10 minutes ago
UPDATE seats
SET status = 'available',
version = version + 1,
reserved_by = NULL,
reserved_at = NULL
WHERE status = 'reserved'
AND reserved_at < now() - interval '10 minutes'
RETURNING id, event_id;
-- Log released seats for monitoring
-- This catches any seats that Redis notification missed
Mechanism 3: Lazy Check on Read
async function getSeatAvailability(eventId, seatId) {
const { rows: [seat] } = await db.query(
'SELECT * FROM seats WHERE id = $1', [seatId]
);
// Lazy TTL check: if reserved but past deadline, treat as available
const reservedMs = seat.reserved_at ? new Date(seat.reserved_at).getTime() : 0;
if (seat.status === 'reserved' && Date.now() - reservedMs > 600000) {
await releaseSeat(seatId); // cleanup via the same release path as the cron job
seat.status = 'available';
}
return seat;
}
Payment Timeout Handling — Detailed Flow
async function processPayment(bookingId, paymentDetails) {
const booking = await getBooking(bookingId);
const seats = await getSeats(booking.seatIds);
// Verify all seats still reserved by this user
for (const seat of seats) {
if (seat.status !== 'reserved' || seat.reserved_by !== booking.userId) {
throw new Error('RESERVATION_EXPIRED');
}
// Double-check TTL
const elapsed = Date.now() - new Date(seat.reserved_at).getTime();
if (elapsed > 600000) {
throw new Error('RESERVATION_EXPIRED');
}
}
// Begin atomic payment + booking confirmation
const tx = await db.beginTransaction();
let charge; // declared outside try so the catch block can refund it
try {
// Charge payment (idempotency key = bookingId, so retries can't double-charge)
charge = await stripe.paymentIntents.create({
amount: booking.totalCents,
currency: 'usd',
confirm: true,
payment_method: paymentDetails.paymentMethodId,
metadata: { bookingId, eventId: booking.eventId }
}, { idempotencyKey: bookingId });
if (charge.status !== 'succeeded') {
throw new Error('PAYMENT_FAILED');
}
// Confirm all seats atomically; every seat must still be held by this user
const seatUpdate = await tx.query(`
UPDATE seats
SET status = 'booked', version = version + 1,
booked_by = $1, booked_at = now()
WHERE id = ANY($2) AND status = 'reserved'
AND reserved_by = $1
`, [booking.userId, booking.seatIds]);
if (seatUpdate.rowCount !== booking.seatIds.length) {
throw new Error('SEAT_STATE_CHANGED'); // triggers rollback + refund below
}
// Confirm booking
await tx.query(`
UPDATE bookings
SET status = 'confirmed', payment_intent = $1,
confirmed_at = now()
WHERE id = $2
`, [charge.id, bookingId]);
await tx.commit();
// Clean up Redis reservation keys
for (const seatId of booking.seatIds) {
await redis.del(`reservation:${booking.eventId}:${seatId}`);
}
// Send confirmation email & generate ticket
await notificationService.sendConfirmation(booking);
await ticketService.generateDigitalTicket(booking);
return { success: true, charge };
} catch (err) {
await tx.rollback();
// If payment succeeded but DB failed, refund immediately
if (err.message !== 'PAYMENT_FAILED' && charge?.id) {
await stripe.refunds.create({ payment_intent: charge.id });
}
// Release seats
await releaseSeats(booking.seatIds);
throw err;
}
}
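The recap later mentions reconciliation as part of payment consistency. The idea: a periodic job finds bookings stuck in pending past the TTL, looks up the charge at the payment provider by idempotency key, and decides how to resolve each one. The decision itself can be a pure function; the action names below are our own:

```javascript
// Decide how to resolve a booking stuck in 'pending', given the charge
// status looked up from the payment provider by idempotency key.
function resolvePendingBooking(booking, chargeStatus, nowMs, ttlMs = 600_000) {
  if (booking.status !== 'pending') return 'NOOP';
  if (nowMs - booking.createdAtMs < ttlMs) return 'NOOP'; // still inside the TTL

  if (chargeStatus === 'succeeded') return 'CONFIRM'; // money taken: confirm seats + booking
  if (chargeStatus === 'processing') return 'WAIT';   // provider still working: recheck next run
  return 'RELEASE'; // no charge or it failed: release seats, cancel booking
}

const now = 2_000_000;
const stale = { status: 'pending', createdAtMs: 0 };
console.log(resolvePendingBooking(stale, 'succeeded', now)); // CONFIRM
console.log(resolvePendingBooking(stale, null, now));        // RELEASE
```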
Virtual Waiting Room
When 100K users arrive simultaneously for a popular event, even our optimized booking system would buckle. The solution: a virtual waiting room that queues users before they can access the booking page.
How It Works
1. Users who arrive before the on-sale moment land in the waiting room, not the booking page.
2. Each user is added to a Redis sorted set with a random score (a lottery, not arrival order).
3. Every ~30 seconds a batch of up to 1,000 users is admitted, capped by a maximum active-session count, and each admitted user receives a 15-minute booking JWT.
4. Everyone still queued polls for their position; admitted users proceed to the seat map.
Queue Implementation
// Queue Service (Redis Sorted Set)
class VirtualQueueService {
constructor(redis) { this.redis = redis; }
// Enqueue user with random score (fair lottery)
async enqueue(eventId, userId) {
const queueKey = `queue:${eventId}`;
const score = Math.random(); // random position, not timestamp
await this.redis.zadd(queueKey, score, userId);
const position = await this.redis.zrank(queueKey, userId);
const total = await this.redis.zcard(queueKey);
return {
token: generateQueueToken(eventId, userId),
position: position + 1,
totalInQueue: total,
estimatedWait: Math.ceil((position + 1) / 1000) * 30 // ~30s per batch
};
}
// Admit next batch of users
async admitBatch(eventId, batchSize = 1000) {
const queueKey = `queue:${eventId}`;
const activeKey = `active:${eventId}`;
// Check current active session count
const activeCount = await this.redis.scard(activeKey);
const maxActive = 5000;
if (activeCount >= maxActive) {
return { admitted: 0, reason: 'MAX_ACTIVE_REACHED' };
}
const slotsAvailable = Math.min(batchSize, maxActive - activeCount);
// Pop lowest-scored users (random order = fair)
const users = await this.redis.zpopmin(queueKey, slotsAvailable);
const admittedUsers = [];
for (let i = 0; i < users.length; i += 2) {
const userId = users[i];
// Track the active session (the 15-minute window is enforced by the
// JWT below; a separate sweep removes stale members from this set)
await this.redis.sadd(activeKey, userId);
// Generate booking JWT
const bookingToken = jwt.sign(
{ userId, eventId, type: 'booking_access' },
SECRET, { expiresIn: '15m' }
);
admittedUsers.push({ userId, bookingToken });
}
return { admitted: admittedUsers.length, users: admittedUsers };
}
// Check user's position in queue
async getPosition(eventId, userId) {
const queueKey = `queue:${eventId}`;
const rank = await this.redis.zrank(queueKey, userId);
if (rank === null) return { inQueue: false };
return {
inQueue: true,
position: rank + 1,
total: await this.redis.zcard(queueKey),
estimatedWait: Math.ceil((rank + 1) / 1000) * 30
};
}
}
// Batch admission scheduler (runs every 30 seconds during on-sale)
setInterval(async () => {
const result = await queueService.admitBatch(eventId, 1000);
metrics.gauge('queue.admitted_batch', result.admitted);
metrics.gauge('queue.remaining', await redis.zcard(`queue:${eventId}`));
}, 30000);
Why Random Order, Not FIFO?
FIFO rewards raw speed: scripted bots fire their requests within milliseconds of the on-sale moment, so strict arrival order would put virtually every bot ahead of every human. Assigning each user a random score within the arrival window removes that speed advantage, which is why enqueue uses Math.random() rather than a timestamp.
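The effect is easy to demonstrate: bots all land in the first few milliseconds, so arrival order fills the entire first batch with bots, while a random score mixes the pool (the small seeded generator below just keeps the demo deterministic):

```javascript
// Demo: arrival-order (FIFO) admission vs. random-score admission.
// Bots arrive in the first milliseconds; humans trickle in afterwards.
const arrivals = [
  ...Array.from({ length: 5 }, (_, i) => ({ id: `bot${i}`, t: i })),        // t = 0..4 ms
  ...Array.from({ length: 5 }, (_, i) => ({ id: `human${i}`, t: 100 + i })) // t = 100..104 ms
];

// FIFO: sort by arrival time → the whole first batch is bots.
const fifo = [...arrivals].sort((a, b) => a.t - b.t);
console.log(fifo.slice(0, 5).every(u => u.id.startsWith('bot'))); // true

// Random: score users with a PRNG instead of the timestamp.
let seed = 42; // MINSTD LCG, seeded only to keep this demo deterministic
const lcg = () => (seed = (seed * 48271) % 0x7fffffff) / 0x7fffffff;
const random = [...arrivals]
  .map(u => ({ ...u, score: lcg() }))
  .sort((a, b) => a.score - b.score);

// Same 10 users, but batch order no longer rewards arrival speed.
console.log(new Set(random.map(u => u.id)).size); // 10
```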
Waiting Room UI (Client-Side Polling)
// Client polls every 5 seconds for queue status
async function pollQueueStatus() {
const response = await fetch(`/api/queue/status?token=${queueToken}`);
const data = await response.json();
if (data.admitted) {
// User admitted! Redirect to booking page with JWT
window.location.href =
`/event/${eventId}/book?token=${data.bookingToken}`;
return;
}
// Update waiting room UI
document.getElementById('position').textContent =
`Position: ${data.position} of ${data.total}`;
document.getElementById('eta').textContent =
`Estimated wait: ${data.estimatedWait}s`;
// Show fun animation to keep users engaged
updateProgressAnimation(data.position, data.total);
// Poll again
setTimeout(pollQueueStatus, 5000);
}
Animation: Virtual Waiting Room
🚪 Virtual Waiting Room — Batch Admission
100K users arrive at once → placed in virtual queue → admitted in batches of 1,000 → booking system handles manageable load.
System Architecture
Service Overview
🎭 Event Service
CRUD for events and venues. Serves event catalog, schedules, and metadata. Read-heavy, cached aggressively with CDN + Redis.
🔍 Search Service
Elasticsearch-backed full-text search. Filters by date, city, genre, artist. Near-real-time index updates via CDC from PostgreSQL.
💺 Inventory Service
Core seat management. Owns the seat state machine. Handles reservation, release, and availability queries. Redis + PostgreSQL dual layer.
📋 Booking Service
Orchestrates the booking flow: validate → reserve → pay → confirm. Manages booking records and coordinates between Inventory and Payment.
💳 Payment Service
Integrates with Stripe/PayPal. Handles charges, refunds, webhooks. Idempotent operations with deduplication keys.
🚪 Queue Service
Virtual waiting room for high-demand events. Redis sorted sets for O(log N) operations. JWT token generation for admitted users.
📧 Notification Service
Sends confirmation emails, digital tickets, and queue status updates. Async via message queue (Kafka/SQS).
📊 Analytics Service
Tracks conversion funnels, popular events, seat demand heatmaps. Feeds into pricing and capacity planning.
Architecture Diagram
┌──────────────────┐
│ CDN / CloudFront│
└────────┬─────────┘
│
┌────────▼─────────┐
│ Load Balancer │
│ (ALB / Nginx) │
└────────┬─────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌────────▼──────┐ ┌───────▼────────┐ ┌──────▼───────┐
│ API Gateway │ │ Queue Service │ │ WebSocket │
│ (Rate Limit) │ │ (Waiting Room) │ │ Server │
└───────┬───────┘ └───────┬────────┘ │ (seat updates)│
│ │ └──────┬───────┘
┌────────┼────────┐ │ │
│ │ │ │ │
┌───▼──┐ ┌──▼───┐ ┌──▼────┐ │ ┌──────▼───────┐
│Event │ │Search│ │Booking│◄──┘ │ Redis Pub/Sub│
│Svc │ │Svc │ │Svc │ └──────┬───────┘
└──┬───┘ └──┬───┘ └──┬────┘ │
│ │ │ │
│ ┌────▼────┐ │ ┌──────────┐ ┌──────▼───────┐
│ │Elastic- │ ├───►│Payment │ │ Redis │
│ │search │ │ │Service │ │ (Locks,TTL, │
│ └─────────┘ │ └────┬─────┘ │ Inventory) │
│ │ │ └──────┬───────┘
│ ┌──────▼─────┐ │ │
│ │ Inventory │◄──┼─────────────────┘
│ │ Service │ │
│ └──────┬─────┘ │
│ │ │
└──────────┬───────┴─────────┘
│
┌────────▼─────────┐ ┌─────────────┐
│ PostgreSQL │ │ Kafka │
│ (Source of Truth)│────────►│ (Events) │
└──────────────────┘ └──────┬──────┘
│
┌──────▼──────┐
│ Notification │
│ Service │
└─────────────┘
Data Flow for Seat Reservation
sequenceDiagram:
User → API Gateway : POST /api/reserve {seatId, eventId}
API Gateway → Queue Svc: Check booking token validity
Queue Svc → API Gateway: ✅ Token valid, user admitted
API Gateway → Booking : Forward reservation request
Booking → Redis : SET seat_lock:{id} NX EX 5
Redis → Booking : OK (lock acquired)
Booking → Redis : GET seat:{eventId}:{seatId}
Redis → Booking : {status: "available", version: 7}
Booking → Inventory : Reserve seat (version=7)
Inventory → PostgreSQL : UPDATE seats SET status='reserved'
WHERE version=7 AND status='available'
PostgreSQL → Inventory : 1 row affected ✅
Inventory → Redis : SET seat status=reserved, SET TTL 600s
Inventory → Booking : Reserved, expires in 600s
Booking → Redis : DEL seat_lock:{id}
Booking → User : {reserved: true, expiresIn: 600}
Booking → WebSocket : Broadcast: seat A-12 now reserved
Database Design Deep Dive
PostgreSQL — Source of Truth
| Table | Purpose | Reads/sec | Writes/sec |
|---|---|---|---|
| events | Event catalog | High (cached) | Low (admin only) |
| venues | Venue & seat map layout | Medium | Rare |
| seats | Per-seat inventory & status | Very High | Very High (during sale) |
| bookings | Confirmed booking records | Low | High (during sale) |
Partitioning Strategy — The seats table is partitioned by event_id:
-- Partition seats by event for isolation and performance
CREATE TABLE seats (
id UUID, event_id UUID, section VARCHAR(20),
"row" VARCHAR(10), seat_number INT,
status VARCHAR(20), version INT, ...
) PARTITION BY HASH (event_id);
-- Create 16 partitions (adjust based on concurrent events)
CREATE TABLE seats_p0 PARTITION OF seats FOR VALUES WITH (MODULUS 16, REMAINDER 0);
CREATE TABLE seats_p1 PARTITION OF seats FOR VALUES WITH (MODULUS 16, REMAINDER 1);
-- ... seats_p2 through seats_p15
-- Benefits:
-- 1. Hot event queries only scan 1 partition (1/16th of data)
-- 2. Vacuum runs independently per partition
-- 3. Can move cold event data to cheaper storage
Redis — Fast Cache & Lock Layer
// Redis data structures per event
// 1. Seat availability bitmap (O(1) per seat)
SETBIT seat_avail:{eventId} {seatIndex} 1 // available
SETBIT seat_avail:{eventId} {seatIndex} 0 // taken
BITCOUNT seat_avail:{eventId} // count available
// 2. Seat detail hash
HSET seat:{eventId}:{seatId} status "available" version 7 price 15000
// 3. Reservation TTL keys (auto-expire)
SET reservation:{eventId}:{seatId} {userId} EX 600
// 4. Distributed locks (short-lived)
SET seat_lock:{eventId}:{seatId} {userId} NX EX 5
// 5. Event-level counters
INCR event_views:{eventId}
INCR event_reservations:{eventId}
DECR event_available:{eventId}
// 6. Queue sorted set
ZADD queue:{eventId} {randomScore} {userId}
ZPOPMIN queue:{eventId} 1000
// 7. Active session set
SADD active:{eventId} {userId}
SCARD active:{eventId}
Elasticsearch — Search Index
// Event search index mapping
PUT /events
{
"mappings": {
"properties": {
"name": { "type": "text", "analyzer": "standard" },
"artist": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"category": { "type": "keyword" },
"city": { "type": "keyword" },
"venue": { "type": "text" },
"date": { "type": "date" },
"on_sale": { "type": "date" },
"price_range": { "type": "integer_range" },
"available_seats": { "type": "integer" },
"location": { "type": "geo_point" },
"tags": { "type": "keyword" }
}
}
}
// Example search: "Taylor Swift concerts in NYC this summer"
GET /events/_search
{
"query": {
"bool": {
"must": [
{ "match": { "artist": "Taylor Swift" }},
{ "term": { "city": "New York" }},
{ "range": { "date": { "gte": "2026-06-01", "lte": "2026-08-31" }}}
],
"filter": [
{ "range": { "available_seats": { "gt": 0 }}}
]
}
},
"sort": [{ "date": "asc" }]
}
Handling Flash Sales
A "flash sale" is the most extreme scenario: an event announced to go on sale at a specific time, with demand massively exceeding supply. Here's our multi-layered defense:
1. Pre-Warming
// 5 minutes before on-sale time:
async function prewarmEvent(eventId) {
// Load all seat data into Redis
const seats = await db.query(
'SELECT * FROM seats WHERE event_id = $1', [eventId]
);
const pipeline = redis.pipeline();
for (const seat of seats) {
pipeline.hset(
`seat:${eventId}:${seat.id}`,
'status', seat.status,
'version', seat.version,
'price', seat.price_cents,
'section', seat.section,
'row', seat.row,
'number', seat.seat_number
);
// Set availability bitmap
const seatIndex = computeSeatIndex(seat);
pipeline.setbit(`seat_avail:${eventId}`, seatIndex, 1);
}
await pipeline.exec();
// Pre-scale infrastructure
await k8s.scaleDeployment('booking-service', { replicas: 20 });
await k8s.scaleDeployment('inventory-service', { replicas: 15 });
// Warm database connection pools
await db.warmPool(50);
logger.info(`Pre-warmed event ${eventId}: ${seats.length} seats cached`);
}
2. Rate Limiting (Per-User and Global)
// API Gateway rate limiting configuration
const rateLimits = {
// Per-user limits (by JWT / IP)
'POST /api/reserve': {
window: '10s',
max: 3, // max 3 reservation attempts per 10 seconds
strategy: 'sliding_window'
},
'GET /api/seats': {
window: '1s',
max: 10, // max 10 seat map refreshes per second
strategy: 'token_bucket'
},
// Global limits (protect backends)
'global:reservation': {
window: '1s',
max: 5000, // max 5000 reservations/sec system-wide
strategy: 'fixed_window',
overflow: 'queue' // excess goes to waiting room
}
};
// Redis-based sliding window rate limiter
async function checkRateLimit(key, window, max) {
const now = Date.now();
const windowStart = now - (window * 1000);
const pipeline = redis.pipeline();
pipeline.zremrangebyscore(key, 0, windowStart); // cleanup old
pipeline.zadd(key, now, `${now}:${Math.random()}`);
pipeline.zcard(key);
pipeline.expire(key, window);
const results = await pipeline.exec();
const count = results[2][1];
return count <= max;
}
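The GET /api/seats limit above names a token_bucket strategy, which the sliding-window code doesn't cover. A minimal in-memory sketch (clock injected so behavior is deterministic; a production version would live in Redis alongside the sliding window):

```javascript
// Minimal token bucket: burst up to `capacity`, refill at `ratePerSec`.
function makeBucket(capacity, ratePerSec, nowMs) {
  return { capacity, ratePerSec, tokens: capacity, lastMs: nowMs };
}

function tryTake(bucket, nowMs) {
  // Refill proportionally to elapsed time, capped at capacity.
  const elapsedSec = (nowMs - bucket.lastMs) / 1000;
  bucket.tokens = Math.min(bucket.capacity, bucket.tokens + elapsedSec * bucket.ratePerSec);
  bucket.lastMs = nowMs;
  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return true; // request allowed
  }
  return false;  // rate-limited
}

const bucket = makeBucket(10, 10, 0); // 10-request burst, 10 req/sec sustained
let allowed = 0;
for (let i = 0; i < 12; i++) allowed += tryTake(bucket, 0) ? 1 : 0;
console.log(allowed);              // 10: burst capped at capacity
console.log(tryTake(bucket, 500)); // true: 0.5s refills 5 tokens
```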
3. Horizontal Scaling Strategy
| Component | Normal | Flash Sale | Scale Factor |
|---|---|---|---|
| API Gateway | 4 pods | 20 pods | 5x |
| Booking Service | 4 pods | 20 pods | 5x |
| Inventory Service | 3 pods | 15 pods | 5x |
| Queue Service | 2 pods | 8 pods | 4x |
| Redis Cluster | 3 shards | 6 shards | 2x |
| PostgreSQL read replicas | 2 | 5 | 2.5x |
4. Graceful Degradation
// Circuit breaker configuration per service
const circuitBreakers = {
'payment-service': {
failureThreshold: 5, // 5 failures in window
resetTimeout: 30000, // try again after 30s
fallback: async (req) => {
// Queue payment for retry, keep reservation alive
await paymentQueue.push(req);
return { status: 'PAYMENT_QUEUED', retryIn: 30 };
}
},
'notification-service': {
failureThreshold: 10,
resetTimeout: 60000,
fallback: async (req) => {
// Booking still valid, notification sent later
await notificationQueue.push(req);
return { status: 'NOTIFICATION_DELAYED' };
}
}
};
// Seat map degradation: if real-time WebSocket overloaded,
// fall back to polling every 5 seconds
if (wsConnectionCount > 50000) {
return { mode: 'polling', interval: 5000 };
}
Real-Time Seat Map Updates
WebSocket Architecture
Users viewing the seat map need to see seats disappear in real time as others book them. We use WebSockets backed by Redis Pub/Sub:
// Server: Publish seat status changes
async function onSeatStatusChange(eventId, seatId, newStatus) {
await redis.publish(`seat_updates:${eventId}`, JSON.stringify({
seatId,
status: newStatus,
timestamp: Date.now()
}));
}
// WebSocket server: Subscribe and broadcast
const wss = new WebSocket.Server({ port: 8080 });
wss.on('connection', (ws, req) => {
const eventId = parseEventId(req.url);
const subscriber = redis.duplicate();
subscriber.subscribe(`seat_updates:${eventId}`);
subscriber.on('message', (channel, message) => {
if (ws.readyState === WebSocket.OPEN) {
ws.send(message);
}
});
ws.on('close', () => {
subscriber.unsubscribe();
subscriber.quit();
});
});
// Client: Update seat map in real time
const ws = new WebSocket(`wss://api.example.com/seats/${eventId}`);
ws.onmessage = (event) => {
const update = JSON.parse(event.data);
const seatEl = document.querySelector(`[data-seat="${update.seatId}"]`);
if (update.status === 'reserved' || update.status === 'booked') {
seatEl.classList.remove('available');
seatEl.classList.add('taken');
seatEl.setAttribute('disabled', 'true');
} else if (update.status === 'available') {
seatEl.classList.remove('taken');
seatEl.classList.add('available');
seatEl.removeAttribute('disabled');
}
};
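One client-side detail the snippet above omits: WebSocket connections drop, and during flash sales the server may shed them entirely (the degradation path earlier falls back to polling). A sketch of the capped exponential backoff schedule such a client might use before switching to polling; the attempt limit and timings are our assumptions:

```javascript
// Capped exponential backoff: 1s, 2s, 4s, ... up to maxMs.
function reconnectDelayMs(attempt, baseMs = 1000, maxMs = 30_000) {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

const schedule = [1, 2, 3, 4, 5, 6, 7].map(a => reconnectDelayMs(a));
console.log(schedule); // [1000, 2000, 4000, 8000, 16000, 30000, 30000]

// Typical wiring (browser side, illustrative only):
// ws.onclose = () => {
//   attempt += 1;
//   if (attempt > 7) startPolling(5000);        // give up on WebSocket
//   else setTimeout(connectWebSocket, reconnectDelayMs(attempt));
// };
```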
Anti-Fraud & Bot Prevention
| Defense Layer | Technique | Blocks |
|---|---|---|
| Network | IP rate limiting, geo-fencing, ASN reputation | Basic bots, data-center traffic |
| Application | CAPTCHA at queue entry, browser fingerprinting | Automated scripts |
| Queue | Random ordering (not FIFO), Verified Fan registration | Speed advantage bots |
| Booking | 1 booking per user per event, phone verification | Scalper bulk buying |
| Payment | Card velocity checks, address verification | Stolen cards, fraud rings |
| Post-Sale | Transfer restrictions, identity-linked tickets | Secondary market scalping |
// Bot detection middleware
async function botDetection(req, res, next) {
const signals = {
// Timing analysis
timeToInteract: req.headers['x-time-to-interact'],
// Browser fingerprint
fingerprint: req.headers['x-fp'],
// Mouse movement entropy (sent by client JS)
mouseEntropy: req.body?.mouseEntropy,
// Request patterns
requestInterval: getRequestInterval(req.ip),
};
const score = await mlModel.predict(signals);
if (score > 0.9) {
// Definitely a bot → block
return res.status(429).json({ error: 'Suspicious activity detected' });
} else if (score > 0.7) {
// Possibly a bot → challenge with CAPTCHA
return res.status(403).json({ challenge: 'captcha', token: generateCaptchaToken() });
}
// Likely human → allow
next();
}
Monitoring & Observability
Key Metrics Dashboard
| Metric | Alert Threshold | Why It Matters |
|---|---|---|
| Reservation p99 latency | > 500ms | Users abandon if seat selection feels slow |
| Double-booking count | > 0 | Absolute correctness requirement — page the team |
| Reservation timeout rate | > 30% | TTL too short or payment flow too complex |
| Queue wait time p95 | > 10 min | Users leave if wait feels hopeless |
| Payment failure rate | > 5% | Provider issue or card decline spike |
| Redis lock contention | > 80% rejection | Need more seat inventory or better distribution |
| Active WebSocket connections | > 90% capacity | Scale up WebSocket servers |
// Distributed tracing for the booking flow
// Every request gets a trace ID that follows it through all services
const trace = {
traceId: 'abc-123-def-456',
spans: [
{ service: 'api-gateway', duration: '5ms', status: 'ok' },
{ service: 'queue-service', duration: '2ms', status: 'ok' },
{ service: 'booking-service', duration: '12ms', status: 'ok' },
{ service: 'redis-lock', duration: '1ms', status: 'ok' },
{ service: 'inventory-service', duration: '8ms', status: 'ok' },
{ service: 'postgresql', duration: '15ms', status: 'ok' },
{ service: 'redis-cache', duration: '1ms', status: 'ok' },
// Total: ~44ms end-to-end for reservation
]
};
Failure Scenarios & Recovery
Scenario 1: Redis Goes Down
// Fallback: bypass Redis lock, rely on PostgreSQL optimistic lock only
async function reserveSeatWithFallback(userId, seatId, version) {
try {
// Try Redis lock first
const locked = await redis.set(lockKey, userId, 'NX', 'EX', 5);
if (!locked) return { error: 'SEAT_LOCKED' };
} catch (redisErr) {
// Redis down → fall through to PostgreSQL-only path
logger.warn('Redis unavailable, falling back to PG-only locking');
metrics.increment('redis_fallback');
}
// PostgreSQL optimistic lock always runs regardless
const result = await db.query(`
UPDATE seats SET status='reserved', version=version+1, ...
WHERE id=$1 AND status='available' AND version=$2
`, [seatId, version]);
return result.rowCount > 0
? { success: true }
: { error: 'SEAT_TAKEN' };
}
Scenario 2: Payment Service Times Out
// Payment with retry and idempotency
async function chargeWithRetry(bookingId, amount, maxRetries = 3) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const charge = await stripe.paymentIntents.create({
amount,
currency: 'usd',
idempotency_key: `${bookingId}_v1`, // same key = same charge
});
return charge;
} catch (err) {
if (err.type === 'StripeConnectionError' && attempt < maxRetries) {
await sleep(1000 * 2 ** (attempt - 1)); // exponential backoff: 1s, 2s, 4s
continue;
}
throw err;
}
}
}
// If all retries fail: extend reservation by 5 minutes, notify user
async function handlePaymentTimeout(bookingId) {
const booking = await getBooking(bookingId);
// Extend reservation TTL
for (const seatId of booking.seatIds) {
await redis.expire(`reservation:${booking.eventId}:${seatId}`, 300);
await db.query(`
UPDATE seats SET reserved_at = now()
WHERE id = $1 AND reserved_by = $2
`, [seatId, booking.userId]);
}
// Notify user
await notifyUser(booking.userId, {
type: 'PAYMENT_RETRY',
message: 'Payment processing delayed. Your seats are still held.',
retryUrl: `/checkout/${bookingId}`
});
}
Scenario 3: Split-Brain (Network Partition)
During a network partition, Redis and PostgreSQL can disagree about seat state: for example, a Redis lock or reservation key may exist even though the corresponding database write never landed. The rule is simple: PostgreSQL is the source of truth, and the version field in PostgreSQL is the final authority on seat state. On recovery, Redis caches are rebuilt from the database (as in pre-warming), never the other way around.
Recap & Key Takeaways
| Challenge | Solution |
|---|---|
| Double booking | Optimistic locking (version CAS) + Redis distributed lock + DB constraints |
| Seat hoarding | 10-minute reservation TTL with triple-redundant cleanup |
| Flash crowd / thundering herd | Virtual waiting room with random ordering + batch admission |
| Real-time seat updates | WebSocket + Redis Pub/Sub broadcasting |
| Payment consistency | Idempotency keys + atomic DB transactions + reconciliation |
| Bot prevention | CAPTCHA + random queue + Verified Fan + ML scoring |
| Extreme scale (100K concurrent) | Pre-warming + horizontal scaling + graceful degradation |
| System failure | Redis fallback → PG-only, payment retry with idempotency, split-brain arbiter |