Design: Uber / Ride-Sharing
Uber connects millions of riders with drivers in real time. Behind a deceptively simple "request a ride" button lies one of the most complex distributed systems ever built: a service that must ingest millions of GPS pings per second, match riders with optimal drivers in under two seconds, compute accurate ETAs over live road graphs, adjust pricing dynamically across thousands of city zones, and orchestrate payments — all while handling 20 million rides per day across 70+ countries.
In this post we'll design the complete ride-sharing system from the ground up: the location ingestion pipeline, the matching algorithm, the trip lifecycle state machine, the H3 hexagonal grid for surge pricing, ETA computation with ML, and the payment authorization flow. We'll look at how Uber's real DISCO dispatch system works and the engineering trade-offs behind every decision.
Requirements
Functional Requirements
- Request a ride: Rider opens app, enters pickup & drop-off, selects ride type (UberX, Pool, Black), and requests
- Match with nearest driver: System finds optimal available driver and offers the trip
- Real-time tracking: Both rider and driver see each other's live location on the map
- ETA computation: Accurate estimated time of arrival before and during the trip
- Surge pricing: Dynamic price multiplier based on real-time supply/demand per geographic zone
- Payment processing: Charge rider and pay driver at trip completion
- Rating system: Mutual rating after ride completion
- Trip history: Riders and drivers can view past trips
Non-Functional Requirements
- Scale: 20 million rides/day; ~5 million drivers online at peak, each sending a location update every 3-4 seconds (~1.3M updates/sec at peak)
- Latency: Match rider to driver in <2 seconds; location updates reflected in <1 second
- Availability: 99.99% uptime — every minute of downtime means thousands of stranded riders
- Consistency: A ride must never be double-assigned to two drivers (strong consistency on matching)
- Durability: No trip or payment data may be lost
Scale Estimation
| Metric | Value | Derived Rate |
|---|---|---|
| Rides per day | 20 million | ~230 rides/sec |
| Active drivers online | ~5 million peak | — |
| Location updates per driver | Every 3-4 seconds | — |
| Location updates (total) | ~1.3M writes/sec at peak | 5M drivers ÷ ~3.5 s between updates |
| Average trip duration | 15 minutes | ~200K concurrent trips (230 rides/sec × 900 s) |
| WebSocket connections | ~10 million concurrent | riders + drivers |
High-Level Architecture
The system decomposes into these core services:
| Service | Responsibility | Key Technology |
|---|---|---|
| API Gateway | Authentication, rate limiting, routing | NGINX / Envoy |
| Location Service | Ingest & store real-time driver positions | Redis + Geospatial Index |
| Matching Service (DISCO) | Find optimal driver for each ride request | Geohash + Custom Algorithm |
| Trip Service | Manage ride lifecycle state machine | PostgreSQL + Kafka |
| ETA Service | Compute ETAs using road graph + ML | Graph DB + TensorFlow |
| Pricing Service | Compute fare + surge multiplier | H3 Grid + Redis |
| Payment Service | Authorize, capture, and settle payments | Stripe / Internal Ledger |
| Notification Service | Push notifications & WebSocket updates | WebSocket + APNS/FCM |
| User Service | Rider/driver profiles, ratings, preferences | PostgreSQL |
┌─────────────┐ ┌─────────────┐
│ Rider App │ │ Driver App │
└──────┬──────┘ └──────┬──────┘
│ HTTPS/WSS │ HTTPS/WSS
▼ ▼
┌─────────────────────────────────┐
│ API Gateway │
│ (Auth, Rate Limit, Route) │
└──┬────┬────┬────┬────┬────┬─────┘
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
┌────┐┌────┐┌────┐┌────┐┌────┐┌────┐
│Loc.││Trip││DISC││ETA ││Pric││Pay │
│Svc ││Svc ││ O ││Svc ││Svc ││Svc │
└─┬──┘└─┬──┘└─┬──┘└─┬──┘└─┬──┘└─┬──┘
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
┌─────────────────────────────────┐
│ Redis Cluster (Locations) │
│ PostgreSQL (Trips, Users) │
│ Kafka (Events) │
│ S3 (Trip Logs) │
└─────────────────────────────────┘
Rider & Driver Flows
Rider Flow
The rider's journey through the system:
Open App → Enter Pickup & Destination → See Fare Estimate + Surge → Request Ride → Wait for Match → Track Driver Approach → Driver Arrives → In-Trip Tracking → Arrive at Destination → Pay & Rate
POST /v1/rides/request
{
"rider_id": "usr_abc123",
"pickup": { "lat": 37.7749, "lng": -122.4194 },
"dropoff": { "lat": 37.3382, "lng": -121.8863 },
"ride_type": "UberX",
"payment_method_id": "pm_visa_4242"
}
Returns a ride_id and initiates the matching process. The rider receives real-time updates via WebSocket.
Driver Flow
Go Online → Send Location Every 3-4s → Receive Ride Request → Accept / Decline (15s timeout) → Navigate to Pickup → Confirm Rider Pickup → Navigate to Destination → Complete Trip → Get Paid
POST /v1/drivers/location
{
"driver_id": "drv_xyz789",
"lat": 37.7751,
"lng": -122.4180,
"heading": 45,
"speed_kmh": 32,
"timestamp": 1714500000000,
"status": "available"
}
Sent every 3-4 seconds by the driver app. This is the highest-throughput endpoint in the entire system (~1.3M writes/sec peak).
Location Service
The location service is the beating heart of Uber. It must ingest, store, and serve millions of GPS coordinates per second with sub-second latency.
Why Redis with Geospatial Index?
Redis provides the GEOADD / GEOSEARCH commands backed by a sorted set where each member's score is a 52-bit geohash. This gives us:
- O(log N) insert for each location update
- O(N + log M) radius query where N is results returned and M is total members
- In-memory speed: sub-millisecond latency for both writes and reads
# Driver sends location update → Location Service writes to Redis:
GEOADD drivers:available -122.4180 37.7751 "drv_xyz789"
# When matching needs nearby drivers:
GEOSEARCH drivers:available
FROMLONLAT -122.4194 37.7749
BYRADIUS 5 km
ASC
COUNT 20
WITHCOORD WITHDIST
# Returns (member, distance, coordinates — note Redis returns longitude first):
# 1) "drv_xyz789" — 0.15 km — (-122.4180, 37.7751)
# 2) "drv_abc456" — 0.82 km — (-122.4210, 37.7780)
# 3) "drv_def012" — 1.45 km — (-122.4100, 37.7690)
# ...
How Geohash Works
A geohash encodes a (latitude, longitude) pair into a single integer by interleaving the bits of the binary representations of latitude and longitude. Points that are geographically close share a common prefix in their geohash.
Geohash encoding for (37.7749, -122.4194):
Step 1: Encode latitude 37.7749 into binary
Range [-90, 90] → midpoint 0 → 37.7749 > 0 → bit 1
Range [0, 90] → midpoint 45 → 37.7749 < 45 → bit 0
Range [0, 45] → midpoint 22.5 → 37.7749 > 22.5 → bit 1
... continue for desired precision
Step 2: Encode longitude -122.4194 into binary
Range [-180, 180] → midpoint 0 → -122.4194 < 0 → bit 0
Range [-180, 0] → midpoint -90 → -122.4194 < -90 → bit 0
Range [-180, -90] → midpoint -135 → -122.4194 > -135 → bit 1
... continue
Step 3: Interleave bits (lon, lat, lon, lat, ...)
= 0, 1, 0, 0, 1, 1, ... → "9q8yyk" (base32)
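To make the bit-interleaving concrete, here is a minimal Python encoder sketch — illustrative only, not the encoding Redis uses internally (the function name and the 6-character precision are choices made for this example):
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # standard geohash alphabet

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, use_lng = [], True                   # geohash starts with a longitude bit
    while len(bits) < precision * 5:           # 5 bits per base32 character
        rng, val = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1); rng[0] = mid       # value lies in the upper half
        else:
            bits.append(0); rng[1] = mid       # value lies in the lower half
        use_lng = not use_lng
    return "".join(BASE32[int("".join(map(str, bits[i:i+5])), 2)]
                   for i in range(0, len(bits), 5))

print(geohash_encode(37.7749, -122.4194))      # → "9q8yyk"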
Location Ingestion Pipeline
With 1.3 million location writes per second at peak, we need a robust ingestion pipeline:
Driver App
│
▼ (every 3-4 seconds)
┌──────────────┐
│ API Gateway │ ← rate limit per driver: 1 update/sec
└──────┬───────┘
▼
┌──────────────┐
│ Kafka Topic │ ← "driver-locations" — partitioned by driver_id
│ (Buffer) │ ~100 partitions, 3 replicas
└──────┬───────┘
▼
┌──────────────┐
│ Location │ ← Kafka consumers (100+ instances)
│ Consumers │ Batch writes to Redis
└──────┬───────┘
▼
┌──────────────┐
│ Redis Cluster │ ← 50+ shards, each handling ~26K writes/sec
│ (Geospatial) │ Shard key: geohash prefix (colocate nearby drivers)
└──────────────┘
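A minimal sketch of one location consumer, assuming kafka-python and redis-py (≥ 4.0) and JSON payloads shaped like the /v1/drivers/location body above; the city_id field and the batch size are assumptions for the example, not part of the real pipeline:
import json
import redis
from kafka import KafkaConsumer

r = redis.Redis(host="localhost", port=6379)
consumer = KafkaConsumer(
    "driver-locations",
    bootstrap_servers=["kafka:9092"],
    group_id="location-writers",
    value_deserializer=lambda b: json.loads(b),
)

BATCH = 500                                    # flush in batches to amortize Redis round trips
pipe, pending = r.pipeline(transaction=False), 0
for msg in consumer:
    loc = msg.value
    key = f"drivers:{loc.get('city_id', 'sf')}:available"     # city_id is an assumed field
    pipe.geoadd(key, (loc["lng"], loc["lat"], loc["driver_id"]))
    pending += 1
    if pending >= BATCH:                       # one round trip per batch of GEOADDs
        pipe.execute()
        pending = 0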
Multi-Region Sharding
Uber shards the location store by city or region. A ride request in San Francisco only queries the SF Redis shard — it never needs to scan drivers in New York. This partitioning is natural for a ride-sharing system:
Redis sharding strategy:
Shard key = city_id (e.g., "sf", "nyc", "london")
drivers:sf:available → Redis shard 1-5 (SF drivers)
drivers:nyc:available → Redis shard 6-10 (NYC drivers)
drivers:london:available → Redis shard 11-15 (London drivers)
Benefits:
✓ Queries never cross city boundaries
✓ Hot cities get more shards
✓ Independent scaling per region
✓ Failure isolation — NYC outage doesn't affect SF
Matching Algorithm & DISCO
The matching service — internally called DISCO (Dispatch Optimization) at Uber — is the brain of the system. It takes a ride request and finds the best available driver. This is not simply "find the nearest driver." DISCO optimizes for the global system: minimizing total wait time across all riders, not just the current one.
Basic Matching: Radius Search + Ranking
The foundational algorithm works in stages:
function matchDriver(rideRequest):
pickup = rideRequest.pickup // (lat, lng)
rideType = rideRequest.rideType // "UberX", "Pool", etc.
// Stage 1: Find candidate drivers within radius
radius = 5.0 // km, start with 5km
candidates = redis.GEOSEARCH(
key = "drivers:{city}:available",
center = pickup,
radius = radius,
count = 20,
sort = "ASC" // nearest first
)
// Stage 2: Filter by eligibility
candidates = candidates.filter(driver =>
driver.rideType.includes(rideType) &&
driver.rating >= MIN_RATING &&
driver.vehicleCapacity >= rideRequest.passengers &&
!driver.hasActiveOffer // not already offered another ride
)
// Stage 3: If too few candidates, expand radius
if candidates.length < 3:
radius = 10.0
candidates = redis.GEOSEARCH(..., radius=10.0, count=50)
candidates = applyFilters(candidates)
// Stage 4: Rank by composite score
for each driver in candidates:
driver.eta = etaService.compute(driver.location, pickup)
driver.score = rankDriver(driver, rideRequest)
// Stage 5: Sort by score, offer to best
candidates.sortBy(d => d.score, DESC)
return offerToDriver(candidates[0], rideRequest)
Ranking Function
The ranking score considers multiple factors:
function rankDriver(driver, request):
// Lower ETA is better (normalize to 0-1)
etaScore = 1.0 - (driver.eta / MAX_ETA)
// Closer distance is better
distScore = 1.0 - (driver.distance / MAX_DISTANCE)
// Driver rating (4.5-5.0 → 0.9-1.0)
ratingScore = driver.rating / 5.0
// Driver acceptance rate (higher is better)
acceptScore = driver.acceptanceRate
// Trip completion heading — is the driver heading toward pickup?
headingScore = cos(angleBetween(driver.heading, bearingTo(driver.loc, request.pickup)))
headingScore = max(0, headingScore) // only reward driving toward
return (
0.40 * etaScore +
0.25 * distScore +
0.15 * ratingScore +
0.10 * acceptScore +
0.10 * headingScore
)
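As a runnable sketch of the same composite score (same illustrative weights; the caps on ETA and distance are assumed values, and the bearing helper is standard great-circle navigation math rather than anything Uber-specific):
import math

MAX_ETA_S, MAX_DIST_KM = 900, 5.0              # illustrative normalization caps

def bearing_deg(lat1, lng1, lat2, lng2):
    # initial great-circle bearing from (lat1, lng1) toward (lat2, lng2)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlng = math.radians(lng2 - lng1)
    y = math.sin(dlng) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlng)
    return math.degrees(math.atan2(y, x)) % 360

def rank_driver(d, pickup_lat, pickup_lng):
    eta_score = 1.0 - d["eta_s"] / MAX_ETA_S
    dist_score = 1.0 - d["dist_km"] / MAX_DIST_KM
    rating_score = d["rating"] / 5.0
    accept_score = d["acceptance_rate"]
    angle = d["heading"] - bearing_deg(d["lat"], d["lng"], pickup_lat, pickup_lng)
    heading_score = max(0.0, math.cos(math.radians(angle)))   # only reward driving toward pickup
    return (0.40 * eta_score + 0.25 * dist_score + 0.15 * rating_score
            + 0.10 * accept_score + 0.10 * heading_score)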
The Offer Cascade
When a driver is selected, they receive a push notification with 15 seconds to accept. If they don't respond or decline:
function offerToDriver(driver, ride):
// Mark driver as "offered" (prevent double-dispatch)
redis.SET("offer:{driver.id}", ride.id, EX=20)
// Send push notification
notify(driver, {
type: "RIDE_OFFER",
ride_id: ride.id,
pickup: ride.pickup,
dropoff: ride.dropoff,
fare_estimate: ride.fare,
timeout: 15 // seconds
})
// Wait for response
response = await withTimeout(15_seconds):
driver.respondToOffer(ride.id)
if response == ACCEPTED:
tripService.createTrip(ride, driver)
notifyRider(ride.rider, "Driver matched!")
return SUCCESS
if response == DECLINED or response == TIMEOUT:
redis.DEL("offer:{driver.id}")
// Move to next candidate
nextDriver = getNextCandidate(ride)
if nextDriver:
return offerToDriver(nextDriver, ride)
else:
notifyRider(ride.rider, "No drivers available")
return NO_MATCH
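The "prevent double-dispatch" step maps naturally onto an atomic Redis lock. A minimal sketch with redis-py — the key name and 20-second TTL mirror the pseudocode above, and the helper names are chosen for this example:
import redis

r = redis.Redis()

def try_lock_driver(driver_id: str, ride_id: str) -> bool:
    # nx=True → only set if no offer key exists; ex=20 → auto-expire just past the 15 s window
    return bool(r.set(f"offer:{driver_id}", ride_id, nx=True, ex=20))

def release_driver(driver_id: str) -> None:
    # called on decline or timeout so the driver can receive the next offer
    r.delete(f"offer:{driver_id}")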
DISCO: Global Optimization
Uber's real DISCO system goes beyond greedy nearest-driver matching. It uses batched matching — collecting ride requests over a short window (2-3 seconds) and solving the assignment problem globally:
DISCO Batched Matching:
Every 2 seconds:
1. Collect all pending ride requests → R = {r1, r2, ..., rN}
2. Collect all available drivers → D = {d1, d2, ..., dM}
3. Build cost matrix C[i][j] = ETA(driver_j → rider_i)
4. Solve assignment: minimize total ΣC[i][j]
subject to: each driver assigned to at most 1 rider
each rider assigned to exactly 1 driver (if possible)
5. This is the Hungarian Algorithm (or auction algorithm for speed)
Why batching beats greedy:
Rider A at (1,1), Rider B at (2,2)
Driver X at (1,2), Driver Y at (3,3)
Greedy (B's request processed first):
B → X (its nearest, dist=1), A → Y (dist=2√2 ≈ 2.83) → total ≈ 3.83
Greedy (A's request processed first):
A → X (dist=1), B → Y (dist=√2 ≈ 1.41) → total ≈ 2.41
Batched optimal:
A → X (dist=1), B → Y (dist=√2 ≈ 1.41) → total ≈ 2.41 ✓
Batching finds the minimum-total assignment regardless of arrival order.
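A tiny sketch of the batched assignment step using SciPy's Hungarian-algorithm solver, scipy.optimize.linear_sum_assignment; the ETA matrix below is made up for illustration:
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j] = ETA in seconds for driver j to reach rider i
cost = np.array([
    [120, 300, 480],   # rider 0
    [240,  90, 600],   # rider 1
    [360, 420, 150],   # rider 2
])
rider_idx, driver_idx = linear_sum_assignment(cost)
for r_i, d_j in zip(rider_idx, driver_idx):
    print(f"rider {r_i} ← driver {d_j} (ETA {cost[r_i, d_j]}s)")
# total wait is minimized globally: 120 + 90 + 150 = 360 s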
▶ Ride Matching Animation
Watch how the system matches a rider with the optimal driver using radius search, filtering, and ranking.
Trip Service & State Machine
The Trip Service manages the complete lifecycle of every ride through a strict state machine. Each state transition is recorded as an event in Kafka, making the entire trip history auditable and replayable.
Trip State Machine
┌──────────────┐
│ REQUESTED │
└──────┬───────┘
│ driver accepts
▼
┌──────────────┐
timeout │ ACCEPTED │──────────────────┐
────────── └──────┬───────┘ │
│ driver starts driving │
▼ │
┌──────────────┐ │
│ EN_ROUTE │ │
└──────┬───────┘ │
│ driver arrives at pickup │
▼ │
┌──────────────┐ │
rider │ ARRIVED │ 5-min wait │
no-show └──────┬───────┘ timeout ────────┤
──────────────────│ │
│ rider gets in │
▼ │
┌──────────────┐ │
│ IN_PROGRESS │ │
└──────┬───────┘ │
│ arrive at destination │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ COMPLETED │ │ CANCELLED │
└──────────────┘ └──────────────┘
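A minimal sketch of how the Trip Service might enforce these transitions: each requested change is checked against an allow-list mirroring the diagram before the row is updated and the event is published (real deployments typically allow cancellation from more states than shown here):
VALID_TRANSITIONS = {
    "REQUESTED":   {"ACCEPTED"},
    "ACCEPTED":    {"EN_ROUTE", "CANCELLED"},    # CANCELLED on offer timeout
    "EN_ROUTE":    {"ARRIVED"},
    "ARRIVED":     {"IN_PROGRESS", "CANCELLED"}, # CANCELLED on rider no-show / 5-min wait
    "IN_PROGRESS": {"COMPLETED"},
    "COMPLETED":   set(),
    "CANCELLED":   set(),
}

class InvalidTransition(Exception):
    pass

def transition(trip: dict, new_status: str) -> dict:
    if new_status not in VALID_TRANSITIONS[trip["status"]]:
        raise InvalidTransition(f"{trip['status']} → {new_status} is not allowed")
    trip["status"] = new_status
    # in the real service this is also where the trips row is updated and a
    # "trip-events" Kafka message is published (see the event example below)
    return trip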
Trip Data Model
CREATE TABLE trips (
id UUID PRIMARY KEY,
rider_id UUID NOT NULL REFERENCES users(id),
driver_id UUID REFERENCES users(id), -- NULL until matched
status VARCHAR(20) NOT NULL DEFAULT 'REQUESTED',
-- Locations
pickup_lat DECIMAL(9,6) NOT NULL,
pickup_lng DECIMAL(9,6) NOT NULL,
dropoff_lat DECIMAL(9,6) NOT NULL,
dropoff_lng DECIMAL(9,6) NOT NULL,
-- Timing
requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
accepted_at TIMESTAMPTZ,
pickup_at TIMESTAMPTZ,
dropoff_at TIMESTAMPTZ,
cancelled_at TIMESTAMPTZ,
-- Pricing
ride_type VARCHAR(20) NOT NULL,
base_fare DECIMAL(10,2),
surge_multiplier DECIMAL(4,2) DEFAULT 1.00,
final_fare DECIMAL(10,2),
distance_km DECIMAL(8,2),
duration_min DECIMAL(8,2),
-- Payment
payment_method_id VARCHAR(64),
payment_intent_id VARCHAR(64), -- Stripe hold
payment_status VARCHAR(20) DEFAULT 'PENDING',
-- Ratings
rider_rating SMALLINT CHECK (rider_rating BETWEEN 1 AND 5),
driver_rating SMALLINT CHECK (driver_rating BETWEEN 1 AND 5),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_trips_rider ON trips(rider_id, created_at DESC);
CREATE INDEX idx_trips_driver ON trips(driver_id, created_at DESC);
CREATE INDEX idx_trips_status ON trips(status) WHERE status NOT IN ('COMPLETED', 'CANCELLED');
State Transition Events
Every state change emits a Kafka event consumed by downstream services:
// Kafka topic: "trip-events"
{
"event_type": "TRIP_ACCEPTED",
"trip_id": "trip_abc123",
"driver_id": "drv_xyz789",
"rider_id": "usr_abc123",
"timestamp": "2026-04-15T10:23:45Z",
"metadata": {
"driver_location": { "lat": 37.7751, "lng": -122.4180 },
"eta_to_pickup": 240 // seconds
}
}
Consumers:
→ Notification Service: push "Driver en route, 4 min away" to rider
→ Analytics Service: log matching latency metric
→ Location Service: start tracking driver→pickup route
→ Payment Service: authorize hold on rider's payment method
ETA Computation
Accurate ETAs are critical — they determine matching decisions, rider expectations, and driver earnings. Uber computes ETAs using a layered approach combining graph algorithms with machine learning.
Layer 1: Road Graph + Dijkstra
The foundation is a weighted directed graph of the road network:
Road graph structure:
Nodes: intersections (~50M globally)
Edges: road segments between intersections
Edge weight: travel time = distance / speed_limit
Shortest path: Dijkstra's algorithm (or A* with haversine heuristic)
Optimization: Contraction Hierarchies (CH)
- Precompute shortcuts between important nodes
- Query becomes a bidirectional search over the shortcut hierarchy, settling a few hundred nodes instead of millions
- Typical: 50M nodes → query in ~0.5ms with CH
Layer 2: Live Traffic Overlay
Static road speeds are wildly inaccurate during rush hour. Uber overlays live traffic data from its own drivers:
Every road segment has a real-time speed:
segment_speeds = {
"Market_St_1_2": { speed: 15 kmh, updated: "10:23:42" },
"Market_St_2_3": { speed: 8 kmh, updated: "10:23:41" }, // congestion!
"Highway_101_45_46":{ speed: 95 kmh, updated: "10:23:43" },
}
Source: GPS traces from all Uber drivers on the road
Update frequency: every 60 seconds per segment
Storage: Redis (fast reads for routing engine)
ETA with live traffic:
For each edge in the shortest path:
travel_time = segment_distance / live_speed[segment]
total_eta = Σ travel_times + Σ turn_penalties + pickup_overhead
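A small sketch of this Layer-2 computation: sum per-segment travel times at live speeds, falling back to the speed limit when a segment has no recent reading (segment records, the turn penalty, and the pickup overhead are illustrative assumptions):
TURN_PENALTY_S = 8          # assumed fixed cost per turn
PICKUP_OVERHEAD_S = 60      # assumed time for the rider to get in

def eta_seconds(path, live_speeds):
    total = 0.0
    for seg in path:
        speed_kmh = live_speeds.get(seg["id"], seg["speed_limit_kmh"])   # live speed or fallback
        total += seg["length_km"] / speed_kmh * 3600
        if seg.get("turn"):
            total += TURN_PENALTY_S
    return total + PICKUP_OVERHEAD_S

path = [
    {"id": "Market_St_1_2", "length_km": 0.4, "speed_limit_kmh": 40},
    {"id": "Market_St_2_3", "length_km": 0.3, "speed_limit_kmh": 40, "turn": True},
]
live = {"Market_St_1_2": 15, "Market_St_2_3": 8}
print(eta_seconds(path, live))   # ≈ 96 + 135 + 8 + 60 ≈ 299 s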
Layer 3: ML-Based Correction
Even with live traffic, graph-based ETAs have systematic errors (traffic lights, school zones, construction). Uber uses a gradient-boosted decision tree model to correct:
Features:
- Graph ETA (from Layer 1+2)
- Time of day, day of week
- Weather (rain → +15% ETA)
- Historical trip times on this route
- Current surge level (proxy for congestion)
- Number of traffic signals on route
- Special events (concerts, sports games)
Model: XGBoost / LightGBM
Input: feature vector → Output: corrected_eta
Training data: millions of completed trips (actual duration vs. predicted)
Result:
Graph ETA alone: ±25% error
Graph + live traffic: ±15% error
Graph + traffic + ML: ±8% error ← production accuracy
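A toy sketch of the correction layer, using scikit-learn's GradientBoostingRegressor as a stand-in for the XGBoost/LightGBM model described above and learning the residual between actual trip duration and the graph ETA (features and numbers are illustrative):
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# columns: graph_eta_s, hour_of_day, is_raining, n_traffic_signals
X = np.array([
    [600,  8, 1, 12],
    [300, 14, 0,  4],
    [900, 18, 0, 20],
    [450, 23, 0,  6],
])
actual = np.array([780, 290, 1010, 430])       # observed trip durations (s)

model = GradientBoostingRegressor().fit(X, actual - X[:, 0])   # learn the residual
corrected_eta = X[:, 0] + model.predict(X)     # graph ETA + ML correction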
Surge Pricing & H3 Hexagonal Grid
Surge pricing is Uber's mechanism to balance supply and demand in real time. When rider demand exceeds driver supply in an area, prices increase to incentivize more drivers to that area and to reduce demand from price-sensitive riders.
Why Hexagons? The H3 Grid System
Uber developed the H3 hierarchical hexagonal grid system to divide the entire planet into hexagonal cells. Why hexagons instead of squares or other shapes?
| Property | Square Grid | Hexagonal Grid (H3) |
|---|---|---|
| Neighbor distance | Unequal (√2 for diagonal) | Equal for all 6 neighbors |
| Edge effects | 8 neighbors, inconsistent adjacency | 6 neighbors, uniform adjacency |
| Coverage | Tiles the plane; cells distort badly when projected onto the globe | Tiles the globe via an icosahedron projection (12 pentagon cells per resolution) |
| Movement modeling | Diagonal bias | Natural for movement in any direction |
| Visual representation | Familiar | Closer to circles (better for radii) |
H3 Resolution Levels (selected):
┌────────────┬───────────────────────┬────────────────────────────┐
│ Resolution │ Hex Edge Length │ Hex Area │
├────────────┼───────────────────────┼────────────────────────────┤
│ 0 │ ~1107.71 km │ ~4,357,449.42 km² │
│ 4 │ ~22.61 km │ ~1,770.35 km² │
│ 7 │ ~1.22 km │ ~5.16 km² │ ← city zones
│ 8 │ ~0.46 km │ ~0.74 km² │ ← surge pricing
│ 9 │ ~0.17 km │ ~0.105 km² │ ← fine-grained
│ 15 │ ~0.51 m │ ~0.90 m² │
└────────────┴───────────────────────┴────────────────────────────┘
Uber uses Resolution 8 for surge pricing:
Each cell ≈ 0.74 km² (about 6 city blocks)
San Francisco ≈ 160 cells at resolution 8
Perfect granularity for supply/demand zones
H3 Indexing
Every GPS coordinate maps to a unique H3 cell index at each resolution:
import h3
# Convert (lat, lng) to H3 cell at resolution 8
cell = h3.latlng_to_cell(37.7749, -122.4194, 8)
# → '8828308281fffff'
# Get center of a cell
center = h3.cell_to_latlng('8828308281fffff')
# → (37.77488, -122.41941)
# Get all 6 neighboring cells (ring distance 1)
neighbors = h3.grid_ring('8828308281fffff', 1)
# → {'8828308283fffff', '8828308285fffff', ...}
# Get all cells within k rings (disk)
disk = h3.grid_disk('8828308281fffff', 2)
# → 19 cells (center + ring 1 + ring 2)
# Hierarchical: get parent (coarser) and children (finer)
parent = h3.cell_to_parent('8828308281fffff', 7)
children = h3.cell_to_children('8828308281fffff', 9) # 7 children
Supply/Demand Ratio Computation
Every 30-60 seconds, the pricing service computes the surge multiplier per H3 cell:
function computeSurge(city):
cells = h3.getCellsForCity(city, resolution=8)
surgeMap = {}
for each cell in cells:
// Count supply: available drivers in this cell
supply = countDriversInCell(cell, status="available")
// Count demand: ride requests in last 5 minutes
demand = countRecentRequests(cell, window=5_minutes)
// Compute raw ratio
if supply == 0:
ratio = MAX_SURGE // no supply → max surge
else:
ratio = demand / supply
// Map ratio to surge multiplier using a curve
surge = surgeCurve(ratio)
// Smooth with neighbors to avoid sharp boundaries
neighborSurges = getNeighborSurges(cell)
smoothedSurge = 0.6 * surge + 0.4 * avg(neighborSurges)
// Clamp to allowed range
smoothedSurge = clamp(smoothedSurge, 1.0, MAX_SURGE)
redis.SET("surge:{cell}", smoothedSurge, EX=120)
return surgeMap
// Surge curve: piecewise linear
function surgeCurve(ratio):
if ratio <= 1.0: return 1.0 // balanced or oversupply
if ratio <= 1.5: return 1.2 // slight demand increase
if ratio <= 2.0: return 1.5 // moderate surge
if ratio <= 3.0: return 2.0 // high demand
if ratio <= 5.0: return 3.0 // very high demand
return 5.0 // extreme (concerts, NYE, etc.)
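The supply-counting step is nearly a one-liner with H3: bucket each available driver's coordinates into its resolution-8 cell and count. A sketch using the same h3 library as above (the driver list and demand figures are made up):
from collections import Counter
import h3

drivers = [
    ("drv_1", 37.7749, -122.4194),
    ("drv_2", 37.7755, -122.4189),
    ("drv_3", 37.7890, -122.4010),
]
supply = Counter(h3.latlng_to_cell(lat, lng, 8) for _, lat, lng in drivers)

demand = {"8828308281fffff": 6}                # requests in the last 5 min (illustrative)
for cell, requests in demand.items():
    ratio = requests / supply[cell] if supply[cell] else float("inf")   # no supply → max surge
    print(cell, ratio)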
Fare Calculation with Surge
function calculateFare(trip):
baseRate = getRateCard(trip.city, trip.rideType)
// e.g., { baseFare: $2.50, perKm: $1.20, perMin: $0.35, bookingFee: $2.00 }
distanceFare = trip.distance_km * baseRate.perKm
timeFare = trip.duration_min * baseRate.perMin
subtotal = baseRate.baseFare + distanceFare + timeFare
// Apply surge
surgeCell = h3.latlng_to_cell(trip.pickup_lat, trip.pickup_lng, 8)
surge = redis.GET("surge:{surgeCell}") || 1.0
surgedFare = subtotal * surge
// Add fees
totalFare = surgedFare + baseRate.bookingFee
// Apply minimum fare
totalFare = max(totalFare, baseRate.minimumFare)
return {
baseFare: subtotal,
surgeMultiplier: surge,
surgeAmount: surgedFare - subtotal,
bookingFee: baseRate.bookingFee,
totalFare: totalFare
}
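To make the arithmetic concrete, a worked example using the illustrative rate card from the comment above (a 12 km, 25-minute trip at 1.5× surge):
distanceFare = 12 km × $1.20  = $14.40
timeFare     = 25 min × $0.35 = $8.75
subtotal     = $2.50 + $14.40 + $8.75 = $25.65
surgedFare   = $25.65 × 1.5   = $38.48
totalFare    = $38.48 + $2.00 booking fee = $40.48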
▶ Surge Pricing — H3 Hexagonal Grid
Watch supply/demand shift across city zones. Green = balanced, Yellow = moderate demand, Red = high demand. Each hex shows its surge multiplier.
Payment Service
The payment flow must handle the unique challenge of ride-sharing: the final amount is unknown when the trip starts. Uber uses an authorize-then-capture pattern.
Payment Flow
Timeline:
─────────────────────────────────────────────────────────────────
│ Request Ride │ Trip Starts │ Trip Ends │ Settle
│ │ │ │
│ 1. Validate │ 3. Trip in │ 4. Calculate │ 6. Pay
│ payment method │ progress │ final fare │ driver
│ │ │ │
│ 2. Auth hold │ │ 5. Capture │ 7. Reconcile
│ (estimated │ │ actual amount │
│ fare + 20%) │ │ from hold │
─────────────────────────────────────────────────────────────────
Step-by-Step Payment Flow
// Step 1: When rider requests a ride
function onRideRequested(ride):
// Validate payment method is active
paymentMethod = paymentService.getMethod(ride.payment_method_id)
if !paymentMethod.isValid():
return error("Invalid payment method")
// Estimate fare (route distance + current surge)
estimate = pricingService.estimateFare(ride)
// e.g., $24.50
// Step 2: Place authorization hold (estimate + 20% buffer)
holdAmount = estimate.totalFare * 1.20 // $29.40
authorization = stripe.paymentIntents.create({
amount: holdAmount,
currency: "usd",
payment_method: paymentMethod.stripeId,
capture_method: "manual", // ← don't charge yet!
metadata: { ride_id: ride.id }
})
ride.payment_intent_id = authorization.id
ride.save()
// Step 5: When trip completes
function onTripCompleted(trip):
// Calculate actual fare
actualFare = pricingService.calculateFare(trip)
// e.g., $22.80 (shorter route than estimated)
// Capture the actual amount from the held funds
stripe.paymentIntents.capture(
trip.payment_intent_id,
{ amount_to_capture: actualFare.totalFare }
)
// The remaining hold ($29.40 - $22.80 = $6.60) is released
trip.final_fare = actualFare.totalFare
trip.payment_status = "CAPTURED"
trip.save()
// Step 6: Pay the driver (weekly settlement)
function weeklyDriverPayout(driver):
trips = getCompletedTrips(driver, thisWeek)
totalEarnings = sum(trip.final_fare * 0.75 for trip in trips)
// Driver gets 75%, Uber keeps 25% commission
uberCommission = sum(trip.final_fare * 0.25 for trip in trips)
stripe.transfers.create({
amount: totalEarnings,
destination: driver.stripeConnectAccount,
metadata: { week: currentWeek, trip_count: trips.length }
})
Edge Cases
| Scenario | Handling |
|---|---|
| Fare exceeds authorization hold | Capture the hold amount, charge the remainder as a separate transaction |
| Rider cancels after match | Capture cancellation fee ($5-10) from hold, release the rest |
| Driver cancels | Release full hold, no charge to rider |
| Trip route deviates significantly | Flag for manual review; use route-based fare, not meter fare |
| Card expired between auth and capture | Auth is still valid (holds survive expiry during hold period) |
| Dispute / chargeback | Automated evidence submission with GPS trail + trip receipt |
| Split fare (Uber feature) | Multiple auth holds on multiple cards; capture proportionally |
Idempotency & Double-Charge Prevention
// Every payment operation uses an idempotency key
function capturePayment(trip):
idempotencyKey = "capture:{trip.id}:{trip.final_fare}"
result = stripe.paymentIntents.capture(
trip.payment_intent_id,
{ amount_to_capture: trip.final_fare },
{ idempotencyKey: idempotencyKey }
)
// If this function is retried (network timeout, crash, etc.),
// Stripe returns the same result without charging again.
return result
DISCO: Dispatch Optimization Deep Dive
Uber's DISCO system is one of the most sophisticated real-time optimization engines in production. Let's look at the advanced features beyond basic matching.
Forward Dispatch
When a driver is completing a trip and is 2-3 minutes from drop-off, DISCO can pre-match them with a new rider near the drop-off location:
function forwardDispatch():
// Find drivers completing trips in the next 3 minutes
completingSoon = tripService.getTripsCompletingIn(minutes=3)
for each trip in completingSoon:
// Predicted drop-off location
dropoff = trip.dropoff
// Find pending ride requests near that drop-off
nearbyRequests = matchingService.getPendingRequests(
near = dropoff,
radius = 2.0 // km
)
if nearbyRequests.length > 0:
// Pre-assign (driver hasn't finished current trip yet)
bestMatch = rankAndSelect(trip.driver, nearbyRequests)
bestMatch.status = "PRE_MATCHED"
bestMatch.assigned_driver = trip.driver_id
bestMatch.eta = trip.timeToCompletion + etaFromDropoffToPickup
Benefits:
✓ Reduces rider wait time by 2-3 minutes
✓ Reduces driver idle time (higher utilization)
✓ Particularly effective in high-demand areas
Supply Positioning
DISCO doesn't just match — it also repositions idle drivers to where demand is predicted to appear:
function repositionDrivers(city):
// Predict demand for next 15 minutes per H3 cell
demandForecast = mlModel.predictDemand(city, horizon=15_min)
// Current supply per cell
currentSupply = locationService.getSupplyByCell(city)
// Find undersupplied cells
for each cell in city.cells:
deficit = demandForecast[cell] - currentSupply[cell]
if deficit > THRESHOLD:
// Find nearest idle drivers in oversupplied neighbor cells
idleDrivers = findIdleDriversNear(cell, radius=3_km)
// Send gentle nudge: "Head to downtown for more requests"
for driver in idleDrivers[:deficit]:
notify(driver, {
type: "REPOSITION_SUGGESTION",
destination: cell.center,
reason: "High demand expected in this area",
incentive: "$3 bonus for next trip from this zone"
})
Pool Matching (UberPool / UberX Share)
Pool rides add another dimension of complexity — matching multiple riders heading in the same direction into a single vehicle:
function poolMatch(newRequest):
// Find active pool trips with available seats
activePoolTrips = tripService.getActivePoolTrips(
near = newRequest.pickup,
radius = 1.5 // km
)
candidates = []
for each poolTrip in activePoolTrips:
// Check if adding this rider makes the route efficient
currentRoute = poolTrip.optimizedRoute
newRoute = routeOptimizer.addStop(
currentRoute,
pickup = newRequest.pickup,
dropoff = newRequest.dropoff
)
detour = newRoute.totalTime - currentRoute.totalTime
maxDetour = 0.25 * currentRoute.totalTime // max 25% longer
if detour <= maxDetour && poolTrip.seats > 0:
candidates.add({
trip: poolTrip,
detour: detour,
savings: computeFareSavings(newRequest, poolTrip)
})
if candidates.length > 0:
// Pick the pool trip with minimum detour
best = candidates.sortBy(c => c.detour).first()
addRiderToPool(best.trip, newRequest)
else:
// Start a new pool trip (match with a fresh driver)
startNewPoolTrip(newRequest)
Real-Time Communication
Uber maintains persistent WebSocket connections with every active rider and driver app. This enables sub-second location updates, instant match notifications, and live trip tracking.
WebSocket Architecture
┌─────────────┐ ┌─────────────┐
│ Rider App │ │ Driver App │
│ (WSS) │ │ (WSS) │
└──────┬──────┘ └──────┬──────┘
│ │
▼ ▼
┌─────────────────────────────────┐
│ WebSocket Gateway Cluster │ ← 1000+ servers
│ (Sticky sessions via LB) │ ~10K connections/server
└──────┬──────────────────┬───────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Redis Pub/Sub │ │ Connection │
│ (Channel per │ │ Registry │
│ trip_id) │ │ (which server│
└──────────────┘ │ has which │
│ user?) │
└──────────────┘
Message flow for location update:
1. Driver app sends GPS update via WebSocket
2. Gateway publishes to Redis channel "trip:{trip_id}"
3. Gateway serving the rider subscribes to that channel
4. Rider receives driver location in < 500ms
Connection Management
// When driver goes online:
ws.onConnect(driver_id):
// Register in connection registry
redis.HSET("ws:connections", driver_id, this_server_id)
// Subscribe driver to their personal channel
subscribe("driver:{driver_id}")
// When ride is matched:
function onRideMatched(trip):
// Create a trip channel
tripChannel = "trip:{trip.id}"
// Subscribe both rider and driver to it
routeToUser(trip.rider_id, { subscribe: tripChannel })
routeToUser(trip.driver_id, { subscribe: tripChannel })
// When driver location updates during trip:
function onDriverLocation(driver_id, lat, lng, heading):
trip = getActiveTrip(driver_id)
redis.PUBLISH("trip:{trip.id}", JSON.stringify({
type: "DRIVER_LOCATION",
lat: lat, lng: lng,
heading: heading,
eta_seconds: computeETA(lat, lng, trip)
}))
Data Pipeline & Analytics
Uber generates enormous amounts of data that feed back into improving every aspect of the system.
Event Streaming Architecture
All services emit events to Kafka:
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Location │ │ Trip │ │ Payment │
│ Service │ │ Service │ │ Service │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────┐
│ Apache Kafka │
│ Topics: │
│ driver-locations (1.3M msg/sec) │
│ trip-events (500 msg/sec) │
│ payment-events (300 msg/sec) │
│ surge-updates (50 msg/sec) │
└──────┬───────────┬───────────┬───────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Real-Time│ │ Data │ │ ML │
│ Dashbrd │ │ Lake │ │ Training │
│ (Flink) │ │ (S3/HDFS)│ │ Pipeline │
└──────────┘ └──────────┘ └──────────┘
Key Metrics Tracked
| Metric | Use Case | Update Frequency |
|---|---|---|
| ETA accuracy | Retrain ETA ML model | Real-time |
| Match accept rate | Tune matching radius & ranking | Per minute |
| Surge effectiveness | Calibrate surge curves | Per 5 minutes |
| Driver utilization | Reposition idle drivers | Per minute |
| Cancellation rate | Identify UX friction points | Hourly |
| Payment failure rate | Trigger retry or fallback | Real-time |
| P99 matching latency | SLA monitoring | Real-time |
Fault Tolerance & Reliability
What Happens When Things Fail?
| Failure | Impact | Mitigation |
|---|---|---|
| Redis shard down | Location queries fail for that city | Replica promotion in <5s; fallback to stale cache |
| Matching service crash | New rides can't be matched | Multiple replicas; Kafka buffers unmatched requests |
| Kafka broker down | Event delivery delayed | 3x replication; producer retries with idempotency |
| Payment provider outage | Can't authorize holds | Fallback to secondary provider; allow trip with post-charge |
| ETA service slow | Matching uses stale ETAs | Circuit breaker; fall back to distance-based matching |
| WebSocket gateway crash | Users lose real-time updates | Client auto-reconnects; server-sent events as fallback |
| Database primary down | Trip writes fail | Automatic failover to replica; write-ahead log preserves data |
Graceful Degradation Hierarchy
Level 0: Everything healthy
→ Full functionality
Level 1: ETA service degraded
→ Use cached ETAs (stale by up to 60s)
→ Match by distance instead of ETA
→ Show "ETA approximate" in rider app
Level 2: Surge pricing unavailable
→ Default to 1.0x (no surge)
→ Accept lower revenue rather than show errors
Level 3: Payment service down
→ Allow trips to proceed (collect payment later)
→ Flag trips as "payment_pending"
→ Process when service recovers
Level 4: Matching service overloaded
→ Shed load: only process UberX (highest volume type)
→ Queue premium rides (Black, SUV) for processing next
→ Increase matching radius to reduce computation
Level 5: Catastrophic multi-service failure
→ Static pricing, basic nearest-driver match
→ "We're experiencing issues" banner in app
→ Preserve trip safety features (911 button, trip sharing)
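The Level-1 fallback above ("circuit breaker; fall back to distance-based matching") can be sketched in a few lines; the failure threshold and cooldown here are illustrative values, not Uber's:
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, cooldown_s=30):
        self.max_failures, self.cooldown_s = max_failures, cooldown_s
        self.failures, self.opened_at = 0, None

    def call(self, fn, fallback, *args):
        if self.opened_at and time.time() - self.opened_at < self.cooldown_s:
            return fallback(*args)                 # circuit open → use distance-based fallback
        try:
            result = fn(*args)
            self.failures, self.opened_at = 0, None # success → reset and close the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()       # too many failures → trip the breaker
            return fallback(*args)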
Summary & Key Takeaways
| Component | Technology | Key Design Decision |
|---|---|---|
| Location Store | Redis Geospatial | In-memory for sub-ms reads; sharded by city |
| Matching | DISCO (Custom) | Batched Hungarian algorithm over greedy nearest |
| Trip Lifecycle | PostgreSQL + Kafka | State machine with event sourcing for auditability |
| ETA | Graph + ML | Contraction Hierarchies + live traffic + XGBoost correction |
| Surge Pricing | H3 Grid + Redis | Hexagonal cells for uniform neighbor distances |
| Payment | Stripe / Internal | Authorize-then-capture for unknown final amounts |
| Real-Time | WebSocket + Redis Pub/Sub | Per-trip channels for efficient fan-out |
| Data Pipeline | Kafka → Flink → S3 | All events streamed for real-time and batch analytics |