High Level Design Series · Real-World Designs · Part 6 · Post 54 of 70

Design: Proximity Service (Yelp)

The Problem

You open Yelp, Google Maps, or Uber Eats and search for "pizza near me." Within milliseconds, you get a ranked list of nearby pizza restaurants sorted by distance, rating, and relevance. Behind this seemingly simple query lies one of the most interesting problems in system design: spatial proximity search.

The challenge isn't just finding things that are close — it's doing so across hundreds of millions of businesses with sub-100ms latency while supporting complex filters (open now, 4+ stars, $$$ price range) and handling millions of concurrent queries.

Why is this hard? Traditional database indexes work on one-dimensional data (sort by price, filter by name). But proximity is inherently two-dimensional — you need to efficiently find all points within a radius on a 2D plane (or more precisely, on a sphere). A B-tree index on latitude won't help with longitude, and vice versa.

Requirements

Functional Requirements

  1. Search nearby businesses: given a user's location (lat, lng) and a radius, return matching businesses with filters (category, rating, open now) and sorting (distance, rating, popularity)
  2. View business details: name, address, hours, photos, rating, price range
  3. Manage businesses: owners can create, update, and delete listings; changes need not appear in search results in real time

Non-Functional Requirements

  1. Low latency: proximity searches should return in well under 100 ms
  2. High availability: the read path must survive node failures and traffic spikes
  3. Scalability: 200 million businesses, ~25,000 peak search QPS
  4. Eventual consistency: business updates may take minutes to propagate to search results

Back-of-Envelope Estimates

Metric                   Estimate
Total businesses         200 million
Daily active users       500 million
Search QPS (avg)         ~5,800 per second (500M × 1 search/day ÷ 86,400)
Peak search QPS          ~25,000 per second (~4x average)
Business data size       ~2 KB per business × 200M = ~400 GB
Geospatial index size    ~20 bytes per entry × 200M = ~4 GB (fits in memory!)

Key insight: The geospatial index (just coordinates + business ID) is only ~4 GB. This easily fits in memory on a single server. The challenge isn't raw data size — it's query throughput (25K QPS) and the need for low-latency spatial queries across a two-dimensional space.

High-Level Architecture

The system naturally splits into two mostly independent services:

┌─────────────────────────────────────────────────────────────────────┐
│                         CLIENTS                                     │
│       (Mobile App, Web App, Third-Party API Consumers)              │
└──────────┬─────────────────────────────────┬────────────────────────┘
           │ GET /search?lat=X&lng=Y&r=5km   │ POST/PUT /businesses
           ▼                                 ▼
┌─────────────────────┐          ┌──────────────────────┐
│   Location-Based    │          │     Business         │
│   Search Service    │          │     Service          │
│   (read-heavy)      │          │     (write path)     │
└────────┬────────────┘          └────────┬─────────────┘
         │                                │
         │ ┌──────────────┐               │
         ├─▶  Geo Index   │               │
         │ │ (Redis /     │               │
         │ │  Geohash)    │◄──────────────┤ update index
         │ └──────────────┘               │ on write
         │                                │
         │ ┌──────────────┐    ┌──────────▼──────────┐
         └─▶  Cache       │    │  Business Database   │
           │ (Redis)      │◄───│  (PostgreSQL)        │
           └──────────────┘    └─────────────────────┘
  1. Location-Based Search (LBS) Service — handles all proximity queries. Stateless, horizontally scalable. Reads from the geospatial index and cache. No writes.
  2. Business Service — handles all CRUD operations for business data. Writes to the primary database and updates the geospatial index asynchronously.
  3. Geospatial Index — the core data structure that enables fast spatial queries. Can be Redis with GEOADD/GEOSEARCH, PostGIS, or an in-memory quadtree.
  4. Business Database — stores all business metadata (name, address, hours, photos, reviews). PostgreSQL with read replicas.
  5. Cache Layer — caches business details and popular search results. Business data changes infrequently, so high cache hit rates are achievable.

API Design

// Search nearby businesses
GET /api/v1/search/nearby
  ?latitude=37.7749
  &longitude=-122.4194
  &radius=5000          // meters
  &category=restaurant  // optional filter
  &sort_by=distance     // distance | rating | popularity
  &limit=20
  &offset=0

Response:
{
  "total": 142,
  "businesses": [
    {
      "id": "biz_8x7k2m",
      "name": "Joe's Pizza",
      "latitude": 37.7751,
      "longitude": -122.4183,
      "distance_meters": 105,
      "rating": 4.5,
      "category": "restaurant",
      "price_range": "$$",
      "is_open": true,
      "thumbnail_url": "https://cdn.example.com/..."
    },
    ...
  ]
}

// Get business details
GET /api/v1/businesses/{business_id}

// Create / update business
POST /api/v1/businesses
PUT  /api/v1/businesses/{business_id}

Geospatial Indexing Approaches

The central challenge: given a point (lat, lng) and a radius, how do you efficiently find all businesses within that circle? Let's examine the four major approaches.

The Naive Approach (Why It Fails)

The simplest idea is to scan every business and compute its distance from the query point:

SELECT id, name, latitude, longitude,
       haversine(query_lat, query_lng, latitude, longitude) AS dist
FROM businesses
WHERE haversine(query_lat, query_lng, latitude, longitude) < 5000
ORDER BY dist
LIMIT 20;

With 200 million rows, this full-table scan takes minutes, not milliseconds. We can add a bounding-box filter:

-- Bounding box pre-filter (uses B-tree indexes on lat/lng)
WHERE latitude  BETWEEN 37.73 AND 37.82
  AND longitude BETWEEN -122.46 AND -122.38

This helps but still requires scanning all businesses within the bounding box and computing exact distances. For dense urban areas, that box might contain thousands of businesses. We need a spatial index.
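
For reference, the bounding box itself is cheap to derive from the query point. A minimal sketch (assumes ~111.32 km per degree of latitude; the helper name is ours, not from the SQL above):

function boundingBox(lat, lng, radiusMeters) {
    const dLat = radiusMeters / 111_320;  // meters per degree of latitude
    // degrees of longitude shrink by cos(latitude)
    const dLng = radiusMeters / (111_320 * Math.cos(lat * Math.PI / 180));
    return {
        minLat: lat - dLat, maxLat: lat + dLat,
        minLng: lng - dLng, maxLng: lng + dLng,
    };
}

// boundingBox(37.7749, -122.4194, 5000) → lat ±0.045°, lng ±0.057°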

Approach 1: Geohash

A geohash is a clever encoding that converts a 2D coordinate (latitude, longitude) into a 1D string. This lets you use ordinary B-tree indexes for spatial queries.

The core idea: recursively divide the Earth's surface into a grid. At each step, divide the current cell in half — first by longitude (left/right → 0/1), then by latitude (bottom/top → 0/1). Each bit narrows the cell. After enough bits, you have a small cell that precisely locates a point.

The resulting binary string is encoded in base32 (characters: 0123456789bcdefghjkmnpqrstuvwxyz — note the missing a, i, l, o to avoid confusion).

Geohash Length   Cell Width   Cell Height   Use Case
1                5,000 km     5,000 km      Continent-scale
2                1,250 km     625 km        Country
3                156 km       156 km        Large region
4                39.1 km      19.5 km       City / metro area
5                4.9 km       4.9 km        Neighborhood
6                1.2 km       0.61 km       Walking distance (most common)
7                153 m        153 m         Street block
8                38.2 m       19.1 m        Building-level

Approach 2: Quadtree

A quadtree is an in-memory tree data structure that recursively subdivides a 2D space into four quadrants. Unlike geohash (which uses a fixed grid), a quadtree is adaptive — dense areas get more subdivisions, sparse areas stay as large cells.

// Quadtree node
class QuadTreeNode {
    Boundary bounds;           // {x, y, width, height}
    int MAX_CAPACITY = 100;    // max businesses per leaf
    List<Business> businesses; // only in leaf nodes
    QuadTreeNode NW, NE, SW, SE; // children (null for leaves)

    void insert(Business b) {
        if (!bounds.contains(b.location)) return;

        if (isLeaf()) {
            if (businesses.size() < MAX_CAPACITY) {
                businesses.add(b);
                return;
            }
            subdivide();                       // split into 4 children
            for (Business existing : businesses)
                insertIntoChildren(existing);  // redistribute old entries
            businesses.clear();
        }
        insertIntoChildren(b);
    }

    // Each child's own bounds check keeps b only in the right quadrant
    void insertIntoChildren(Business b) {
        NW.insert(b); NE.insert(b);
        SW.insert(b); SE.insert(b);
    }

    List<Business> searchRadius(Point center, double radius) {
        List<Business> results = new ArrayList<>();
        if (!bounds.intersectsCircle(center, radius)) return results;

        if (isLeaf()) {
            for (Business b : businesses)
                if (distance(center, b.location) <= radius)
                    results.add(b);
        } else {
            results.addAll(NW.searchRadius(center, radius));
            results.addAll(NE.searchRadius(center, radius));
            results.addAll(SW.searchRadius(center, radius));
            results.addAll(SE.searchRadius(center, radius));
        }
        return results;
    }
}
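
The searchRadius method relies on bounds.intersectsCircle, which isn't defined above. A minimal sketch in the same JavaScript style as the other helpers in this post, treating coordinates as planar (a fair approximation at leaf-cell scale): clamp the circle's center to the rectangle and compare the clamped distance to the radius.

function intersectsCircle(bounds, center, radius) {
    // nearest point of the rectangle to the circle's center
    const nx = Math.max(bounds.x, Math.min(center.x, bounds.x + bounds.width));
    const ny = Math.max(bounds.y, Math.min(center.y, bounds.y + bounds.height));
    const dx = center.x - nx, dy = center.y - ny;
    return dx * dx + dy * dy <= radius * radius;
}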

Advantages: Adaptive granularity (Manhattan gets tiny cells, Sahara gets huge ones), efficient range queries, easy to visualize. Disadvantages: Must live in memory (can't use database indexes), harder to distribute across multiple servers, updates require tree rebalancing.

Approach 3: R-tree

R-trees are the spatial equivalent of B-trees. They group nearby objects into minimum bounding rectangles (MBRs) at each level of the tree. Used internally by PostGIS and most spatial databases.

Approach 4: Google S2 Geometry

S2 is Google's approach. Instead of projecting the sphere onto a flat grid (which distorts areas near the poles), S2:

  1. Projects the sphere onto a cube — 6 faces, minimal distortion
  2. Maps each cube face to a unit square using a non-linear transform
  3. Fills each face with a Hilbert curve — a space-filling curve that maps 2D points to 1D values while preserving locality (nearby 2D points → nearby 1D values)
  4. Each point on Earth gets a 64-bit S2 cell ID

S2 cells at level 30 are ~1 cm². At level 12, they're ~3.3 km² (between geohash lengths 5 and 6). The Hilbert curve ordering means a range scan on S2 cell IDs efficiently retrieves spatially nearby points.
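
To make "preserving locality" concrete, here is the textbook 2D-to-1D Hilbert index conversion. This is a generic sketch of the technique, not the S2 library's API:

function xy2d(n, x, y) {  // n = grid side length (power of 2); returns Hilbert index
    let d = 0;
    for (let s = n / 2; s >= 1; s = Math.floor(s / 2)) {
        const rx = (x & s) > 0 ? 1 : 0;
        const ry = (y & s) > 0 ? 1 : 0;
        d += s * s * ((3 * rx) ^ ry);
        if (ry === 0) {  // rotate the quadrant so the curve stays continuous
            if (rx === 1) { x = n - 1 - x; y = n - 1 - y; }
            [x, y] = [y, x];
        }
    }
    return d;
}

// Adjacent cells get nearby indices:
// xy2d(8, 1, 1) → 2,  xy2d(8, 1, 0) → 3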

Comparison summary: Geohash is simplest (works with any database), Quadtree is best for in-memory adaptive indexing, R-tree is what PostGIS uses under the hood, and S2 is used by Google services for its superior handling of polar regions and uniform cell sizes. For most proximity services, geohash + Redis or PostGIS (R-tree) is the pragmatic choice.

Geohash Deep Dive

The Encoding Algorithm

Let's encode the coordinates of the Statue of Liberty: (40.6892, -74.0445)

// Step 1: Interleave longitude and latitude bits
// Longitude range: [-180, 180], Latitude range: [-90, 90]
// Bits interleave starting with longitude: even positions (0, 2, 4, ...)
// encode longitude, odd positions encode latitude

function encodeGeohash(lat, lng, precision) {
    const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';
    let latRange = [-90, 90];
    let lngRange = [-180, 180];
    let hash = '';
    let bits = 0;
    let currentBit = 0;
    let isLng = true;  // start with longitude

    while (hash.length < precision) {
        if (isLng) {
            let mid = (lngRange[0] + lngRange[1]) / 2;
            if (lng >= mid) {
                currentBit = currentBit * 2 + 1; // bit = 1
                lngRange[0] = mid;
            } else {
                currentBit = currentBit * 2;     // bit = 0
                lngRange[1] = mid;
            }
        } else {
            let mid = (latRange[0] + latRange[1]) / 2;
            if (lat >= mid) {
                currentBit = currentBit * 2 + 1;
                latRange[0] = mid;
            } else {
                currentBit = currentBit * 2;
                latRange[1] = mid;
            }
        }

        isLng = !isLng;
        bits++;

        if (bits === 5) {
            hash += BASE32[currentBit];
            bits = 0;
            currentBit = 0;
        }
    }
    return hash;
}

// encodeGeohash(40.6892, -74.0445, 9) → "dr5r7p4rx"

Let's trace through the first few bits for longitude = -74.0445:

Step   Range         Mid     -74.0445 vs. Mid           Bit
1      [-180, 180]   0       -74 < 0 → left half        0
2      [-180, 0]     -90     -74 ≥ -90 → right half     1
3      [-90, 0]      -45     -74 < -45 → left half      0
4      [-90, -45]    -67.5   -74 < -67.5 → left half    0

The interleaved bit stream (lng, lat, lng, lat, ...) is grouped into 5-bit chunks and mapped to base32 characters. Each character adds 5 bits of precision.

The Boundary Problem

Geohash has a critical edge case: two locations that are physically close can have completely different geohash prefixes if they fall on opposite sides of a cell boundary.

// Two points ~30 meters apart, straddling the equator and prime
// meridian, share NO common prefix at any length:
Point A: (0.0001, 0.0001)   → geohash "s00000..."
Point B: (-0.0001, -0.0001) → geohash "7zzzzz..."

// More commonly, nearby points land in adjacent cells:
Point 1: geohash "9q8yyk"  (San Francisco)
Point 2: geohash "9q8yys"  (100m away, but different cell)

If you only search within the user's own geohash cell, you'll miss nearby businesses in adjacent cells.

The Solution: Search 9 Cells

The standard solution is to search the user's cell plus all 8 neighboring cells:

┌──────────┬──────────┬──────────┐
│ NW       │ N        │ NE       │
│ 9q8yyh   │ 9q8yyk   │ 9q8yys   │
├──────────┼──────────┼──────────┤
│ W        │ ★ USER   │ E        │
│ 9q8yy5   │ 9q8yy7   │ 9q8yye   │
├──────────┼──────────┼──────────┤
│ SW       │ S        │ SE       │
│ 9q8yy4   │ 9q8yy6   │ 9q8yyd   │
└──────────┴──────────┴──────────┘

Query: SELECT * FROM businesses
       WHERE geohash IN ('9q8yy7','9q8yyh','9q8yyk','9q8yys',
                          '9q8yy5','9q8yye','9q8yy4','9q8yy6','9q8yyd')

Computing neighbors is O(1) — you simply increment/decrement the geohash string using a lookup table. This guarantees that any point within the cell's width/height of the user will be included in the search.
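
Most geohash libraries ship this as a built-in. A minimal sketch of getNeighbors using the classic lookup tables (the same tables as the reference geohash.org implementation; treat the helper names as assumptions):

const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Per direction: [even-length table, odd-length table]
const NEIGHBOR = {
    n: ['p0r21436x8zb9dcf5h7kjnmqesgutwvy', 'bc01fg45238967deuvhjyznpkmstqrwx'],
    s: ['14365h7k9dcfesgujnmqp0r2twvyx8zb', '238967debc01fg45kmstqrwxuvhjyznp'],
    e: ['bc01fg45238967deuvhjyznpkmstqrwx', 'p0r21436x8zb9dcf5h7kjnmqesgutwvy'],
    w: ['238967debc01fg45kmstqrwxuvhjyznp', '14365h7k9dcfesgujnmqp0r2twvyx8zb'],
};
const BORDER = {
    n: ['prxz', 'bcfguvyz'], s: ['028b', '0145hjnp'],
    e: ['bcfguvyz', 'prxz'], w: ['0145hjnp', '028b'],
};

function adjacent(geohash, dir) {       // dir is 'n', 's', 'e', or 'w'
    const last = geohash.slice(-1);
    const type = geohash.length % 2;    // parity flips the lat/lng roles
    let parent = geohash.slice(0, -1);
    if (BORDER[dir][type].includes(last) && parent !== '')
        parent = adjacent(parent, dir); // we crossed a parent-cell border
    return parent + BASE32[NEIGHBOR[dir][type].indexOf(last)];
}

function getNeighbors(geohash) {
    const n = adjacent(geohash, 'n'), s = adjacent(geohash, 's');
    return [n, adjacent(n, 'e'), adjacent(geohash, 'e'), adjacent(s, 'e'),
            s, adjacent(s, 'w'), adjacent(geohash, 'w'), adjacent(n, 'w')];
}

// getNeighbors('9q8yy7') → ['9q8yyk','9q8yys','9q8yye','9q8yyd',
//                           '9q8yy6','9q8yy4','9q8yy5','9q8yyh']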

Proximity search always follows two steps:

Step 1: Find Candidate Businesses (Coarse Filter)

Use the geospatial index to quickly retrieve a superset of possible results. This is fast but imprecise — it returns all businesses in a square region (the 9 geohash cells), not a circular radius.

function findCandidates(userLat, userLng, radiusMeters) {
    // Choose geohash precision based on radius
    let precision = radiusToGeohashPrecision(radiusMeters);
    // Precision 6 (~1.2 km cells) for radius 1-5 km
    // Precision 5 (~5 km cells) for radius 5-25 km
    // Precision 4 (~40 km cells) for radius 25+ km

    let userGeohash = encodeGeohash(userLat, userLng, precision);
    let neighbors = getNeighbors(userGeohash); // 8 adjacent cells
    let cells = [userGeohash, ...neighbors];   // 9 cells total

    // Query all businesses in these 9 cells
    return db.query(
        `SELECT * FROM businesses WHERE geohash_${precision}
         IN (${cells.map(c => `'${c}'`).join(',')})`
    );
}
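
The radiusToGeohashPrecision helper referenced above isn't shown; a minimal sketch, assuming the cell sizes from the precision table:

function radiusToGeohashPrecision(radiusMeters) {
    if (radiusMeters <= 5_000)  return 6;  // ~1.2 km cells for 1-5 km radii
    if (radiusMeters <= 25_000) return 5;  // ~4.9 km cells for 5-25 km radii
    return 4;                              // ~39 km cells for anything larger
}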

Step 2: Filter by Exact Distance (Fine Filter)

For each candidate, compute the exact great-circle distance using the Haversine formula and filter out any business beyond the radius.

The Haversine Formula

The Haversine formula computes the great-circle distance between two points on a sphere. It accounts for Earth's curvature, unlike simple Euclidean distance.

// The Haversine Formula
// Given two points (φ₁, λ₁) and (φ₂, λ₂) in radians:
//
//   a = sin²(Δφ/2) + cos(φ₁) · cos(φ₂) · sin²(Δλ/2)
//   c = 2 · atan2(√a, √(1−a))
//   d = R · c
//
// where R = 6,371 km (Earth's mean radius)

function haversine(lat1, lng1, lat2, lng2) {
    const R = 6371000; // Earth's radius in meters
    const toRad = deg => deg * Math.PI / 180;

    const φ1 = toRad(lat1), φ2 = toRad(lat2);
    const Δφ = toRad(lat2 - lat1);
    const Δλ = toRad(lng2 - lng1);

    const a = Math.sin(Δφ/2) * Math.sin(Δφ/2) +
              Math.cos(φ1) * Math.cos(φ2) *
              Math.sin(Δλ/2) * Math.sin(Δλ/2);

    const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));

    return R * c; // distance in meters
}

// haversine(40.6892, -74.0445, 40.7484, -73.9857) ≈ 8,240 m
// (Statue of Liberty to Empire State Building)
Why not Euclidean distance? At the equator, 1° of longitude ≈ 111 km. But at latitude 60°, 1° of longitude ≈ 55.5 km (cells shrink as you approach the poles). Euclidean distance using raw lat/lng would overestimate east-west distances at high latitudes. The Haversine formula handles this correctly.
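
To see the shrink factor concretely, a quick sketch (111.32 km is the equatorial width of one degree; the helper name is ours):

// Kilometers spanned by 1° of longitude at a given latitude
const kmPerDegreeLng = lat => 111.32 * Math.cos(lat * Math.PI / 180);

kmPerDegreeLng(0);   // ≈ 111.3 km at the equator
kmPerDegreeLng(60);  // ≈ 55.7 km at 60° latitude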

Complete Search Flow

function searchNearby(userLat, userLng, radiusMeters, filters) {
    // Step 1: Coarse filter — get candidates from geohash cells
    let candidates = findCandidates(userLat, userLng, radiusMeters);

    // Step 2: Fine filter — exact distance check
    let results = candidates
        .map(biz => ({
            ...biz,
            distance: haversine(userLat, userLng, biz.lat, biz.lng)
        }))
        .filter(biz => biz.distance <= radiusMeters);

    // Step 3: Apply business filters
    if (filters.category)
        results = results.filter(b => b.category === filters.category);
    if (filters.minRating)
        results = results.filter(b => b.rating >= filters.minRating);
    if (filters.openNow)
        results = results.filter(b => isOpen(b, Date.now()));

    // Step 4: Sort
    if (filters.sortBy === 'distance')
        results.sort((a, b) => a.distance - b.distance);
    else if (filters.sortBy === 'rating')
        results.sort((a, b) => b.rating - a.rating);

    return results.slice(0, filters.limit || 20);
}

Database Design

Business Metadata — PostgreSQL

CREATE TABLE businesses (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(255) NOT NULL,
    description     TEXT,
    latitude        DECIMAL(10, 7) NOT NULL,
    longitude       DECIMAL(10, 7) NOT NULL,
    address         TEXT NOT NULL,
    city            VARCHAR(100),
    state           VARCHAR(50),
    country         VARCHAR(50),
    zip_code        VARCHAR(20),
    category        VARCHAR(50) NOT NULL,
    price_range     SMALLINT CHECK (price_range BETWEEN 1 AND 4),
    phone           VARCHAR(20),
    website         VARCHAR(255),
    rating          DECIMAL(2,1) DEFAULT 0.0,
    review_count    INTEGER DEFAULT 0,
    photo_urls      TEXT[],            -- PostgreSQL array
    is_active       BOOLEAN DEFAULT true,
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    updated_at      TIMESTAMPTZ DEFAULT NOW(),

    -- Precomputed geohash columns for different precisions
    geohash_4       CHAR(4) NOT NULL,  -- ~40 km cells
    geohash_5       CHAR(5) NOT NULL,  -- ~5 km cells
    geohash_6       CHAR(6) NOT NULL   -- ~1.2 km cells
);

CREATE TABLE business_hours (
    business_id     UUID REFERENCES businesses(id),
    day_of_week     SMALLINT NOT NULL, -- 0=Mon, 6=Sun
    open_time       TIME,
    close_time      TIME,
    is_closed       BOOLEAN DEFAULT false,
    PRIMARY KEY (business_id, day_of_week)
);

-- Indexes for search
CREATE INDEX idx_geohash_4 ON businesses(geohash_4);
CREATE INDEX idx_geohash_5 ON businesses(geohash_5);
CREATE INDEX idx_geohash_6 ON businesses(geohash_6);
CREATE INDEX idx_category   ON businesses(category);
CREATE INDEX idx_lat_lng    ON businesses(latitude, longitude);

PostGIS — Native Spatial Queries

If you're already using PostgreSQL, PostGIS is the most powerful option. It adds native geometry types and spatial indexes (R-tree via GiST).

-- Enable PostGIS extension
CREATE EXTENSION postgis;

-- Add a geography column (uses sphere math automatically)
ALTER TABLE businesses ADD COLUMN location GEOGRAPHY(POINT, 4326);

-- Populate from existing lat/lng
UPDATE businesses SET location = ST_SetSRID(
    ST_MakePoint(longitude, latitude), 4326
)::geography;

-- Create spatial index (GiST = R-tree under the hood)
CREATE INDEX idx_location_gist ON businesses USING GIST(location);

-- Search for businesses within 5 km of a point
SELECT id, name, rating,
       ST_Distance(location, ST_SetSRID(
           ST_MakePoint(-122.4194, 37.7749), 4326
       )::geography) AS distance_meters
FROM businesses
WHERE ST_DWithin(
    location,
    ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
    5000  -- 5000 meters = 5 km
)
AND category = 'restaurant'
AND rating >= 4.0
ORDER BY distance_meters
LIMIT 20;

-- This uses the spatial index for the DWithin filter,
-- then computes exact distance for sorting.
-- Typical query time: 5-20ms for millions of rows!
ST_DWithin vs ST_Distance: ST_DWithin(geom, geom, distance) is an index-accelerated predicate — it uses the R-tree to quickly eliminate far-away points. ST_Distance computes exact distance but cannot use the index. Always use ST_DWithin in the WHERE clause and ST_Distance only for ORDER BY.

Redis — In-Memory Geospatial Index

Redis has built-in geospatial commands that use geohash internally. For a read-heavy proximity service, this is often the best choice due to its sub-millisecond latency.

-- Add businesses to the geospatial index
-- GEOADD key longitude latitude member
GEOADD businesses:geo -122.4194 37.7749 "biz_001"
GEOADD businesses:geo -122.4089 37.7837 "biz_002"
GEOADD businesses:geo -122.4000 37.7900 "biz_003"

-- Bulk add (more efficient)
GEOADD businesses:geo
    -122.4194 37.7749 "biz_001"
    -122.4089 37.7837 "biz_002"
    -122.4000 37.7900 "biz_003"

-- Search within 5 km radius (Redis 6.2+)
GEOSEARCH businesses:geo
    FROMLONLAT -122.4194 37.7749
    BYRADIUS 5 km
    ASC                   -- sort by distance
    COUNT 20              -- limit results
    WITHCOORD             -- include coordinates
    WITHDIST              -- include distance

-- Result:
-- 1) "biz_001", "0.0000", "-122.4194", "37.7749"
-- 2) "biz_002", "1.2345", "-122.4089", "37.7837"
-- 3) "biz_003", "2.3456", "-122.4000", "37.7900"

-- Get the geohash of a member
GEOHASH businesses:geo "biz_001"
-- → "9q8yyk0h0..." (Redis returns an 11-character geohash string)

-- Get the position of a member
GEOPOS businesses:geo "biz_001"
-- → "-122.4194" "37.7749"

-- Distance between two businesses
GEODIST businesses:geo "biz_001" "biz_002" km
-- → "1.2345"

How Redis implements this under the hood: Each GEOADD computes a 52-bit geohash of the coordinates and stores it as the score in a sorted set (ZSET). GEOSEARCH computes the geohash of the query point, determines which geohash ranges cover the search area (the 9-cell approach), performs range scans on the sorted set, and applies exact distance filtering. All in ~0.1ms.

# Application code: combine Redis geo with business data
import redis

r = redis.Redis(decode_responses=True)

def search_nearby(lat, lng, radius_km, category=None, limit=20):
    # Step 1: Get candidate business IDs from Redis.
    # With withcoord/withdist, redis-py returns [member, dist, (lon, lat)] rows.
    results = r.geosearch(
        'businesses:geo',
        longitude=lng, latitude=lat,
        radius=radius_km, unit='km',
        sort='ASC', count=limit * 3,  # over-fetch to survive filtering
        withcoord=True, withdist=True
    )

    # Step 2: Fetch full business details from cache/DB
    biz_ids = [member for member, dist, coord in results]
    businesses = get_businesses_batch(biz_ids)  # Redis hash or DB

    # Step 3: Apply filters
    if category:
        businesses = [b for b in businesses if b['category'] == category]

    # Step 4: Merge distance info and return
    distance_map = {member: dist for member, dist, coord in results}
    for b in businesses:
        b['distance_km'] = distance_map.get(b['id'], 0)

    return businesses[:limit]

Quadtree Deep Dive

Let's trace through how a quadtree builds itself and performs a proximity search. Unlike geohash (which uses fixed-size cells), the quadtree adapts to data density.

Building the Quadtree

// Build a quadtree from 200M businesses:
// 1. Start with a single node covering entire world
//    bounds = {lat: [-90, 90], lng: [-180, 180]}
//    businesses = all 200M
//
// 2. 200M > MAX_CAPACITY (100) → SPLIT into 4 quadrants:
//    NW: lat [0, 90],  lng [-180, 0]    — 45M businesses
//    NE: lat [0, 90],  lng [0, 180]     — 55M businesses
//    SW: lat [-90, 0], lng [-180, 0]    — 40M businesses
//    SE: lat [-90, 0], lng [0, 180]     — 60M businesses
//
// 3. Each quadrant still has > 100 → SPLIT recursively
//    ...until each leaf node has ≤ 100 businesses
//
// For 200M businesses with MAX_CAPACITY=100:
//   Tree depth ≈ log₄(200M / 100) ≈ 10-12 levels
//   Total leaf nodes ≈ 200M / 100 = 2M nodes
//   Memory ≈ 2M nodes × ~100 bytes ≈ 200 MB for the tree itself, plus
//   ~1.6 GB for business references in the leaves (200M × 8 bytes)
//
// Search: "Find all businesses within 2 km of (37.77, -122.42)"
// The search circle intersects only a few quadtree cells:
//
// Level 0: Root covers world → circle intersects? YES → recurse
// Level 1: NW quadrant covers N. America → intersects? YES → recurse
// Level 2: Sub-quadrant covers California → intersects? YES → recurse
// ...
// Level 10: Leaf covering 2 blocks of SF → intersects? YES
//           → check all 87 businesses in this leaf
//           → compute exact distance for each
//           → 12 are within 2 km radius
//
// Total nodes visited: ~40 (out of 2M)
// Total distance computations: ~200 (out of 200M)
// Time complexity: O(log N + k) where k = result set size

Caching Strategy

Business data is read-heavy and rarely changes. A restaurant's name, address, and hours might update a few times per year. This makes aggressive caching highly effective.

Cache Layers

┌───────────────────────────────────────────────────────────────┐
│                       CACHE ARCHITECTURE                       │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Layer 1: CDN / Edge Cache                                    │
│  ├── Static assets (photos, logos)                            │
│  └── Popular search results (geohash-based cache keys)        │
│      Key: "search:9q8yy:restaurant:5km"                       │
│      TTL: 5 minutes (stale-while-revalidate)                  │
│                                                               │
│  Layer 2: Application Cache (Redis)                           │
│  ├── Business details: "biz:{id}" → full JSON                 │
│  │   TTL: 1 hour                                              │
│  ├── Geohash cell contents: "cell:{geohash}" → [biz_ids]     │
│  │   TTL: 30 minutes                                          │
│  └── Popular search results                                   │
│      TTL: 5 minutes                                           │
│                                                               │
│  Layer 3: Database Read Replicas                              │
│  ├── 5-10 PostgreSQL replicas for read queries                │
│  └── PostGIS indexes on each replica                          │
│                                                               │
│  Layer 4: Primary Database                                    │
│  └── Single-leader PostgreSQL (writes only)                   │
│                                                               │
└───────────────────────────────────────────────────────────────┘

Cache Key Design

# Business detail cache
key:  "biz:biz_8x7k2m"
value: {"id":"biz_8x7k2m","name":"Joe's Pizza","rating":4.5,...}
ttl:  3600  # 1 hour

# Geohash cell cache — all business IDs in a geohash cell
key:  "geo:cell:9q8yyk"
value: ["biz_001","biz_002","biz_003",...]
ttl:  1800  # 30 min

# Search result cache — full response for common queries
key:  "search:9q8yyk:restaurant:5km:rating"
value: [{"id":"biz_001","name":"Joe's Pizza","distance":105,...},...]
ttl:  300   # 5 min (search results change more often due to open/close)

Cache Invalidation

// When a business is updated:
function onBusinessUpdate(business) {
    // 1. Invalidate business detail cache
    cache.delete(`biz:${business.id}`);

    // 2. If location changed, update geospatial index
    if (business.locationChanged) {
        redis.zrem('businesses:geo', business.id);
        redis.geoadd('businesses:geo',
            business.longitude, business.latitude, business.id);

        // Invalidate old and new geohash cell caches
        cache.delete(`geo:cell:${business.oldGeohash6}`);
        cache.delete(`geo:cell:${business.newGeohash6}`);
    }

    // 3. Invalidate search caches for affected cells
    // (Use a pattern-based deletion or let TTL expire)
    let affectedCells = [business.geohash6, ...getNeighbors(business.geohash6)];
    affectedCells.forEach(cell => {
        cache.deletePattern(`search:${cell}:*`);
    });
}
Eventual consistency by design: Since reads dwarf writes (~25,000 searches/sec vs. ~1,000 writes/sec at peak), we can afford relaxed consistency on updates. When a business owner updates their hours, the change goes to the primary DB immediately, but cache invalidation propagates over the next few minutes. Users might see stale hours for a short window, which is perfectly acceptable.

Scaling the System

Read Path Scaling

The search service is stateless and read-only, making it trivially horizontally scalable:

┌──────────────────┐
│  Load Balancer   │
│  (Layer 7)       │
└────┬───┬───┬─────┘
     │   │   │
┌────▼┐┌─▼──┐┌▼────┐
│LBS-1││LBS-2││LBS-N│  ← N stateless search servers
└──┬──┘└──┬──┘└──┬──┘
   │      │      │
   ▼      ▼      ▼
┌──────────────────┐     ┌──────────────────┐
│  Redis Cluster   │     │  DB Read Replicas│
│  (Geo Index +    │     │  (PostgreSQL)    │
│   Cache)         │     │  5-10 replicas   │
└──────────────────┘     └──────────────────┘

Geographic Sharding

// Shard the geo index by geographic region
// Each shard covers a set of geohash prefixes

Shard 1 (US-West):    geohash prefixes 9q*, 9r*, 9w*, 9x*
Shard 2 (US-East):    geohash prefixes dr*, dq*, dp*
Shard 3 (Europe):     geohash prefixes u0*-u9*, gc*, gf*
Shard 4 (Asia):       geohash prefixes w*, x*, y*

// Routing: compute 2-char geohash prefix from query point,
// look up which shard handles that prefix
function routeToShard(lat, lng) {
    let prefix = encodeGeohash(lat, lng, 2);
    return shardMap[prefix]; // O(1) lookup
}

Write Path Scaling

Business updates are infrequent (~1,000 writes/sec vs. 25,000 reads/sec), so the write path is simpler:

Business Update Flow:
1. Business Service validates and writes to primary PostgreSQL
2. PostgreSQL replicates to read replicas (async, < 1 sec)
3. Business Service publishes event to message queue (Kafka)
4. Geo Index Updater consumes event, updates Redis GEOADD
5. Cache Invalidator consumes event, deletes stale cache entries

┌──────────┐     ┌──────────┐     ┌──────────┐
│ Business │────▶│ Primary  │────▶│   Read   │
│ Service  │     │    DB    │     │ Replicas │
└────┬─────┘     └──────────┘     └──────────┘
     │
     │ publish event
     ▼
┌─────────────┐
│    Kafka    │
└──┬────────┬─┘
   │        │
   ▼        ▼
┌────────┐ ┌─────────────┐
│  Geo   │ │    Cache    │
│ Index  │ │ Invalidator │
│Updater │ └─────────────┘
└────────┘

Multi-Region Deployment

// Deploy in multiple regions for low latency worldwide
// Each region has a full copy of the geo index for its area

Region: US-West (Oregon)
├── LBS servers × 20
├── Redis Cluster (Americas data)
├── PostgreSQL replicas × 5
└── CDN edge cache

Region: Europe (Frankfurt)
├── LBS servers × 15
├── Redis Cluster (Europe + Africa data)
├── PostgreSQL replicas × 5
└── CDN edge cache

Region: Asia (Singapore)
├── LBS servers × 15
├── Redis Cluster (Asia-Pacific data)
├── PostgreSQL replicas × 5
└── CDN edge cache

// Cross-region sync for the business database:
// Single primary DB in US-West (writes)
// Async replication to Europe and Asia (read replicas)
// Replication lag: 50-200ms (acceptable for business updates)

Advanced Topics

Dynamic Radius Adjustment

What if the user searches within 1 km but only 2 results are found? Automatically expand:

function searchWithMinResults(lat, lng, initialRadius, minResults) {
    let radius = initialRadius;
    let maxRadius = initialRadius * 10;
    let results;

    while (radius <= maxRadius) {
        results = searchNearby(lat, lng, radius);
        if (results.length >= minResults) break;
        radius *= 2;  // double the radius each time
    }

    return results;
}

Ranking Algorithm

Real proximity services don't just sort by distance. They use a weighted score:

function rankingScore(business, userLat, userLng) {
    const dist = haversine(userLat, userLng, business.lat, business.lng);
    const maxDist = 10000; // 10 km normalization factor

    // Distance score: 1.0 (very close) → 0.0 (at max distance)
    const distScore = Math.max(0, 1 - dist / maxDist);

    // Rating score: normalized 0-1
    const ratingScore = business.rating / 5.0;

    // Popularity score: log scale of review count
    const popScore = Math.min(1, Math.log10(business.reviewCount + 1) / 4);

    // Recency score: 1 if updated in the last 30 days, else 0
    const daysSinceUpdate = (Date.now() - business.updatedAt) / 86400000;
    const recencyScore = daysSinceUpdate < 30 ? 1 : 0;

    // Weighted combination (weights sum to 1.0)
    return 0.4 * distScore +
           0.3 * ratingScore +
           0.2 * popScore +
           0.1 * recencyScore;
}

Geofencing with Geohash

Geohash is also useful for geofencing — triggering actions when users enter/leave geographic areas:

// Check if user entered a promotion zone
function checkGeofence(userLat, userLng, fenceGeohash, fencePrecision) {
    let userGeohash = encodeGeohash(userLat, userLng, fencePrecision);
    if (userGeohash === fenceGeohash) {
        triggerNotification("You're near Joe's Pizza! 20% off today.");
    }
}

// Real-time geofencing with Redis pub/sub:
// When user location updates, compute geohash
// If geohash changed → publish to "location_updates" channel
// Geofence service subscribes and checks against active fences
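
A sketch of that flow using an ioredis-style client. The channel name comes from the comments above; activeFencesByGeohash, the fence payload, and onLocationUpdate are assumptions for illustration:

const Redis = require('ioredis');
const pub = new Redis();
const sub = new Redis();  // subscriber connections are dedicated in Redis

// Publisher side: emit an event only when the user's cell changes
function onLocationUpdate(userId, lat, lng, prevCell) {
    const cell = encodeGeohash(lat, lng, 6);
    if (cell !== prevCell)
        pub.publish('location_updates', JSON.stringify({ userId, cell }));
    return cell;  // caller stores this as the user's current cell
}

// Geofence service: check each cell change against active fences
sub.subscribe('location_updates');
sub.on('message', (channel, message) => {
    const { userId, cell } = JSON.parse(message);
    const fences = activeFencesByGeohash[cell] || [];  // assumed in-memory map
    fences.forEach(fence => triggerNotification(fence.promotionText));
});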

Putting It All Together

Component          Technology                       Why
Geospatial index   Redis GEOADD + GEOSEARCH         Sub-ms latency, built-in geohash, sorted set efficiency
Business metadata  PostgreSQL + PostGIS             ACID for writes, ST_DWithin for fallback search
Cache              Redis (separate from geo)        Business details, search results
Event streaming    Kafka                            Async propagation of business updates
Search service     Stateless Go/Java microservice   Horizontally scalable, no session state
CDN                CloudFront / Fastly              Edge cache for photos and popular results

Key Trade-offs

  1. Geohash vs. quadtree: fixed-size cells that work with any B-tree index vs. adaptive cells that must live in memory
  2. Redis geo vs. PostGIS: sub-millisecond in-memory lookups vs. durable, index-backed queries living alongside the metadata
  3. Cache freshness vs. hit rate: aggressive TTLs keep latency low at the cost of briefly stale business data
  4. Single write region vs. write latency: one primary simplifies consistency while remote regions absorb 50-200 ms of replication lag

Interview Checklist

  1. Clarify requirements — radius range, number of businesses, read/write ratio
  2. Choose spatial index — geohash (simple, DB-friendly) or quadtree (adaptive, in-memory)
  3. Explain the two-step search — coarse filter (spatial index) → fine filter (Haversine)
  4. Show the boundary problem — and the 9-cell solution
  5. Database schema — precomputed geohash columns, PostGIS as alternative
  6. Caching strategy — business data rarely changes → aggressive TTLs
  7. Scaling — stateless search service, Redis geo sharding, multi-region deployment