High Level Design Series · Data Storage · Part 2

Document Stores (MongoDB)

Relational databases enforce rigid schemas — every row in a table must have the same columns. This works beautifully when your data is uniform and relationships are well-understood upfront. But modern applications frequently deal with polymorphic data — product catalogs where a laptop has different attributes than a t-shirt, user profiles with optional nested preferences, or event payloads that vary by type. Document stores were purpose-built for this world.

MongoDB is the most widely deployed document database, running in production at companies like Google, eBay, Toyota, Forbes, and Coinbase. It stores data as rich JSON-like documents, supports secondary indexes, horizontal sharding, ACID transactions (since 4.0), and a powerful aggregation framework. This post is a deep-dive into how it all works under the hood.

The Document Model

In a document database, the fundamental unit of storage is a document — a self-describing, hierarchical data structure similar to a JSON object. Documents are grouped into collections (the rough equivalent of SQL tables), and collections live inside databases.

A MongoDB Document — E-Commerce Order

{
  "_id": ObjectId("665a1b2c3d4e5f6a7b8c9d0e"),
  "order_number": "ORD-2026-78432",
  "customer": {
    "id": 42,
    "name": "Alice Chen",
    "email": "alice@example.com",
    "tier": "gold"
  },
  "status": "shipped",
  "items": [
    {
      "sku": "KB-MK850",
      "name": "Mechanical Keyboard",
      "quantity": 1,
      "unit_price_cents": 12990,
      "category": "electronics"
    },
    {
      "sku": "CBL-USBC-2M",
      "name": "USB-C Cable 2m",
      "quantity": 2,
      "unit_price_cents": 1500,
      "category": "accessories"
    }
  ],
  "shipping": {
    "address": {
      "street": "123 Market St",
      "city": "San Francisco",
      "state": "CA",
      "zip": "94105"
    },
    "carrier": "USPS",
    "tracking_number": "9400111899223100012345",
    "shipped_at": ISODate("2026-04-16T14:22:00Z")
  },
  "payment": {
    "method": "credit_card",
    "last_four": "4242",
    "total_cents": 15990,
    "currency": "USD"
  },
  "tags": ["repeat-customer", "expedited"],
  "created_at": ISODate("2026-04-15T10:30:00Z"),
  "updated_at": ISODate("2026-04-16T14:22:00Z")
}

This single document contains everything you need to display an order detail page — customer info, line items, shipping, and payment. In a relational database, this same data would span 5–6 normalized tables (users, orders, order_items, addresses, payments) requiring multiple JOINs to reconstruct.

Document vs Relational Model

[Interactive diagram: side-by-side comparison of how each model stores and queries the same order]

BSON: Binary JSON

MongoDB doesn't actually store JSON — it stores BSON (Binary JSON), a binary-encoded superset of JSON that adds types JSON lacks:

| BSON Type | Size | JSON Equivalent | Why It Matters |
|---|---|---|---|
| ObjectId | 12 bytes | String (24 hex chars) | Globally unique, contains timestamp — no UUID coordination needed |
| Date | 8 bytes | String (ISO-8601) | Efficient range queries, proper sorting, timezone-aware |
| Decimal128 | 16 bytes | Number (float64) | Exact decimal arithmetic — critical for financial data |
| Int32 / Int64 | 4 / 8 bytes | Number | Distinguishes integer from floating-point |
| Binary | Variable | Base64 string | Store binary blobs (images, hashes) natively |
| Regex | Variable | String | Server-side regex matching without string parsing |
| Timestamp | 8 bytes | None | Internal — used by the oplog for replication ordering |

BSON's key advantage is traversability. Each field is prefixed with its type and length, allowing the storage engine to skip over fields it doesn't need without parsing the entire document. This makes indexed field extraction extremely fast.

ObjectId anatomy (12 bytes): [4-byte timestamp][5-byte random][3-byte counter]. The embedded timestamp means _id is roughly chronologically sorted — you get time-ordered inserts for free. Extract the timestamp: ObjectId("665a1b2c...").getTimestamp() → 2024-05-31T...
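The decode is simple enough to do by hand. A minimal plain-JavaScript sketch of what getTimestamp() computes:

```javascript
// The first 8 hex chars of an ObjectId are a big-endian Unix timestamp in
// seconds; the remaining 16 hex chars are the random value and counter.
function objectIdTimestamp(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

objectIdTimestamp("665a1b2c3d4e5f6a7b8c9d0e").toISOString();
// "2024-05-31T18:47:08.000Z"
```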

BSON document size limit: 16 MB. This is a hard cap. If a single document could grow unbounded (e.g., an array of millions of comments), you must use a different modeling strategy (bucketing or referencing). For truly large binary data, use GridFS, which chunks files into 255 KB pieces across two collections (fs.files + fs.chunks).
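At the default 255 KB chunk size, the number of chunks GridFS creates for a file is just a ceiling division. A quick sketch (not the driver API):

```javascript
// GridFS default chunk size: 255 KB. A file becomes one fs.files metadata
// document plus ceil(size / chunkSize) fs.chunks documents.
const CHUNK_BYTES = 255 * 1024;
const chunksFor = (fileBytes) => Math.ceil(fileBytes / CHUNK_BYTES);

chunksFor(100 * 1024 * 1024);  // a 100 MB file becomes 402 chunks
```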

Flexible Schema

In SQL, adding a column requires ALTER TABLE — a DDL operation that may lock the table. In MongoDB, documents in the same collection can have completely different fields:

// Document 1: A laptop product
{
  "_id": ObjectId("..."),
  "type": "laptop",
  "name": "ThinkPad X1 Carbon Gen 12",
  "specs": {
    "cpu": "Intel Core Ultra 7 155H",
    "ram_gb": 32,
    "storage_gb": 1024,
    "display": "14\" 2.8K OLED",
    "battery_wh": 57
  },
  "price_cents": 164900
}

// Document 2: A t-shirt (same collection!)
{
  "_id": ObjectId("..."),
  "type": "clothing",
  "name": "MongoDB Developer Tee",
  "specs": {
    "material": "100% organic cotton",
    "sizes_available": ["S", "M", "L", "XL"],
    "color": "forest green"
  },
  "price_cents": 2999
}

Both documents live in a products collection, but their specs fields are completely different. A relational database would need either a single wide table full of NULL columns, an entity-attribute-value side table (a well-known antipattern), or a separate table per product type.

Schema validation: "Flexible" doesn't mean "no rules." MongoDB supports JSON Schema validation at the collection level. In production, you should always define validation rules:
db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "type", "price_cents"],
      properties: {
        name: { bsonType: "string" },
        type: { enum: ["laptop", "clothing", "accessory"] },
        price_cents: { bsonType: "int", minimum: 0 }
      }
    }
  }
})

Embedding vs Referencing

This is the single most important design decision in document modeling — analogous to normalization vs. denormalization in SQL. Every relationship between entities must be modeled as either an embed (nested sub-document) or a reference (foreign key stored as an ObjectId).

✦ Embedding (Denormalized)

Store related data inside the parent document.

// Order with embedded items
{
  "_id": ObjectId("..."),
  "items": [
    { "sku": "KB-MK850", "qty": 1 },
    { "sku": "CBL-USBC", "qty": 2 }
  ]
}

Pros: Single read, atomic writes, data locality
Cons: Data duplication, 16 MB limit, unbounded arrays

✦ Referencing (Normalized)

Store an ObjectId pointing to another collection.

// Order with referenced items
{
  "_id": ObjectId("..."),
  "item_ids": [
    ObjectId("aaa..."),
    ObjectId("bbb...")
  ]
}
// Separate "items" collection
{ "_id": ObjectId("aaa..."), "sku": "KB-MK850" }

Pros: No duplication, no size limit, independent updates
Cons: Requires $lookup (JOIN), multiple round trips

Decision Matrix

| Factor | Embed | Reference |
|---|---|---|
| Read pattern | Always read together | Read independently or rarely together |
| Write pattern | Updated atomically | Updated independently by different services |
| Cardinality | 1:few, 1:many (bounded) | 1:many (unbounded), many:many |
| Data size | Child data is small | Child documents are large |
| Duplication tolerance | OK (denormalized) | Not acceptable |
| Document growth | Predictable / bounded | Unbounded array growth |

Schema Design Patterns

Subset Pattern (embed)

Store a subset of frequently-accessed related data in the parent, and the full data in its own collection. Example: A movie document embeds the top 20 reviews, while all 50,000 reviews live in a reviews collection.

// movies collection — embed only recent reviews
{
  "_id": ObjectId("..."),
  "title": "Interstellar",
  "recent_reviews": [
    { "user": "alice", "rating": 5, "text": "Masterpiece", "date": ISODate("2026-04-01") },
    // ... top 20 only
  ],
  "review_count": 48723
}

// reviews collection — full data
{ "_id": ObjectId("..."), "movie_id": ObjectId("..."), "user": "alice", ... }
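On write, the subset invariant (newest N reviews only) is typically maintained with $push/$each/$slice. The same logic in plain JavaScript, names chosen for illustration:

```javascript
// Keep only the newest `cap` reviews, newest first. Mirrors the update
// { $push: { recent_reviews: { $each: [review], $position: 0, $slice: cap } } }
function pushRecent(recentReviews, review, cap = 20) {
  return [review, ...recentReviews].slice(0, cap);
}
```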

Bucket Pattern (embed)

Group time-series data into fixed-size "buckets" to reduce document count and index size. Instead of one document per sensor reading, store 200 readings per bucket.

// sensor_readings collection — 200 readings per document
{
  "_id": ObjectId("..."),
  "sensor_id": "temp-floor-3",
  "bucket_start": ISODate("2026-04-15T10:00:00Z"),
  "bucket_end": ISODate("2026-04-15T10:03:20Z"),
  "count": 200,
  "measurements": [
    { "ts": ISODate("2026-04-15T10:00:00Z"), "temp": 22.4, "humidity": 45 },
    { "ts": ISODate("2026-04-15T10:00:01Z"), "temp": 22.5, "humidity": 45 },
    // ... 198 more
  ],
  "stats": { "avg_temp": 22.6, "min_temp": 22.1, "max_temp": 23.0 }
}

Impact: 200× fewer documents, 200× smaller index, pre-computed stats avoid full scans.
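At one reading per second, 200 readings per bucket means each bucket spans a fixed 200-second window, so the bucket a reading belongs to can be computed from its timestamp alone. A sketch under that assumption:

```javascript
// Bucket start = timestamp rounded down to a 200-second boundary
// (200 readings per bucket at 1 reading/sec, as in the document above).
function bucketStart(tsMillis, readingsPerBucket = 200, intervalMs = 1000) {
  const spanMs = readingsPerBucket * intervalMs;
  return new Date(Math.floor(tsMillis / spanMs) * spanMs);
}

bucketStart(Date.UTC(2026, 3, 15, 10, 1, 30)).toISOString();
// "2026-04-15T10:00:00.000Z", the bucket shown above
```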

Extended Reference Pattern (reference)

Store a reference AND a copy of frequently-accessed fields. Avoids $lookup for common queries while maintaining a canonical source of truth.

// orders collection
{
  "_id": ObjectId("..."),
  "customer_id": ObjectId("..."),          // Reference to customers._id
  "customer_name": "Alice Chen",           // Cached copy
  "customer_email": "alice@example.com",   // Cached copy
  "items": [...]
}

// When Alice changes her name, update both:
// 1. customers collection (source of truth)
// 2. Run a background job to update cached copies in orders

Polymorphic Pattern (hybrid)

Store documents with different shapes in the same collection, differentiated by a type field. Ideal for content management systems, notification systems, or any entity with subtypes.

// notifications collection — different shapes, same collection
{ "type": "email",    "to": "alice@ex.com", "subject": "Order shipped", "body": "..." }
{ "type": "sms",      "to": "+14155551234", "message": "Your order has shipped" }
{ "type": "push",     "device_token": "abc123", "title": "Shipped!", "badge": 3 }
{ "type": "webhook",  "url": "https://api.partner.com/hook", "payload": {...} }
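Reading a polymorphic collection usually ends in a dispatch on the type field. A minimal sketch; the handler bodies are placeholders:

```javascript
// One handler per document shape; unknown types fail loudly instead of
// being silently dropped.
const handlers = {
  email: (n) => `email to ${n.to}: ${n.subject}`,
  sms:   (n) => `sms to ${n.to}: ${n.message}`,
  push:  (n) => `push to ${n.device_token}: ${n.title}`,
};

function deliver(notification) {
  const handler = handlers[notification.type];
  if (!handler) throw new Error(`unknown notification type: ${notification.type}`);
  return handler(notification);
}
```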

Outlier Pattern (hybrid)

Most books have 1–5 authors, but some anthologies have 50+. Embed authors for the common case, and overflow to a separate collection for outliers.

// Common case: authors embedded
{
  "title": "Designing Data-Intensive Applications",
  "authors": ["Martin Kleppmann"],
  "has_overflow_authors": false
}

// Outlier case: flag + separate collection
{
  "title": "Best Short Stories 2026",
  "authors": ["Author1", "Author2", ...first 20...],
  "has_overflow_authors": true  // Signal to app: also query book_authors collection
}
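The application read path then branches on the flag. A sketch where loadOverflowAuthors stands in for the extra query against the overflow collection:

```javascript
// Common case: one read. Outlier case: the flag triggers a second lookup.
function allAuthors(book, loadOverflowAuthors) {
  if (!book.has_overflow_authors) return book.authors;
  return book.authors.concat(loadOverflowAuthors(book.title));
}
```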

MongoDB Architecture

A production MongoDB deployment is a distributed system with three key components: replica sets for high availability, sharding for horizontal scaling, and the WiredTiger storage engine for on-disk management.

Replica Sets

A replica set is a group of mongod processes that maintain the same data set. Every replica set has exactly one primary that accepts all writes, and one or more secondaries that replicate the primary's oplog (operation log).

| Component | Role | Details |
|---|---|---|
| Primary | Accepts all writes | Elected via Raft-like consensus. Only one primary per replica set at any time. |
| Secondary | Replicates from primary | Continuously tails the primary's oplog. Can serve reads if readPreference is configured. |
| Arbiter | Votes in elections only | Holds no data. Used to break ties in even-numbered replica sets. Use sparingly. |
| Hidden | Invisible to clients | Never receives read traffic. Used for analytics, reporting, or backups. |
| Delayed | Lags intentionally | Replicates with a configured delay (e.g., 1 hour). Protects against accidental deletes. |

- PSA: Primary + Secondary + Arbiter (minimum HA)
- P2S: Primary + 2 Secondaries (recommended production)
- <10s: typical failover time (election + catch-up)
- 50: max members per replica set (7 voting max)

Write Concern controls durability guarantees:

// Write acknowledged by majority of voting members
db.orders.insertOne(
  { order_number: "ORD-2026-78432", ... },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)

// w: "majority"  → Wait for ack from ⌊n/2⌋+1 voting members
// j: true        → Wait for journal commit (survives crash)
// wtimeout: 5000 → Fail after 5 seconds if majority hasn't acked
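The majority count is worth pinning down, since it decides both write acknowledgment and election quorums:

```javascript
// Majority of n voting members: floor(n/2) + 1.
// A 3-member set needs 2 acks; a 5-member set needs 3.
const majority = (votingMembers) => Math.floor(votingMembers / 2) + 1;

[3, 5, 7].map(majority);  // [2, 3, 4]
```

This is why 3-member sets are the sweet spot: you can lose one member and still reach a majority of 2.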

Read Preference controls where reads go:

| Mode | Reads From | Use Case |
|---|---|---|
| primary | Primary only | Default. Strong consistency. |
| primaryPreferred | Primary; secondary if unavailable | HA with consistency preference |
| secondary | Secondaries only | Offload reads; accept stale data |
| secondaryPreferred | Secondaries; primary if none available | Scale reads with fallback |
| nearest | Lowest-latency member | Geo-distributed deployments |

Sharding

When a single replica set can't handle the data volume or throughput, MongoDB distributes data across shards — each shard is itself a replica set. Three components make up a sharded cluster:

mongos (Query Router)

Stateless router process. Applications connect to mongos instead of individual shards. Routes queries to the correct shard(s) based on the shard key. You run multiple mongos instances behind a load balancer for HA.

Config Servers

A 3-member replica set storing cluster metadata: shard key ranges (chunks), which shard owns which chunk, and balancer state. Every mongos caches this metadata.

MongoDB Sharding Flow

[Interactive diagram: tracing a query through a sharded cluster]

Shard Key Selection

The shard key determines how documents are distributed across shards. It's the most critical decision in a sharded deployment because it's immutable after creation (MongoDB 5.0 added reshardCollection, but it's expensive).

| Shard Key Strategy | Example | Pros | Cons |
|---|---|---|---|
| Hashed | { _id: "hashed" } | Even distribution, no hotspots | Range queries scatter to all shards |
| Range | { created_at: 1 } | Efficient range scans | Monotonic keys → all writes to one shard (hotspot) |
| Compound | { tenant_id: 1, _id: 1 } | Targeted queries + good distribution | Requires knowing query patterns upfront |
| Zone | Assign ranges to geo zones | Data locality (GDPR compliance) | Operational complexity |

// Shard the orders collection by hashed customer_id
sh.shardCollection("ecommerce.orders", { "customer.id": "hashed" })

// Better: compound key for tenant isolation + good distribution
sh.shardCollection("saas.events", { "tenant_id": 1, "timestamp": 1 })

// Check chunk distribution
db.orders.getShardDistribution()
// Output:
// Shard shard-0: 33.2% data, 28.1 GB, 12.4M documents
// Shard shard-1: 33.5% data, 28.4 GB, 12.5M documents
// Shard shard-2: 33.3% data, 28.2 GB, 12.4M documents
Anti-pattern: Monotonic shard key. Using { _id: 1 } (range-based on ObjectId) creates a massive hotspot because ObjectId is roughly time-ordered — all new inserts go to the shard owning the "max" chunk. Use { _id: "hashed" } instead, or a compound key with a high-cardinality prefix.

Targeted vs. Scatter-Gather Queries

// TARGETED: Query includes the shard key → mongos routes to 1 shard
db.orders.find({ "customer.id": 42, status: "shipped" })
// mongos → hash(42) → Shard-1 → return results. Fast.

// SCATTER-GATHER: Query does NOT include shard key → hits ALL shards
db.orders.find({ status: "shipped", "items.sku": "KB-MK850" })
// mongos → Shard-0 + Shard-1 + Shard-2 → merge results. Slow.
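A toy router makes the difference concrete. The hash below is illustrative only (MongoDB's hashed index uses a different function); what matters is that a query containing the shard key resolves to exactly one shard, and anything else must be broadcast:

```javascript
// Returns the list of shard indexes a query must visit.
function targetShards(query, shardKey, numShards) {
  if (!(shardKey in query)) {
    return [...Array(numShards).keys()];   // scatter-gather: all shards
  }
  const str = JSON.stringify(query[shardKey]);
  const h = [...str].reduce((a, c) => (a * 31 + c.charCodeAt(0)) >>> 0, 0);
  return [h % numShards];                  // targeted: exactly one shard
}

targetShards({ "customer.id": 42 }, "customer.id", 3);  // one shard
targetShards({ status: "shipped" }, "customer.id", 3);  // [0, 1, 2]
```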

WiredTiger Storage Engine

Since MongoDB 3.2, WiredTiger is the default storage engine. It replaced MMAPv1 and brought massive improvements:

| Feature | WiredTiger | Old MMAPv1 |
|---|---|---|
| Locking granularity | Document-level | Collection-level |
| Compression | Snappy (default), zlib, zstd | None |
| Concurrency control | MVCC (Multi-Version) | Reader-writer locks |
| Checkpoint interval | 60 seconds (configurable) | 60 seconds |
| Journaling | Write-ahead log (WAL) | Group commit journal |
| Cache | 50% of (RAM − 1 GB) by default | All available memory |
| Data structures | B-tree (default) + LSM-tree (optional) | B-tree only |

How a write flows through WiredTiger:

  1. Application sends an insertOne() to the mongod process.
  2. WiredTiger writes the operation to the journal (WAL) — this ensures durability even if the process crashes before the next checkpoint.
  3. The document is written to the in-memory B-tree (WiredTiger cache).
  4. Indexes are updated in their respective in-memory B-trees.
  5. Every 60 seconds, a checkpoint flushes dirty pages from memory to the data files on disk.
  6. Old journal entries (before the last checkpoint) are purged.
Compression savings in practice: With zstd compression (MongoDB 4.2+), a 100 GB uncompressed dataset typically compresses to 30–40 GB on disk. JSON-heavy documents compress particularly well because of repeated field names across documents.

Querying & Aggregation

MongoDB's query language is expressive enough for most OLTP workloads and many analytics tasks. Let's walk through the full spectrum from basic CRUD to complex aggregations.

CRUD Operations

// CREATE
db.orders.insertOne({ order_number: "ORD-2026-78433", customer: { id: 43 }, ... })
db.orders.insertMany([{ ... }, { ... }, { ... }])  // Bulk insert

// READ — rich query operators; projection is the second argument to find()
db.orders.find(
  {
    "customer.id": 42,                           // Dot notation for nested fields
    status: { $in: ["shipped", "delivered"] },   // $in operator
    "payment.total_cents": { $gte: 5000 },       // Range query
    tags: { $all: ["repeat-customer"] }          // Array contains all
  },
  { items: 1, status: 1, _id: 0 }                // Return only these fields
).sort({ created_at: -1 })
 .limit(20)

// UPDATE — atomic field-level updates
db.orders.updateOne(
  { _id: ObjectId("665a1b2c3d4e5f6a7b8c9d0e") },
  {
    $set: { status: "delivered", "shipping.delivered_at": new Date() },
    $push: { tags: "completed" },
    $inc: { "customer.order_count": 1 },
    $currentDate: { updated_at: true }
  }
)

// UPDATE — array operations
db.orders.updateOne(
  { _id: ObjectId("..."), "items.sku": "KB-MK850" },
  { $set: { "items.$.quantity": 3 } }          // $ positional operator
)

// DELETE
db.orders.deleteMany({ status: "cancelled", created_at: { $lt: ISODate("2025-01-01") } })

Aggregation Pipeline

The aggregation pipeline is MongoDB's answer to SQL's GROUP BY, HAVING, window functions, and subqueries — but expressed as a sequence of stages that documents flow through, each transforming the result set.

// Revenue by product category for Q1 2026, with average order value
db.orders.aggregate([
  // Stage 1: Filter to Q1 2026 shipped/delivered orders
  { $match: {
      created_at: {
        $gte: ISODate("2026-01-01"),
        $lt: ISODate("2026-04-01")
      },
      status: { $in: ["shipped", "delivered"] }
  }},

  // Stage 2: Unwind the items array (1 doc per item)
  { $unwind: "$items" },

  // Stage 3: Group by category
  { $group: {
      _id: "$items.category",
      total_revenue: { $sum: { $multiply: ["$items.quantity", "$items.unit_price_cents"] } },
      total_units: { $sum: "$items.quantity" },
      order_count: { $addToSet: "$_id" },  // Unique orders
      avg_item_price: { $avg: "$items.unit_price_cents" }
  }},

  // Stage 4: Add computed fields
  { $addFields: {
      unique_orders: { $size: "$order_count" },
      total_revenue_dollars: { $divide: ["$total_revenue", 100] }
  }},

  // Stage 5: Remove the large order_count array
  { $project: { order_count: 0 } },

  // Stage 6: Sort by revenue descending
  { $sort: { total_revenue: -1 } },

  // Stage 7: Limit to top 10
  { $limit: 10 }
])

// Result:
// { _id: "electronics",  total_revenue: 4589200, total_units: 312, ... }
// { _id: "accessories",  total_revenue: 892100,  total_units: 1205, ... }
// { _id: "clothing",     total_revenue: 445600,  total_units: 892, ... }

Key Aggregation Stages

| Stage | SQL Equivalent | What It Does |
|---|---|---|
| $match | WHERE / HAVING | Filters documents. Place early to reduce pipeline volume. |
| $group | GROUP BY | Groups docs by key and applies accumulators ($sum, $avg, $max, etc.) |
| $project | SELECT | Include, exclude, or reshape fields. |
| $unwind | LATERAL FLATTEN | Deconstructs an array field into one doc per element. |
| $lookup | LEFT OUTER JOIN | Joins another collection. Performs a correlated subquery. |
| $sort | ORDER BY | Sorts results. Uses indexes if placed at the start. |
| $facet | Multiple GROUP BYs | Run multiple pipelines in parallel on the same input. |
| $bucket | CASE WHEN grouping | Groups into custom ranges (price bands, age ranges). |
| $graphLookup | Recursive CTE | Recursive self-lookup (org charts, category trees). |
| $merge | INSERT INTO ... SELECT | Write pipeline results to another collection. |

Performance tip: Always place $match and $sort stages as early as possible. If $match is the first stage and uses an indexed field, MongoDB uses the index to avoid a full collection scan. A pipeline that starts with $unwind on a million-document collection is a guaranteed performance disaster.

$lookup — Server-Side Joins

// Enrich orders with full product details from a separate "products" collection
db.orders.aggregate([
  { $match: { "customer.id": 42 } },
  { $unwind: "$items" },
  { $lookup: {
      from: "products",
      localField: "items.sku",
      foreignField: "sku",
      as: "product_details"
  }},
  { $unwind: "$product_details" },
  { $project: {
      order_number: 1,
      "items.sku": 1,
      "items.quantity": 1,
      "product_details.name": 1,
      "product_details.price_cents": 1,
      "product_details.specs": 1
  }}
])

Change Streams

Change streams let you subscribe to real-time data changes on a collection, database, or entire deployment. They're built on top of the oplog and provide a resumable, ordered stream of change events.

// Watch for new orders and status changes
const pipeline = [
  { $match: {
      $or: [
        { operationType: "insert" },
        { operationType: "update", "updateDescription.updatedFields.status": { $exists: true } }
      ]
  }}
];

const changeStream = db.orders.watch(pipeline, {
  fullDocument: "updateLookup"  // Include the full document on updates
});

changeStream.on("change", (event) => {
  console.log(`${event.operationType}: Order ${event.fullDocument.order_number}`);
  console.log(`Status: ${event.fullDocument.status}`);

  // Trigger downstream actions
  if (event.fullDocument.status === "shipped") {
    sendShippingNotification(event.fullDocument);
  }
});

// Resume after disconnect using a resume token
const resumeToken = changeStream.resumeToken;
// Later:
const resumed = db.orders.watch(pipeline, { resumeAfter: resumeToken });
Use cases for change streams: Real-time dashboards, event-driven microservices (order → payment → shipping), cache invalidation, audit logging, synchronizing data to Elasticsearch or a data warehouse, and CDC (Change Data Capture) pipelines.

Indexes

Without indexes, every query performs a collection scan (COLLSCAN) — examining every document. With millions of documents, that's unacceptable. MongoDB supports a rich set of index types:

| Index Type | Syntax | Use Case |
|---|---|---|
| Single field | { email: 1 } | Equality and range queries on one field |
| Compound | { status: 1, created_at: -1 } | Multi-field queries. Order matters (ESR rule). |
| Multikey | { tags: 1 } | Indexes each element in an array field |
| Text | { name: "text", description: "text" } | Full-text search with stemming and scoring |
| Geospatial (2dsphere) | { location: "2dsphere" } | $near, $geoWithin queries on GeoJSON |
| Hashed | { user_id: "hashed" } | Equality only. Used as shard key for even distribution. |
| TTL | { expires_at: 1 }, { expireAfterSeconds: 0 } | Auto-delete expired documents (sessions, temp data) |
| Wildcard | { "specs.$**": 1 } | Index all sub-fields of a polymorphic field |
| Partial | { status: 1 }, { partialFilterExpression: ... } | Index only a subset of documents (smaller index) |

The ESR Rule (Equality → Sort → Range)

For compound indexes, order fields by: Equality fields first, then Sort fields, then Range fields. This gives the query planner the best chance to satisfy the query with an index scan and an in-order traversal.

// Query pattern: find active orders for a customer, sorted by date
db.orders.find({
  "customer.id": 42,           // Equality
  status: "active",            // Equality
  created_at: { $gte: ISODate("2026-01-01") }  // Range
}).sort({ created_at: -1 })    // Sort

// Optimal compound index (ESR):
db.orders.createIndex({
  "customer.id": 1,   // E — equality
  status: 1,          // E — equality
  created_at: -1      // S+R — sort direction matches, also serves range
})

// Verify with explain():
db.orders.find({ ... }).explain("executionStats")
// Look for:
//   "stage": "IXSCAN"          ← Good (not COLLSCAN)
//   "nReturned": 47            ← Documents returned
//   "totalKeysExamined": 47    ← Keys examined (ideal: same as nReturned)
//   "totalDocsExamined": 47    ← Docs examined (covered query: 0)

Text Indexes

// Create a text index for product search
db.products.createIndex(
  { name: "text", description: "text", tags: "text" },
  { weights: { name: 10, tags: 5, description: 1 },  // Name matches score 10x
    default_language: "english" }
)

// Search
db.products.find(
  { $text: { $search: "mechanical keyboard wireless" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } }).limit(20)

Geospatial Indexes

// Store locations as GeoJSON
db.restaurants.insertOne({
  name: "MongoDB Café",
  location: {
    type: "Point",
    coordinates: [-122.4194, 37.7749]  // [longitude, latitude]
  },
  cuisine: "american"
})

// Create 2dsphere index
db.restaurants.createIndex({ location: "2dsphere" })

// Find restaurants within 2km of a point
db.restaurants.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
      $maxDistance: 2000  // meters
    }
  }
})

// Find restaurants inside a polygon (delivery zone)
db.restaurants.find({
  location: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [[
          [-122.43, 37.78], [-122.41, 37.78],
          [-122.41, 37.77], [-122.43, 37.77],
          [-122.43, 37.78]
        ]]
      }
    }
  }
})

Index Sizing

Rule of thumb: Your working set of indexes should fit in RAM. If indexes exceed available memory, every query that needs a non-cached index page causes a disk I/O, destroying performance. Check with db.orders.stats().totalIndexSize — if this exceeds your WiredTiger cache allocation, you need bigger instances or fewer/smarter indexes.
// Check index sizes for a collection
db.orders.stats().indexSizes
// {
//   "_id_":                    2.1 GB,
//   "customer.id_1_status_1":  1.8 GB,
//   "created_at_-1":           0.9 GB,
//   "items.sku_1":             3.2 GB   ← Multikey index on array = large
// }
// Total: 8.0 GB → Need at least 8 GB of WiredTiger cache for index-only

// List indexes and their usage stats
db.orders.aggregate([{ $indexStats: {} }])
// Shows ops count per index — drop indexes with 0 ops since last restart

Transactions in MongoDB 4.0+

Before MongoDB 4.0, atomicity was limited to single-document operations — which is often sufficient since documents can embed related data. But some workflows genuinely require multi-document, multi-collection ACID transactions. MongoDB 4.0 added support for multi-document transactions on replica sets, and 4.2 extended them to sharded clusters.

// Transfer credits between two user accounts (multi-document transaction)
const session = db.getMongo().startSession();
session.startTransaction({
  readConcern: { level: "snapshot" },
  writeConcern: { w: "majority" },
  readPreference: "primary"
});

try {
  const accounts = session.getDatabase("bank").accounts;

  // Debit sender
  const sender = accounts.findOneAndUpdate(
    { _id: "alice", balance: { $gte: 5000 } },
    { $inc: { balance: -5000 }, $push: { transactions: {
        type: "debit", amount: 5000, to: "bob", ts: new Date()
    }}},
    { session, returnDocument: "after" }
  );

  if (!sender) throw new Error("Insufficient funds");

  // Credit receiver
  accounts.updateOne(
    { _id: "bob" },
    { $inc: { balance: 5000 }, $push: { transactions: {
        type: "credit", amount: 5000, from: "alice", ts: new Date()
    }}},
    { session }
  );

  // Insert audit log (different collection)
  session.getDatabase("bank").audit_log.insertOne({
    action: "transfer",
    from: "alice", to: "bob", amount: 5000,
    timestamp: new Date()
  }, { session });

  session.commitTransaction();
  print("Transfer committed successfully");

} catch (error) {
  session.abortTransaction();
  print("Transfer aborted: " + error.message);

} finally {
  session.endSession();
}
Transaction performance caveats:
  • 60-second lifetime limit (default transactionLifetimeLimitSeconds). Long-running transactions are killed.
  • 1,000 documents modified per transaction is a practical limit before performance degrades.
  • Write conflicts: If two transactions modify the same document, one gets a WriteConflict error and must retry.
  • Oplog entries: A transaction commits as a single oplog entry (since 4.2, large transactions are split across multiple entries) — big transactions still add replication load.
  • Not a replacement for good schema design. If you're using transactions for every operation, you've likely modeled your data wrong. Embedding related data in a single document eliminates most transaction needs.

Retryable Writes & Reads

// MongoDB drivers retry transient errors automatically (since 4.2+)
const client = new MongoClient(uri, {
  retryWrites: true,   // Auto-retry on network errors, elections
  retryReads: true,     // Auto-retry reads
  w: "majority",
  readPreference: "primaryPreferred"
});

// For transactions, implement your own retry loop:
async function runTransactionWithRetry(session, txnFunc) {
  while (true) {
    try {
      await txnFunc(session);
      break;
    } catch (error) {
      if (error.hasErrorLabel("TransientTransactionError")) {
        console.log("Transient error, retrying...");
        continue;
      }
      throw error;
    }
  }
}

Real-World Sizing

Let's size a MongoDB deployment for a mid-scale e-commerce platform: 10 million orders, 100,000 products, 2 million users, with 5,000 writes/sec peak and 20,000 reads/sec peak.

Document Size Estimation

| Collection | Avg Doc Size | Doc Count | Raw Data | With zstd (~3×) | Indexes |
|---|---|---|---|---|---|
| orders | 2.5 KB | 10M | 25 GB | ~8.3 GB | ~6 GB (4 indexes) |
| products | 4 KB | 100K | 0.4 GB | ~0.13 GB | ~0.2 GB |
| users | 1.5 KB | 2M | 3 GB | ~1 GB | ~1.2 GB |
| events | 0.5 KB | 500M | 250 GB | ~83 GB | ~40 GB (TTL + compound) |
| Total | | | 278.4 GB | ~92.4 GB | ~47.4 GB |
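The raw-data column is straightforward arithmetic. A sketch reproducing it, with the ~3× zstd ratio taken as the assumption from the table:

```javascript
// avg doc size (KB) * doc count -> GB, summed across collections.
const collections = [
  { name: "orders",   docKB: 2.5, count: 10_000_000 },
  { name: "products", docKB: 4,   count: 100_000 },
  { name: "users",    docKB: 1.5, count: 2_000_000 },
  { name: "events",   docKB: 0.5, count: 500_000_000 },
];

const rawGB = collections.reduce((sum, c) => sum + (c.docKB * c.count) / 1e6, 0);
console.log(rawGB.toFixed(1), (rawGB / 3).toFixed(1));
// 278.4 GB raw, ~92.8 GB assuming a uniform 3x ratio
```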

Deployment Recommendation

- 3-shard cluster (each shard is a 3-member replica set)
- 64 GB RAM per shard (32 GB WiredTiger cache)
- 500 GB NVMe SSD per shard
- 3× mongos routers behind an ALB for HA

// Working set must fit in WiredTiger cache:
// Hot data: recent 30 days of orders (~2.5M docs × 2.5 KB = 6.25 GB)
//         + all products (0.4 GB)
//         + active users (0.5 GB)
//         + hot indexes (~15 GB)
// Total working set ≈ 22 GB → Fits in 32 GB cache ✓

// Throughput: 5,000 writes/sec ÷ 3 shards = ~1,667 writes/shard
// Single replica set can handle ~10,000-20,000 writes/sec on NVMe
// Headroom: 83-92% → Healthy ✓
Connection pooling: Each mongos maintains a connection pool to every shard. With 3 app servers × 100 connections/server × 3 mongos instances, you get 900 connections. Each shard handles 300. MongoDB default max is 65,536 connections per mongod, but practical limits are lower (each connection ≈ 1 MB memory). Use connection pooling libraries with maxPoolSize: 100.

When to Use Document Stores

✅ Great Fit

  • Product catalogs — varying attributes per category
  • Content management — articles with embedded media, tags, comments
  • User profiles — flexible preferences, nested settings
  • Event sourcing — append-heavy, polymorphic events
  • Real-time analytics — aggregation pipeline + change streams
  • Mobile/IoT backends — MongoDB Realm sync, flexible schemas for evolving apps
  • Rapid prototyping — no schema migrations, fast iteration
  • Multi-tenant SaaS — shard by tenant_id, tenant-specific schemas

❌ Poor Fit

  • Financial ledgers — need strict ACID across many entities
  • Heavy relational joins — 5+ table JOINs are SQL territory
  • Highly normalized data — if you need 3NF, use a relational DB
  • Complex graph traversals — use Neo4j or Neptune
  • Write-heavy time-series — Cassandra or TimescaleDB may be better
  • Sub-millisecond caching — Redis is purpose-built
  • Full OLAP workloads — use ClickHouse, BigQuery, or Snowflake
  • Strong schema enforcement — SQL's DDL is still more rigorous

MongoDB vs. Other Document Stores

| Database | Strengths | Weaknesses | Best For |
|---|---|---|---|
| MongoDB | Rich queries, aggregation, transactions, ecosystem | Memory-hungry, SSPL license | General-purpose document workloads |
| CouchDB | Master-master replication, HTTP API, offline-first | Slower queries, limited indexing | Offline-capable mobile/edge apps |
| Firestore | Serverless, real-time sync, Firebase integration | Limited queries, vendor lock-in | Mobile apps, small-medium scale |
| Amazon DocumentDB | MongoDB-compatible, managed, integrated with AWS | Not 100% compatible, slower feature adoption | AWS-native MongoDB workloads |
| CosmosDB | Multi-model, global distribution, SLA-backed | Expensive at scale, RU pricing complexity | Azure-native, multi-region apps |

Final decision framework: Choose MongoDB when (1) your data is naturally hierarchical or polymorphic, (2) you need horizontal scaling with rich queries, (3) your access patterns are document-centric (fetch/update one entity at a time), and (4) you can tolerate some denormalization. For everything else, pick the more specialized tool — SQL for complex joins, Redis for caching, Cassandra for massive write throughput, or a graph DB for relationship-heavy data.