High Level Design Series · Data Storage · Part 2

Document Stores (MongoDB)

Relational databases enforce rigid schemas — every row in a table must have the same columns. This works beautifully when your data is uniform and relationships are well-understood upfront. But modern applications frequently deal with polymorphic data — product catalogs where a laptop has different attributes than a t-shirt, user profiles with optional nested preferences, or event payloads that vary by type. Document stores were purpose-built for this world.

MongoDB is the most widely deployed document database, running in production at companies like Google, eBay, Toyota, Forbes, and Coinbase. It stores data as rich JSON-like documents, supports secondary indexes, horizontal sharding, ACID transactions (since 4.0), and a powerful aggregation framework. This post is a deep-dive into how it all works under the hood.

The Document Model

In a document database, the fundamental unit of storage is a document — a self-describing, hierarchical data structure similar to a JSON object. Documents are grouped into collections (the rough equivalent of SQL tables), and collections live inside databases.

A MongoDB Document — E-Commerce Order

{
  "_id": ObjectId("665a1b2c3d4e5f6a7b8c9d0e"),
  "order_number": "ORD-2026-78432",
  "customer": {
    "id": 42,
    "name": "Alice Chen",
    "email": "alice@example.com",
    "tier": "gold"
  },
  "status": "shipped",
  "items": [
    {
      "sku": "KB-MK850",
      "name": "Mechanical Keyboard",
      "quantity": 1,
      "unit_price_cents": 12990,
      "category": "electronics"
    },
    {
      "sku": "CBL-USBC-2M",
      "name": "USB-C Cable 2m",
      "quantity": 2,
      "unit_price_cents": 1500,
      "category": "accessories"
    }
  ],
  "shipping": {
    "address": {
      "street": "123 Market St",
      "city": "San Francisco",
      "state": "CA",
      "zip": "94105"
    },
    "carrier": "USPS",
    "tracking_number": "9400111899223100012345",
    "shipped_at": ISODate("2026-04-16T14:22:00Z")
  },
  "payment": {
    "method": "credit_card",
    "last_four": "4242",
    "total_cents": 15990,
    "currency": "USD"
  },
  "tags": ["repeat-customer", "expedited"],
  "created_at": ISODate("2026-04-15T10:30:00Z"),
  "updated_at": ISODate("2026-04-16T14:22:00Z")
}

This single document contains everything you need to display an order detail page — customer info, line items, shipping, and payment. In a relational database, this same data would span 5–6 normalized tables (users, orders, order_items, addresses, payments) requiring multiple JOINs to reconstruct.

Document vs Relational Model

[Interactive diagram: side-by-side comparison of how each model stores and queries the same order]

BSON: Binary JSON

MongoDB doesn't actually store JSON — it stores BSON (Binary JSON), a binary-encoded superset of JSON that adds types JSON lacks:

| BSON Type | Size | JSON Equivalent | Why It Matters |
|---|---|---|---|
| ObjectId | 12 bytes | String (24 hex chars) | Globally unique, contains timestamp — no UUID coordination needed |
| Date | 8 bytes | String (ISO-8601) | Efficient range queries, proper sorting, timezone-aware |
| Decimal128 | 16 bytes | Number (float64) | Exact decimal arithmetic — critical for financial data |
| Int32 / Int64 | 4 / 8 bytes | Number | Distinguishes integer from floating-point |
| Binary | Variable | Base64 string | Store binary blobs (images, hashes) natively |
| Regex | Variable | String | Server-side regex matching without string parsing |
| Timestamp | 8 bytes | None | Internal — used by the oplog for replication ordering |

BSON's key advantage is traversability. Each field is prefixed with its type and length, allowing the storage engine to skip over fields it doesn't need without parsing the entire document. This makes indexed field extraction extremely fast.

ObjectId anatomy (12 bytes): [4-byte timestamp][5-byte random][3-byte counter]. The embedded timestamp means _id is roughly chronologically sorted — you get time-ordered inserts for free. Extract the timestamp: ObjectId("665a1b2c...").getTimestamp() → 2024-05-31T...
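The decode is simple enough to do by hand. A minimal plain-JavaScript sketch of what getTimestamp() computes:

```javascript
// The first 8 hex chars of an ObjectId are a big-endian Unix timestamp in
// seconds; the remaining 16 hex chars are the random value and counter.
function objectIdTimestamp(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

objectIdTimestamp("665a1b2c3d4e5f6a7b8c9d0e").toISOString();
// "2024-05-31T18:47:08.000Z"
```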

BSON document size limit: 16 MB. This is a hard cap. If a single document could grow unbounded (e.g., an array of millions of comments), you must use a different modeling strategy (bucketing or referencing). For truly large binary data, use GridFS, which chunks files into 255 KB pieces across two collections (fs.files + fs.chunks).
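At the default 255 KB chunk size, the number of chunks GridFS creates for a file is just a ceiling division. A quick sketch (not the driver API):

```javascript
// GridFS default chunk size: 255 KB. A file becomes one fs.files metadata
// document plus ceil(size / chunkSize) fs.chunks documents.
const CHUNK_BYTES = 255 * 1024;
const chunksFor = (fileBytes) => Math.ceil(fileBytes / CHUNK_BYTES);

chunksFor(100 * 1024 * 1024);  // a 100 MB file becomes 402 chunks
```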

Flexible Schema

In SQL, adding a column requires ALTER TABLE — a DDL operation that may lock the table. In MongoDB, documents in the same collection can have completely different fields:

// Document 1: A laptop product
{
  "_id": ObjectId("..."),
  "type": "laptop",
  "name": "ThinkPad X1 Carbon Gen 12",
  "specs": {
    "cpu": "Intel Core Ultra 7 155H",
    "ram_gb": 32,
    "storage_gb": 1024,
    "display": "14\" 2.8K OLED",
    "battery_wh": 57
  },
  "price_cents": 164900
}

// Document 2: A t-shirt (same collection!)
{
  "_id": ObjectId("..."),
  "type": "clothing",
  "name": "MongoDB Developer Tee",
  "specs": {
    "material": "100% organic cotton",
    "sizes_available": ["S", "M", "L", "XL"],
    "color": "forest green"
  },
  "price_cents": 2999
}

Both documents live in a products collection, but their specs fields are completely different. A relational database would need either a single wide table full of NULL columns, an entity-attribute-value side table (a well-known antipattern), or a separate table per product type.

Schema validation: "Flexible" doesn't mean "no rules." MongoDB supports JSON Schema validation at the collection level. In production, you should always define validation rules:
db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "type", "price_cents"],
      properties: {
        name: { bsonType: "string" },
        type: { enum: ["laptop", "clothing", "accessory"] },
        price_cents: { bsonType: "int", minimum: 0 }
      }
    }
  }
})

Embedding vs Referencing

This is the single most important design decision in document modeling — analogous to normalization vs. denormalization in SQL. Every relationship between entities must be modeled as either an embed (nested sub-document) or a reference (foreign key stored as an ObjectId).

✦ Embedding (Denormalized)

Store related data inside the parent document.

// Order with embedded items
{
  "_id": ObjectId("..."),
  "items": [
    { "sku": "KB-MK850", "qty": 1 },
    { "sku": "CBL-USBC", "qty": 2 }
  ]
}

Pros: Single read, atomic writes, data locality
Cons: Data duplication, 16 MB limit, unbounded arrays

✦ Referencing (Normalized)

Store an ObjectId pointing to another collection.

// Order with referenced items
{
  "_id": ObjectId("..."),
  "item_ids": [
    ObjectId("aaa..."),
    ObjectId("bbb...")
  ]
}
// Separate "items" collection
{ "_id": ObjectId("aaa..."), "sku": "KB-MK850" }

Pros: No duplication, no size limit, independent updates
Cons: Requires $lookup (JOIN), multiple round trips

Decision Matrix

| Factor | Embed | Reference |
|---|---|---|
| Read pattern | Always read together | Read independently or rarely together |
| Write pattern | Updated atomically | Updated independently by different services |
| Cardinality | 1:few, 1:many (bounded) | 1:many (unbounded), many:many |
| Data size | Child data is small | Child documents are large |
| Duplication tolerance | OK (denormalized) | Not acceptable |
| Document growth | Predictable / bounded | Unbounded array growth |

Schema Design Patterns

Subset Pattern (embed)

Store a subset of frequently-accessed related data in the parent, and the full data in its own collection. Example: A movie document embeds the top 20 reviews, while all 50,000 reviews live in a reviews collection.

// movies collection — embed only recent reviews
{
  "_id": ObjectId("..."),
  "title": "Interstellar",
  "recent_reviews": [
    { "user": "alice", "rating": 5, "text": "Masterpiece", "date": ISODate("2026-04-01") },
    // ... top 20 only
  ],
  "review_count": 48723
}

// reviews collection — full data
{ "_id": ObjectId("..."), "movie_id": ObjectId("..."), "user": "alice", ... }
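On write, the subset invariant (newest N reviews only) is typically maintained with $push/$each/$slice. The same logic in plain JavaScript, names chosen for illustration:

```javascript
// Keep only the newest `cap` reviews, newest first. Mirrors the update
// { $push: { recent_reviews: { $each: [review], $position: 0, $slice: cap } } }
function pushRecent(recentReviews, review, cap = 20) {
  return [review, ...recentReviews].slice(0, cap);
}
```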

Bucket Pattern (embed)

Group time-series data into fixed-size "buckets" to reduce document count and index size. Instead of one document per sensor reading, store 200 readings per bucket.

// sensor_readings collection — 200 readings per document
{
  "_id": ObjectId("..."),
  "sensor_id": "temp-floor-3",
  "bucket_start": ISODate("2026-04-15T10:00:00Z"),
  "bucket_end": ISODate("2026-04-15T10:03:20Z"),
  "count": 200,
  "measurements": [
    { "ts": ISODate("2026-04-15T10:00:00Z"), "temp": 22.4, "humidity": 45 },
    { "ts": ISODate("2026-04-15T10:00:01Z"), "temp": 22.5, "humidity": 45 },
    // ... 198 more
  ],
  "stats": { "avg_temp": 22.6, "min_temp": 22.1, "max_temp": 23.0 }
}

Impact: 200× fewer documents, 200× smaller index, pre-computed stats avoid full scans.
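At one reading per second, 200 readings per bucket means each bucket spans a fixed 200-second window, so the bucket a reading belongs to can be computed from its timestamp alone. A sketch under that assumption:

```javascript
// Bucket start = timestamp rounded down to a 200-second boundary
// (200 readings per bucket at 1 reading/sec, as in the document above).
function bucketStart(tsMillis, readingsPerBucket = 200, intervalMs = 1000) {
  const spanMs = readingsPerBucket * intervalMs;
  return new Date(Math.floor(tsMillis / spanMs) * spanMs);
}

bucketStart(Date.UTC(2026, 3, 15, 10, 1, 30)).toISOString();
// "2026-04-15T10:00:00.000Z", the bucket shown above
```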

Extended Reference Pattern (reference)

Store a reference AND a copy of frequently-accessed fields. Avoids $lookup for common queries while maintaining a canonical source of truth.

// orders collection
{
  "_id": ObjectId("..."),
  "customer_id": ObjectId("..."),          // Reference to customers._id
  "customer_name": "Alice Chen",           // Cached copy
  "customer_email": "alice@example.com",   // Cached copy
  "items": [...]
}

// When Alice changes her name, update both:
// 1. customers collection (source of truth)
// 2. Run a background job to update cached copies in orders

Polymorphic Pattern (hybrid)

Store documents with different shapes in the same collection, differentiated by a type field. Ideal for content management systems, notification systems, or any entity with subtypes.

// notifications collection — different shapes, same collection
{ "type": "email",    "to": "alice@ex.com", "subject": "Order shipped", "body": "..." }
{ "type": "sms",      "to": "+14155551234", "message": "Your order has shipped" }
{ "type": "push",     "device_token": "abc123", "title": "Shipped!", "badge": 3 }
{ "type": "webhook",  "url": "https://api.partner.com/hook", "payload": {...} }
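Reading a polymorphic collection usually ends in a dispatch on the type field. A minimal sketch; the handler bodies are placeholders:

```javascript
// One handler per document shape; unknown types fail loudly instead of
// being silently dropped.
const handlers = {
  email: (n) => `email to ${n.to}: ${n.subject}`,
  sms:   (n) => `sms to ${n.to}: ${n.message}`,
  push:  (n) => `push to ${n.device_token}: ${n.title}`,
};

function deliver(notification) {
  const handler = handlers[notification.type];
  if (!handler) throw new Error(`unknown notification type: ${notification.type}`);
  return handler(notification);
}
```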

Outlier Pattern (hybrid)

Most books have 1–5 authors, but some anthologies have 50+. Embed authors for the common case, and overflow to a separate collection for outliers.

// Common case: authors embedded
{
  "title": "Designing Data-Intensive Applications",
  "authors": ["Martin Kleppmann"],
  "has_overflow_authors": false
}

// Outlier case: flag + separate collection
{
  "title": "Best Short Stories 2026",
  "authors": ["Author1", "Author2", ...first 20...],
  "has_overflow_authors": true  // Signal to app: also query book_authors collection
}
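The application read path then branches on the flag. A sketch where loadOverflowAuthors stands in for the extra query against the overflow collection:

```javascript
// Common case: one read. Outlier case: the flag triggers a second lookup.
function allAuthors(book, loadOverflowAuthors) {
  if (!book.has_overflow_authors) return book.authors;
  return book.authors.concat(loadOverflowAuthors(book.title));
}
```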

MongoDB Architecture

A production MongoDB deployment is a distributed system with three key components: replica sets for high availability, sharding for horizontal scaling, and the WiredTiger storage engine for on-disk management.

Replica Sets

A replica set is a group of mongod processes that maintain the same data set. Every replica set has exactly one primary that accepts all writes, and one or more secondaries that replicate the primary's oplog (operation log).

| Component | Role | Details |
|---|---|---|
| Primary | Accepts all writes | Elected via Raft-like consensus. Only one primary per replica set at any time. |
| Secondary | Replicates from primary | Continuously tails the primary's oplog. Can serve reads if readPreference is configured. |
| Arbiter | Votes in elections only | Holds no data. Used to break ties in even-numbered replica sets. Use sparingly. |
| Hidden | Invisible to clients | Never receives read traffic. Used for analytics, reporting, or backups. |
| Delayed | Lags intentionally | Replicates with a configured delay (e.g., 1 hour). Protects against accidental deletes. |

- PSA: Primary + Secondary + Arbiter (minimum HA)
- P2S: Primary + 2 Secondaries (recommended production)
- <10s: typical failover time (election + catch-up)
- 50: max members per replica set (7 voting max)

Write Concern controls durability guarantees:

// Write acknowledged by majority of voting members
db.orders.insertOne(
  { order_number: "ORD-2026-78432", ... },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)

// w: "majority"  → Wait for ack from ⌊n/2⌋+1 voting members
// j: true        → Wait for journal commit (survives crash)
// wtimeout: 5000 → Fail after 5 seconds if majority hasn't acked
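The majority count is worth pinning down, since it decides both write acknowledgment and election quorums:

```javascript
// Majority of n voting members: floor(n/2) + 1.
// A 3-member set needs 2 acks; a 5-member set needs 3.
const majority = (votingMembers) => Math.floor(votingMembers / 2) + 1;

[3, 5, 7].map(majority);  // [2, 3, 4]
```

This is why 3-member sets are the sweet spot: you can lose one member and still reach a majority of 2.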

Read Preference controls where reads go:

| Mode | Reads From | Use Case |
|---|---|---|
| primary | Primary only | Default. Strong consistency. |
| primaryPreferred | Primary; secondary if unavailable | HA with consistency preference |
| secondary | Secondaries only | Offload reads; accept stale data |
| secondaryPreferred | Secondaries; primary if none available | Scale reads with fallback |
| nearest | Lowest-latency member | Geo-distributed deployments |

Sharding

When a single replica set can't handle the data volume or throughput, MongoDB distributes data across shards — each shard is itself a replica set. Three components make up a sharded cluster:

mongos (Query Router)

Stateless router process. Applications connect to mongos instead of individual shards. Routes queries to the correct shard(s) based on the shard key. You run multiple mongos instances behind a load balancer for HA.

Config Servers

A 3-member replica set storing cluster metadata: shard key ranges (chunks), which shard owns which chunk, and balancer state. Every mongos caches this metadata.

MongoDB Sharding Flow

[Interactive diagram: tracing a query through a sharded cluster]

Shard Key Selection

The shard key determines how documents are distributed across shards. It's the most critical decision in a sharded deployment because it's immutable after creation (MongoDB 5.0 added reshardCollection, but it's expensive).

| Shard Key Strategy | Example | Pros | Cons |
|---|---|---|---|
| Hashed | { _id: "hashed" } | Even distribution, no hotspots | Range queries scatter to all shards |
| Range | { created_at: 1 } | Efficient range scans | Monotonic keys → all writes to one shard (hotspot) |
| Compound | { tenant_id: 1, _id: 1 } | Targeted queries + good distribution | Requires knowing query patterns upfront |
| Zone | Assign ranges to geo zones | Data locality (GDPR compliance) | Operational complexity |

// Shard the orders collection by hashed customer_id
sh.shardCollection("ecommerce.orders", { "customer.id": "hashed" })

// Better: compound key for tenant isolation + good distribution
sh.shardCollection("saas.events", { "tenant_id": 1, "timestamp": 1 })

// Check chunk distribution
db.orders.getShardDistribution()
// Output:
// Shard shard-0: 33.2% data, 28.1 GB, 12.4M documents
// Shard shard-1: 33.5% data, 28.4 GB, 12.5M documents
// Shard shard-2: 33.3% data, 28.2 GB, 12.4M documents
Anti-pattern: Monotonic shard key. Using { _id: 1 } (range-based on ObjectId) creates a massive hotspot because ObjectId is roughly time-ordered — all new inserts go to the shard owning the "max" chunk. Use { _id: "hashed" } instead, or a compound key with a high-cardinality prefix.

Targeted vs. Scatter-Gather Queries

// TARGETED: Query includes the shard key → mongos routes to 1 shard
db.orders.find({ "customer.id": 42, status: "shipped" })
// mongos → hash(42) → Shard-1 → return results. Fast.

// SCATTER-GATHER: Query does NOT include shard key → hits ALL shards
db.orders.find({ status: "shipped", "items.sku": "KB-MK850" })
// mongos → Shard-0 + Shard-1 + Shard-2 → merge results. Slow.
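A toy router makes the difference concrete. The hash below is illustrative only (MongoDB's hashed index uses a different function); what matters is that a query containing the shard key resolves to exactly one shard, and anything else must be broadcast:

```javascript
// Returns the list of shard indexes a query must visit.
function targetShards(query, shardKey, numShards) {
  if (!(shardKey in query)) {
    return [...Array(numShards).keys()];   // scatter-gather: all shards
  }
  const str = JSON.stringify(query[shardKey]);
  const h = [...str].reduce((a, c) => (a * 31 + c.charCodeAt(0)) >>> 0, 0);
  return [h % numShards];                  // targeted: exactly one shard
}

targetShards({ "customer.id": 42 }, "customer.id", 3);  // one shard
targetShards({ status: "shipped" }, "customer.id", 3);  // [0, 1, 2]
```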

WiredTiger Storage Engine

Since MongoDB 3.2, WiredTiger is the default storage engine. It replaced MMAPv1 and brought massive improvements:

| Feature | WiredTiger | Old MMAPv1 |
|---|---|---|
| Locking granularity | Document-level | Collection-level |
| Compression | Snappy (default), zlib, zstd | None |
| Concurrency control | MVCC (Multi-Version) | Reader-writer locks |
| Checkpoint interval | 60 seconds (configurable) | 60 seconds |
| Journaling | Write-ahead log (WAL) | Group commit journal |
| Cache | 50% of (RAM − 1 GB) by default | All available memory |
| Data structures | B-tree (default) + LSM-tree (optional) | B-tree only |

How a write flows through WiredTiger:

  1. Application sends an insertOne() to the mongod process.
  2. WiredTiger writes the operation to the journal (WAL) — this ensures durability even if the process crashes before the next checkpoint.
  3. The document is written to the in-memory B-tree (WiredTiger cache).
  4. Indexes are updated in their respective in-memory B-trees.
  5. Every 60 seconds, a checkpoint flushes dirty pages from memory to the data files on disk.
  6. Old journal entries (before the last checkpoint) are purged.
Compression savings in practice: With zstd compression (MongoDB 4.2+), a 100 GB uncompressed dataset typically compresses to 30–40 GB on disk. JSON-heavy documents compress particularly well because of repeated field names across documents.

Querying & Aggregation

MongoDB's query language is expressive enough for most OLTP workloads and many analytics tasks. Let's walk through the full spectrum from basic CRUD to complex aggregations.

CRUD Operations

// CREATE
db.orders.insertOne({ order_number: "ORD-2026-78433", customer: { id: 43 }, ... })
db.orders.insertMany([{ ... }, { ... }, { ... }])  // Bulk insert

// READ — rich query operators; projection is the second argument to find()
db.orders.find(
  {
    "customer.id": 42,                           // Dot notation for nested fields
    status: { $in: ["shipped", "delivered"] },   // $in operator
    "payment.total_cents": { $gte: 5000 },       // Range query
    tags: { $all: ["repeat-customer"] }          // Array contains all
  },
  { items: 1, status: 1, _id: 0 }                // Return only these fields
).sort({ created_at: -1 })
 .limit(20)

// UPDATE — atomic field-level updates
db.orders.updateOne(
  { _id: ObjectId("665a1b2c3d4e5f6a7b8c9d0e") },
  {
    $set: { status: "delivered", "shipping.delivered_at": new Date() },
    $push: { tags: "completed" },
    $inc: { "customer.order_count": 1 },
    $currentDate: { updated_at: true }
  }
)

// UPDATE — array operations
db.orders.updateOne(
  { _id: ObjectId("..."), "items.sku": "KB-MK850" },
  { $set: { "items.$.quantity": 3 } }          // $ positional operator
)

// DELETE
db.orders.deleteMany({ status: "cancelled", created_at: { $lt: ISODate("2025-01-01") } })

Aggregation Pipeline

The aggregation pipeline is MongoDB's answer to SQL's GROUP BY, HAVING, window functions, and subqueries — but expressed as a sequence of stages that documents flow through, each transforming the result set.

// Revenue by product category for Q1 2026, with average order value
db.orders.aggregate([
  // Stage 1: Filter to Q1 2026 shipped/delivered orders
  { $match: {
      created_at: {
        $gte: ISODate("2026-01-01"),
        $lt: ISODate("2026-04-01")
      },
      status: { $in: ["shipped", "delivered"] }
  }},

  // Stage 2: Unwind the items array (1 doc per item)
  { $unwind: "$items" },

  // Stage 3: Group by category
  { $group: {
      _id: "$items.category",
      total_revenue: { $sum: { $multiply: ["$items.quantity", "$items.unit_price_cents"] } },
      total_units: { $sum: "$items.quantity" },
      order_count: { $addToSet: "$_id" },  // Unique orders
      avg_item_price: { $avg: "$items.unit_price_cents" }
  }},

  // Stage 4: Add computed fields
  { $addFields: {
      unique_orders: { $size: "$order_count" },
      total_revenue_dollars: { $divide: ["$total_revenue", 100] }
  }},

  // Stage 5: Remove the large order_count array
  { $project: { order_count: 0 } },

  // Stage 6: Sort by revenue descending
  { $sort: { total_revenue: -1 } },

  // Stage 7: Limit to top 10
  { $limit: 10 }
])

// Result:
// { _id: "electronics",  total_revenue: 4589200, total_units: 312, ... }
// { _id: "accessories",  total_revenue: 892100,  total_units: 1205, ... }
// { _id: "clothing",     total_revenue: 445600,  total_units: 892, ... }

Key Aggregation Stages

| Stage | SQL Equivalent | What It Does |
|---|---|---|
| $match | WHERE / HAVING | Filters documents. Place early to reduce pipeline volume. |
| $group | GROUP BY | Groups docs by key and applies accumulators ($sum, $avg, $max, etc.) |
| $project | SELECT | Include, exclude, or reshape fields. |
| $unwind | LATERAL FLATTEN | Deconstructs an array field into one doc per element. |
| $lookup | LEFT OUTER JOIN | Joins another collection. Performs a correlated subquery. |
| $sort | ORDER BY | Sorts results. Uses indexes if placed at the start. |
| $facet | Multiple GROUP BYs | Run multiple pipelines in parallel on the same input. |
| $bucket | CASE WHEN grouping | Groups into custom ranges (price bands, age ranges). |
| $graphLookup | Recursive CTE | Recursive self-lookup (org charts, category trees). |
| $merge | INSERT INTO ... SELECT | Write pipeline results to another collection. |

Performance tip: Always place $match and $sort stages as early as possible. If $match is the first stage and uses an indexed field, MongoDB uses the index to avoid a full collection scan. A pipeline that starts with $unwind on a million-document collection is a guaranteed performance disaster.

$lookup — Server-Side Joins

// Enrich orders with full product details from a separate "products" collection
db.orders.aggregate([
  { $match: { "customer.id": 42 } },
  { $unwind: "$items" },
  { $lookup: {
      from: "products",
      localField: "items.sku",
      foreignField: "sku",
      as: "product_details"
  }},
  { $unwind: "$product_details" },
  { $project: {
      order_number: 1,
      "items.sku": 1,
      "items.quantity": 1,
      "product_details.name": 1,
      "product_details.price_cents": 1,
      "product_details.specs": 1
  }}
])

Change Streams

Change streams let you subscribe to real-time data changes on a collection, database, or entire deployment. They're built on top of the oplog and provide a resumable, ordered stream of change events.

// Watch for new orders and status changes
const pipeline = [
  { $match: {
      $or: [
        { operationType: "insert" },
        { operationType: "update", "updateDescription.updatedFields.status": { $exists: true } }
      ]
  }}
];

const changeStream = db.orders.watch(pipeline, {
  fullDocument: "updateLookup"  // Include the full document on updates
});

changeStream.on("change", (event) => {
  console.log(`${event.operationType}: Order ${event.fullDocument.order_number}`);
  console.log(`Status: ${event.fullDocument.status}`);

  // Trigger downstream actions
  if (event.fullDocument.status === "shipped") {
    sendShippingNotification(event.fullDocument);
  }
});

// Resume after disconnect using a resume token
const resumeToken = changeStream.resumeToken;
// Later:
const resumed = db.orders.watch(pipeline, { resumeAfter: resumeToken });
Use cases for change streams: Real-time dashboards, event-driven microservices (order → payment → shipping), cache invalidation, audit logging, synchronizing data to Elasticsearch or a data warehouse, and CDC (Change Data Capture) pipelines.

Indexes

Without indexes, every query performs a collection scan (COLLSCAN) — examining every document. With millions of documents, that's unacceptable. MongoDB supports a rich set of index types:

| Index Type | Syntax | Use Case |
|---|---|---|
| Single field | { email: 1 } | Equality and range queries on one field |
| Compound | { status: 1, created_at: -1 } | Multi-field queries. Order matters (ESR rule). |
| Multikey | { tags: 1 } | Indexes each element in an array field |
| Text | { name: "text", description: "text" } | Full-text search with stemming and scoring |
| Geospatial (2dsphere) | { location: "2dsphere" } | $near, $geoWithin queries on GeoJSON |
| Hashed | { user_id: "hashed" } | Equality only. Used as shard key for even distribution. |
| TTL | { expires_at: 1 }, { expireAfterSeconds: 0 } | Auto-delete expired documents (sessions, temp data) |
| Wildcard | { "specs.$**": 1 } | Index all sub-fields of a polymorphic field |
| Partial | { status: 1 }, { partialFilterExpression: ... } | Index only a subset of documents (smaller index) |

The ESR Rule (Equality → Sort → Range)

For compound indexes, order fields by: Equality fields first, then Sort fields, then Range fields. This gives the query planner the best chance to satisfy the query with an index scan and an in-order traversal.

// Query pattern: find active orders for a customer, sorted by date
db.orders.find({
  "customer.id": 42,           // Equality
  status: "active",            // Equality
  created_at: { $gte: ISODate("2026-01-01") }  // Range
}).sort({ created_at: -1 })    // Sort

// Optimal compound index (ESR):
db.orders.createIndex({
  "customer.id": 1,   // E — equality
  status: 1,          // E — equality
  created_at: -1      // S+R — sort direction matches, also serves range
})

// Verify with explain():
db.orders.find({ ... }).explain("executionStats")
// Look for:
//   "stage": "IXSCAN"          ← Good (not COLLSCAN)
//   "nReturned": 47            ← Documents returned
//   "totalKeysExamined": 47    ← Keys examined (ideal: same as nReturned)
//   "totalDocsExamined": 47    ← Docs examined (covered query: 0)

Text Indexes

// Create a text index for product search
db.products.createIndex(
  { name: "text", description: "text", tags: "text" },
  { weights: { name: 10, tags: 5, description: 1 },  // Name matches score 10x
    default_language: "english" }
)

// Search
db.products.find(
  { $text: { $search: "mechanical keyboard wireless" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } }).limit(20)

Geospatial Indexes

// Store locations as GeoJSON
db.restaurants.insertOne({
  name: "MongoDB Café",
  location: {
    type: "Point",
    coordinates: [-122.4194, 37.7749]  // [longitude, latitude]
  },
  cuisine: "american"
})

// Create 2dsphere index
db.restaurants.createIndex({ location: "2dsphere" })

// Find restaurants within 2km of a point
db.restaurants.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
      $maxDistance: 2000  // meters
    }
  }
})

// Find restaurants inside a polygon (delivery zone)
db.restaurants.find({
  location: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [[
          [-122.43, 37.78], [-122.41, 37.78],
          [-122.41, 37.77], [-122.43, 37.77],
          [-122.43, 37.78]
        ]]
      }
    }
  }
})

Index Sizing

Rule of thumb: Your working set of indexes should fit in RAM. If indexes exceed available memory, every query that needs a non-cached index page causes a disk I/O, destroying performance. Check with db.orders.stats().totalIndexSize — if this exceeds your WiredTiger cache allocation, you need bigger instances or fewer/smarter indexes.
// Check index sizes for a collection
db.orders.stats().indexSizes
// {
//   "_id_":                    2.1 GB,
//   "customer.id_1_status_1":  1.8 GB,
//   "created_at_-1":           0.9 GB,
//   "items.sku_1":             3.2 GB   ← Multikey index on array = large
// }
// Total: 8.0 GB → Need at least 8 GB of WiredTiger cache for index-only

// List indexes and their usage stats
db.orders.aggregate([{ $indexStats: {} }])
// Shows ops count per index — drop indexes with 0 ops since last restart

Transactions in MongoDB 4.0+

Before MongoDB 4.0, atomicity was limited to single-document operations — which is often sufficient since documents can embed related data. But some workflows genuinely require multi-document, multi-collection ACID transactions. MongoDB 4.0 added support for multi-document transactions on replica sets, and 4.2 extended them to sharded clusters.

// Transfer credits between two user accounts (multi-document transaction)
const session = db.getMongo().startSession();
session.startTransaction({
  readConcern: { level: "snapshot" },
  writeConcern: { w: "majority" },
  readPreference: "primary"
});

try {
  const accounts = session.getDatabase("bank").accounts;

  // Debit sender
  const sender = accounts.findOneAndUpdate(
    { _id: "alice", balance: { $gte: 5000 } },
    { $inc: { balance: -5000 }, $push: { transactions: {
        type: "debit", amount: 5000, to: "bob", ts: new Date()
    }}},
    { session, returnDocument: "after" }
  );

  if (!sender) throw new Error("Insufficient funds");

  // Credit receiver
  accounts.updateOne(
    { _id: "bob" },
    { $inc: { balance: 5000 }, $push: { transactions: {
        type: "credit", amount: 5000, from: "alice", ts: new Date()
    }}},
    { session }
  );

  // Insert audit log (different collection)
  session.getDatabase("bank").audit_log.insertOne({
    action: "transfer",
    from: "alice", to: "bob", amount: 5000,
    timestamp: new Date()
  }, { session });

  session.commitTransaction();
  print("Transfer committed successfully");

} catch (error) {
  session.abortTransaction();
  print("Transfer aborted: " + error.message);

} finally {
  session.endSession();
}
Transaction performance caveats:
  • 60-second lifetime limit (default transactionLifetimeLimitSeconds). Long-running transactions are killed.
  • 1,000 documents modified per transaction is a practical limit before performance degrades.
  • Write conflicts: If two transactions modify the same document, one gets a WriteConflict error and must retry.
  • Oplog entries: A transaction commits as a single oplog entry (since 4.2, large transactions are split across multiple entries) — big transactions still add replication load.
  • Not a replacement for good schema design. If you're using transactions for every operation, you've likely modeled your data wrong. Embedding related data in a single document eliminates most transaction needs.

Retryable Writes & Reads

// MongoDB drivers retry transient errors automatically (since 4.2+)
const client = new MongoClient(uri, {
  retryWrites: true,   // Auto-retry on network errors, elections
  retryReads: true,     // Auto-retry reads
  w: "majority",
  readPreference: "primaryPreferred"
});

// For transactions, implement your own retry loop:
async function runTransactionWithRetry(session, txnFunc) {
  while (true) {
    try {
      await txnFunc(session);
      break;
    } catch (error) {
      if (error.hasErrorLabel("TransientTransactionError")) {
        console.log("Transient error, retrying...");
        continue;
      }
      throw error;
    }
  }
}

Real-World Sizing

Let's size a MongoDB deployment for a mid-scale e-commerce platform: 10 million orders, 100,000 products, 2 million users, with 5,000 writes/sec peak and 20,000 reads/sec peak.

Document Size Estimation

| Collection | Avg Doc Size | Doc Count | Raw Data | With zstd (~3×) | Indexes |
|---|---|---|---|---|---|
| orders | 2.5 KB | 10M | 25 GB | ~8.3 GB | ~6 GB (4 indexes) |
| products | 4 KB | 100K | 0.4 GB | ~0.13 GB | ~0.2 GB |
| users | 1.5 KB | 2M | 3 GB | ~1 GB | ~1.2 GB |
| events | 0.5 KB | 500M | 250 GB | ~83 GB | ~40 GB (TTL + compound) |
| Total | | | 278.4 GB | ~92.4 GB | ~47.4 GB |
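The raw-data column is straightforward arithmetic. A sketch reproducing it, with the ~3× zstd ratio taken as the assumption from the table:

```javascript
// avg doc size (KB) * doc count -> GB, summed across collections.
const collections = [
  { name: "orders",   docKB: 2.5, count: 10_000_000 },
  { name: "products", docKB: 4,   count: 100_000 },
  { name: "users",    docKB: 1.5, count: 2_000_000 },
  { name: "events",   docKB: 0.5, count: 500_000_000 },
];

const rawGB = collections.reduce((sum, c) => sum + (c.docKB * c.count) / 1e6, 0);
console.log(rawGB.toFixed(1), (rawGB / 3).toFixed(1));
// 278.4 GB raw, ~92.8 GB assuming a uniform 3x ratio
```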

Deployment Recommendation

- 3-shard cluster (each shard is a 3-member replica set)
- 64 GB RAM per shard (32 GB WiredTiger cache)
- 500 GB NVMe SSD per shard
- 3× mongos routers behind an ALB for HA

// Working set must fit in WiredTiger cache:
// Hot data: recent 30 days of orders (~2.5M docs × 2.5 KB = 6.25 GB)
//         + all products (0.4 GB)
//         + active users (0.5 GB)
//         + hot indexes (~15 GB)
// Total working set ≈ 22 GB → Fits in 32 GB cache ✓

// Throughput: 5,000 writes/sec ÷ 3 shards = ~1,667 writes/shard
// Single replica set can handle ~10,000-20,000 writes/sec on NVMe
// Headroom: 83-92% → Healthy ✓
Connection pooling: Each mongos maintains a connection pool to every shard. With 3 app servers × 100 connections/server × 3 mongos instances, you get 900 connections. Each shard handles 300. MongoDB default max is 65,536 connections per mongod, but practical limits are lower (each connection ≈ 1 MB memory). Use connection pooling libraries with maxPoolSize: 100.

When to Use Document Stores

✅ Great Fit

  • Product catalogs — varying attributes per category
  • Content management — articles with embedded media, tags, comments
  • User profiles — flexible preferences, nested settings
  • Event sourcing — append-heavy, polymorphic events
  • Real-time analytics — aggregation pipeline + change streams
  • Mobile/IoT backends — MongoDB Realm sync, flexible schemas for evolving apps
  • Rapid prototyping — no schema migrations, fast iteration
  • Multi-tenant SaaS — shard by tenant_id, tenant-specific schemas

❌ Poor Fit

  • Financial ledgers — need strict ACID across many entities
  • Heavy relational joins — 5+ table JOINs are SQL territory
  • Highly normalized data — if you need 3NF, use a relational DB
  • Complex graph traversals — use Neo4j or Neptune
  • Write-heavy time-series — Cassandra or TimescaleDB may be better
  • Sub-millisecond caching — Redis is purpose-built
  • Full OLAP workloads — use ClickHouse, BigQuery, or Snowflake
  • Strong schema enforcement — SQL's DDL is still more rigorous

MongoDB vs. Other Document Stores

| Database | Strengths | Weaknesses | Best For |
|---|---|---|---|
| MongoDB | Rich queries, aggregation, transactions, ecosystem | Memory-hungry, SSPL license | General-purpose document workloads |
| CouchDB | Master-master replication, HTTP API, offline-first | Slower queries, limited indexing | Offline-capable mobile/edge apps |
| Firestore | Serverless, real-time sync, Firebase integration | Limited queries, vendor lock-in | Mobile apps, small-medium scale |
| Amazon DocumentDB | MongoDB-compatible, managed, integrated with AWS | Not 100% compatible, slower feature adoption | AWS-native MongoDB workloads |
| CosmosDB | Multi-model, global distribution, SLA-backed | Expensive at scale, RU pricing complexity | Azure-native, multi-region apps |

Final decision framework: Choose MongoDB when (1) your data is naturally hierarchical or polymorphic, (2) you need horizontal scaling with rich queries, (3) your access patterns are document-centric (fetch/update one entity at a time), and (4) you can tolerate some denormalization. For everything else, pick the more specialized tool — SQL for complex joins, Redis for caching, Cassandra for massive write throughput, or a graph DB for relationship-heavy data.