Introduction to System Design
System design interviews are the single biggest differentiator between junior and senior engineering candidates. While coding interviews test your ability to write correct algorithms, system design interviews test something far broader: can you take a vague, open-ended problem and architect a production-grade solution that serves millions of users?
This post is the opening chapter of a 70-part High Level Design series. We'll start from absolute fundamentals—what system design even is, the framework interviewers expect, and the math behind estimation—and build up to designing systems like YouTube, Uber, and Twitter from scratch. Whether you're preparing for FAANG interviews or simply want to become a better engineer, this series will give you a complete, structured foundation.
What is System Design?
System design is the process of defining the architecture, components, modules, interfaces, and data flows of a system to satisfy a set of specified requirements. In a professional context, it means answering: "Given these business needs, how do we build a system that works at scale, stays reliable, and can evolve over time?"
Why Tech Companies Ask System Design Questions
Coding interviews test a narrow skill: can you translate a well-defined problem into correct code? System design interviews test the skills you actually use every day as a senior engineer:
- Ambiguity resolution — Real products start with vague requirements. Can you ask the right clarifying questions?
- Technical breadth — Do you know when to use SQL vs NoSQL, a message queue vs a direct API call, a CDN vs an origin server?
- Trade-off analysis — Every design decision has a cost. Can you articulate why you chose consistency over availability, or horizontal scaling over vertical?
- Communication — Can you explain complex architectures clearly, draw diagrams on a whiteboard, and respond to feedback?
- Scale intuition — Do you understand the difference between serving 100 users and 100 million? Can you estimate whether a single machine suffices or you need a distributed cluster?
Junior vs Senior Expectations
| Aspect | Junior (0–3 yrs) | Senior (5+ yrs) |
|---|---|---|
| Scope | Focused on one service or component | End-to-end system with multiple services |
| Requirements | May need heavy guidance from interviewer | Independently gathers and prioritizes requirements |
| Estimation | Rough order-of-magnitude estimates | Detailed QPS, storage, bandwidth calculations |
| Trade-offs | Identifies one or two trade-offs | Discusses multiple alternatives with clear reasoning |
| Deep Dive | Basic knowledge of one area | Can deep-dive into any component (DB schema, API design, caching strategy) |
| Non-functional | Mentions availability and scalability | Addresses consistency models, fault tolerance, monitoring, security |
System Design vs Coding Interviews
These are fundamentally different skill sets:
| Dimension | Coding Interview | System Design Interview |
|---|---|---|
| Input | Well-defined problem statement | Vague, open-ended prompt |
| Output | Working code that passes test cases | Architecture diagram + verbal explanation |
| Skills | Algorithms, data structures, coding speed | Architecture, trade-offs, breadth of knowledge |
| Evaluation | Correctness + time/space complexity | Completeness, scalability, communication, trade-off reasoning |
| Preparation | LeetCode, HackerRank, competitive programming | Reading engineering blogs, designing on whiteboards, understanding infra |
HLD vs LLD
System design interviews come in two flavors: High Level Design (HLD) and Low Level Design (LLD). Understanding the distinction is critical because they require fundamentally different preparation strategies.
| Dimension | High Level Design (HLD) | Low Level Design (LLD) |
|---|---|---|
| Abstraction | 30,000-foot view: services, data stores, APIs | Code-level: classes, interfaces, methods, relationships |
| Output | Architecture diagrams, data flow diagrams | UML class diagrams, code structure |
| Components | Load balancers, databases, caches, queues, CDNs | Design patterns, SOLID principles, inheritance hierarchies |
| Example Question | "Design Twitter" / "Design a URL shortener" | "Design a parking lot system" / "Design a chess game" |
| Scale | Millions/billions of users, distributed systems | Single-process or single-service complexity |
| Key Skills | Distributed systems, databases, networking, estimation | OOP, design patterns, clean code, SOLID |
| Interview Level | Senior / Staff / Principal | Junior / Mid / Senior |
| Time | 35–60 minutes | 30–45 minutes |
What HLD Covers
In a high-level design round, you're expected to discuss:
- Services — Which microservices or monolith modules do you need? How do they communicate (REST, gRPC, async messaging)?
- Databases — SQL vs NoSQL? What schema? How do you partition data? Read replicas?
- APIs — What endpoints does your system expose? What does the request/response look like?
- Infrastructure — Load balancers, CDNs, caches (Redis/Memcached), message queues (Kafka/RabbitMQ), object storage (S3).
- Data Flow — How does a request travel from the user's browser through your system and back?
What LLD Covers
In a low-level design round, you're expected to discuss:
- Classes & Interfaces — What objects exist in the system? What are their responsibilities?
- Design Patterns — Strategy, Observer, Factory, Singleton, Command, etc.
- SOLID Principles — Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion.
- Relationships — Inheritance, composition, aggregation. When to use each.
- Extensibility — How easy is it to add new features without modifying existing code?
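To make the LLD side concrete, here is a minimal sketch of the Strategy pattern applied to the parking-lot example above. The names (`PricingStrategy`, `ParkingTicket`) are illustrative, not part of any canonical interview answer; the point is that pricing rules become swappable classes:

```python
from abc import ABC, abstractmethod


class PricingStrategy(ABC):
    """Strategy interface: each pricing rule lives in its own class."""

    @abstractmethod
    def fee(self, hours: int) -> float: ...


class HourlyPricing(PricingStrategy):
    def fee(self, hours: int) -> float:
        return 10.0 * hours


class FlatDailyPricing(PricingStrategy):
    def fee(self, hours: int) -> float:
        return 50.0  # flat fee regardless of hours parked


class ParkingTicket:
    """Context: delegates fee calculation to whichever strategy it holds."""

    def __init__(self, strategy: PricingStrategy):
        self.strategy = strategy

    def total(self, hours: int) -> float:
        return self.strategy.fee(hours)


# Adding a new pricing rule means adding a class, not editing
# ParkingTicket — the Open/Closed Principle in action.
print(ParkingTicket(HourlyPricing()).total(3))     # 30.0
print(ParkingTicket(FlatDailyPricing()).total(3))  # 50.0
```

Note how the design question is about responsibilities and relationships, not about scale: the same exercise in an HLD round would instead ask where the parking data lives and how many lots the system serves.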
The 4-Step Framework
Every system design interview should follow a structured approach. Interviewers expect this structure, and deviating from it is one of the most common reasons candidates fail. The framework below works for virtually every HLD question, from "Design a URL Shortener" to "Design YouTube."
Step 1: Understand Requirements & Scope (5–8 minutes)
Before drawing a single box, you must clarify what you're building. This is not optional—it's the most important step. Ask questions to nail down:
- Functional requirements — What features does the system support? (e.g., "Users can create short URLs and be redirected to the original URL")
- Non-functional requirements — What are the quality attributes? (e.g., "99.99% availability, <100ms redirect latency, 100M URLs created per day")
- Scope boundaries — What is explicitly out of scope? (e.g., "We won't handle analytics for now")
- User base — Who are the users? How many? What's the read-to-write ratio?
Write the requirements down visibly. This anchors your design and gives the interviewer confidence you're methodical.
Step 2: Back-of-Envelope Estimation (3–5 minutes)
Use rough math to quantify the scale of your system. This step informs critical design decisions:
- Traffic estimation — Queries per second (QPS) for reads and writes
- Storage estimation — How much data over 5 years?
- Bandwidth estimation — Incoming and outgoing data per second
- Memory estimation — Can we cache the hot data? How much RAM?
Don't aim for precision—aim for the right order of magnitude. If you estimate 10K QPS and the real number is 15K, it doesn't matter. If you estimate 10K and it's actually 10M, your architecture will be fundamentally different.
Step 3: High-Level Design (10–15 minutes)
Now draw the architecture. Start with the core flow and add components as needed:
- API design — Define the key endpoints (REST or gRPC)
- Service architecture — Draw boxes for each service/component
- Data model — Define the database schema and choose storage type
- Data flow — Trace a request from client to server and back with arrows
Start simple. A client, a web server, and a database is a perfectly valid first draft. Then ask yourself: "What breaks when this gets 1000x traffic?" and add load balancers, caches, queues, etc.
Step 4: Deep Dive & Trade-offs (10–15 minutes)
The interviewer will pick 1–2 areas to probe. Be ready to go deep on any component:
- Bottleneck identification — Where does the system fail under load?
- Scaling strategies — Horizontal vs vertical scaling, sharding, replication
- Caching — What to cache, eviction policies, cache invalidation
- Consistency vs availability — CAP theorem trade-offs
- Failure handling — What happens when a service goes down? Retry logic, circuit breakers, graceful degradation
- Monitoring & alerting — How do you know when something's wrong?
Always frame your answers as trade-offs: "We could use X, which gives us A but costs us B. Alternatively, Y gives us C but costs us D. Given our requirements, I'd choose X because..."
Functional vs Non-Functional Requirements
Every system has two kinds of requirements. Confusing them or forgetting one category is a classic interview mistake.
Functional Requirements (FRs)
Functional requirements describe what the system does—the features and behaviors visible to users.
- User can create an account and log in
- User can upload a photo
- User can search for other users
- System generates a short URL from a long URL
- System redirects short URLs to the original long URL
Non-Functional Requirements (NFRs)
Non-functional requirements describe how well the system performs—quality attributes that aren't features themselves but constrain the design.
| NFR | Definition | Typical Target | Design Implication |
|---|---|---|---|
| Scalability | System handles increasing load | 10x traffic in 2 years | Horizontal scaling, sharding, load balancing |
| Availability | System is operational when needed | 99.99% uptime (52 min downtime/year) | Redundancy, failover, multi-region deployment |
| Consistency | All users see the same data at the same time | Strong or eventual (depends on use case) | Consensus protocols, replication strategy |
| Latency | Time to process a single request | p99 < 200ms | Caching, CDN, efficient queries, proximity |
| Durability | Data is not lost once written | 99.999999999% (11 nines) | Replication, backups, write-ahead logs |
| Throughput | Requests processed per unit time | 50K QPS | Async processing, partitioning, batching |
| Security | Protection against unauthorized access | Varies by domain | Encryption, auth, rate limiting, input validation |
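The availability targets in the table are easy to sanity-check yourself. A quick sketch (the helper name is mine) that converts an availability percentage into allowed downtime per year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(f"{label}: ~{downtime_minutes_per_year(target):.1f} min/year")
```

This is where the "52 min downtime/year" figure for 99.99% comes from: 0.0001 × 525,600 ≈ 52.6 minutes. Each additional nine cuts the budget by 10x, which is why five nines is so expensive to achieve.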
Example: URL Shortener Requirements
Let's walk through a concrete example. Suppose the interviewer says: "Design a URL shortening service like bit.ly."
Functional requirements:

- Given a long URL, generate a unique short URL
- When a user visits the short URL, redirect to the original long URL
- Short URLs should expire after a configurable TTL (optional)
- Users can choose a custom alias (optional)

Non-functional requirements:

- Availability: 99.99% — Users should always be able to resolve short URLs
- Latency: Redirect in <50ms (read-heavy, latency-critical)
- Scalability: 100M new URLs/day, 10:1 read-to-write ratio = 1B redirects/day
- Durability: Once created, a URL mapping must never be lost
- Consistency: Eventual consistency is acceptable (a few seconds delay after creation is fine)
Notice how these requirements immediately drive design decisions. High availability + low latency + read-heavy workload = caching is essential. Eventual consistency = we can use async replication. 100M writes/day ≈ 1,160 QPS writes—a single beefy database can handle this, but we'll want replication for availability.
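One common approach to the "generate a unique short URL" requirement (not the only option, and not mandated by anything above) is to base62-encode a unique integer ID. A sketch, with names of my choosing:

```python
import string

# 62 URL-safe characters: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))


# 62^7 ≈ 3.5 trillion distinct codes, so 7-character short URLs
# comfortably cover the 182.5B URLs we estimate over 5 years.
print(encode_base62(123456789))
```

How to generate the unique integer IDs at scale (a single auto-increment counter vs distributed ID generation) is itself a design discussion, which later posts in this series will cover.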
Back-of-Envelope Estimation
Back-of-envelope estimation is a skill that separates strong candidates from weak ones. You don't need exact numbers—you need the ability to quickly compute whether your system needs 1 server or 1,000, 10 GB or 10 TB. The key tools are powers of 2, standard latency numbers, and a systematic calculation approach.
Powers of 2: Data Units
| Power | Exact Value | Approx. | Name | Short |
|---|---|---|---|---|
| 2^10 | 1,024 | ~1 Thousand | Kilobyte | 1 KB |
| 2^20 | 1,048,576 | ~1 Million | Megabyte | 1 MB |
| 2^30 | 1,073,741,824 | ~1 Billion | Gigabyte | 1 GB |
| 2^40 | 1,099,511,627,776 | ~1 Trillion | Terabyte | 1 TB |
| 2^50 | 1,125,899,906,842,624 | ~1 Quadrillion | Petabyte | 1 PB |
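The reason these approximations work: each factor of 2^10 is within about 13% of the matching power of 10, so treating 1 KB as "a thousand bytes" is fine for estimation. A quick check (the `units` mapping is just for this illustration):

```python
# Compare each power of 2 against the nearest power of 10.
units = {"KB": 10, "MB": 20, "GB": 30, "TB": 40, "PB": 50}

for name, power in units.items():
    exact = 2 ** power
    decimal_power = 3 * (power // 10)        # 10^3, 10^6, ...
    ratio = exact / 10 ** decimal_power
    print(f"1 {name} = {exact:,} bytes (~{ratio:.3f}x 10^{decimal_power})")
```

The drift grows slowly: 1.024x for KB, about 1.126x for PB. For back-of-envelope work, ignore it entirely.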
Latency Numbers Every Programmer Should Know
These are approximate numbers that help you reason about where time is spent in your system. They're based on Jeff Dean's famous list, updated for modern hardware (circa 2024):
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 0.5 ns | Fastest memory access possible |
| Branch mispredict | 5 ns | CPU pipeline flush penalty |
| L2 cache reference | 7 ns | ~14x L1 cache |
| Mutex lock/unlock | 25 ns | Contention can increase this dramatically |
| Main memory (RAM) reference | 100 ns | ~200x L1 cache |
| Compress 1 KB with Zippy | 3,000 ns (3 µs) | Snappy/LZ4 compression |
| Send 1 KB over 1 Gbps network | 10,000 ns (10 µs) | Within data center |
| Read 4 KB randomly from SSD | 150,000 ns (150 µs) | ~1,000x memory access |
| Read 1 MB sequentially from memory | 250,000 ns (250 µs) | Fast sequential access |
| Round trip within same datacenter | 500,000 ns (0.5 ms) | Network hop + processing |
| Read 1 MB sequentially from SSD | 1,000,000 ns (1 ms) | ~4x sequential memory read |
| HDD seek | 10,000,000 ns (10 ms) | Moving disk head is slow |
| Read 1 MB sequentially from HDD | 20,000,000 ns (20 ms) | ~20x SSD sequential |
| Send packet CA → NL → CA | 150,000,000 ns (150 ms) | Speed of light is the bottleneck |
The key takeaways from these numbers:
- Memory is ~1000x faster than SSD. This is why caching matters so much.
- SSD is ~100x faster than HDD. Use SSDs for databases whenever possible.
- Network within a datacenter (0.5ms) is ~300x faster than cross-continent (150ms). This is why CDNs and multi-region deployments matter.
- Compression is cheap (3µs for 1KB). Almost always compress data before sending it over the network.
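You can derive these takeaways directly from the table. A small sketch (the dictionary keys are my own shorthand; exact ratios vary with hardware, so treat them as orders of magnitude):

```python
NS = {  # approximate latencies from the table above, in nanoseconds
    "ram_ref": 100,
    "ssd_random_4k": 150_000,
    "ssd_seq_1mb": 1_000_000,
    "hdd_seq_1mb": 20_000_000,
    "dc_round_trip": 500_000,
    "cross_continent": 150_000_000,
}

print(f"SSD random read vs RAM:   {NS['ssd_random_4k'] / NS['ram_ref']:,.0f}x")
print(f"HDD vs SSD (1 MB seq):    {NS['hdd_seq_1mb'] / NS['ssd_seq_1mb']:.0f}x")
print(f"Cross-continent vs in-DC: {NS['cross_continent'] / NS['dc_round_trip']:.0f}x")
```

These ratios are exactly why caching layers, SSD-backed databases, and region-local serving appear in almost every design in this series.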
Traffic Estimation Walkthrough
Let's estimate the traffic for our URL shortener:
```
New URLs per day = 100M
Read:Write ratio = 10:1

// Write QPS (Queries Per Second)
Write QPS = 100M / (24 × 3600)
          = 100,000,000 / 86,400
          ≈ 1,160 writes/sec

// Read QPS
Read QPS = 1,160 × 10
         ≈ 11,600 reads/sec

// Peak QPS (assume 2x average)
Peak Write QPS ≈ 2,300 writes/sec
Peak Read QPS  ≈ 23,200 reads/sec
```
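The same arithmetic in Python, if you want to sanity-check the figures (variable names are mine; the walkthrough rounds 1,157 up to 1,160):

```python
SECONDS_PER_DAY = 24 * 3600  # 86,400

new_urls_per_day = 100_000_000
read_write_ratio = 10
peak_multiplier = 2  # assume peak traffic is 2x the average

write_qps = new_urls_per_day / SECONDS_PER_DAY
read_qps = write_qps * read_write_ratio

print(f"Write QPS:      ~{write_qps:,.0f}")   # ~1,157
print(f"Read QPS:       ~{read_qps:,.0f}")    # ~11,574
print(f"Peak write QPS: ~{write_qps * peak_multiplier:,.0f}")
print(f"Peak read QPS:  ~{read_qps * peak_multiplier:,.0f}")
```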
Storage Estimation Walkthrough
```
Short URL (7 chars) = 7 bytes
Long URL (avg)      = 200 bytes
Created timestamp   = 8 bytes
Expiry timestamp    = 8 bytes
User ID             = 8 bytes
Total per record    ≈ 250 bytes

// Storage for 5 years
URLs per year   = 100M × 365 = 36.5B
URLs in 5 years = 36.5B × 5 = 182.5B
Storage         = 182.5B × 250 bytes
                = 45.6 × 10^12 bytes
                ≈ 45.6 TB
```
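And the storage math in Python (variable names are mine; note the raw field sizes sum to 231 bytes, rounded up to 250 for overhead):

```python
raw_record_bytes = 7 + 200 + 8 + 8 + 8  # short URL + long URL + 2 timestamps + user ID
bytes_per_record = 250                  # round up for per-record overhead

urls_per_year = 100_000_000 * 365       # 36.5B
urls_5_years = urls_per_year * 5        # 182.5B
storage_bytes = urls_5_years * bytes_per_record

print(f"Raw record size:    {raw_record_bytes} bytes")
print(f"Records in 5 years: {urls_5_years / 1e9:.1f}B")
print(f"Storage:            ~{storage_bytes / 1e12:.1f} TB")  # ~45.6 TB
```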
Bandwidth Estimation Walkthrough
```
// Incoming (write) bandwidth
Write bandwidth = 1,160 writes/sec × 250 bytes
                = 290,000 bytes/sec
                ≈ 290 KB/s (trivial)

// Outgoing (read) bandwidth
Read bandwidth = 11,600 reads/sec × 250 bytes
               = 2,900,000 bytes/sec
               ≈ 2.9 MB/s (still modest)
```
QPS Calculation: A General Formula
```
QPS = (Daily Active Users × Avg. Requests per User) / 86,400

Peak QPS ≈ 2 × QPS     (for most applications)
Peak QPS ≈ 3–5 × QPS   (for bursty workloads like flash sales)
```
The number 86,400 (seconds in a day) is one you should memorize. A useful shortcut: 100K daily ≈ 1 QPS. So 100M daily events = ~1,000 QPS. This gives you a quick sanity check for any estimation.
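The formula and the shortcut fit in a tiny helper you can reuse for any estimation (the function name and signature are my own):

```python
SECONDS_PER_DAY = 86_400


def qps(daily_events: int, peak_multiplier: float = 2.0) -> tuple[float, float]:
    """Return (average QPS, peak QPS) for a given number of daily events."""
    avg = daily_events / SECONDS_PER_DAY
    return avg, avg * peak_multiplier


# Sanity check the shortcut: 100K daily events ≈ 1 QPS average.
avg, peak = qps(100_000)
print(f"100K/day -> avg ~{avg:.2f} QPS, peak ~{peak:.2f} QPS")

avg, peak = qps(100_000_000)
print(f"100M/day -> avg ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS")
```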
Key Concepts Preview
This series will cover dozens of topics in depth: load balancing, caching, replication, sharding, consistency models, and more. Each core concept gets its own dedicated post (or multiple posts). By the end, you'll understand not just what each concept is, but when and why to use it in a design.
Common Mistakes in System Design Interviews
Having conducted and observed hundreds of system design interviews, here are the patterns that consistently trip candidates up—and how to avoid each one.
❌ Mistake 1: Jumping Straight Into Design
The candidate immediately starts drawing boxes and arrows without understanding what they're building. They design a chat system when the interviewer wanted a notification system.
✔ Fix: Spend the first 5–8 minutes only asking clarifying questions and writing down requirements. Confirm with the interviewer: "So we're building X with these features, at this scale. Is that right?"
❌ Mistake 2: Skipping Estimation
The candidate says "we'll have a lot of users" without quantifying what "a lot" means. This leads to over- or under-engineering. You can't decide whether to use a single database or a sharded cluster without knowing the QPS.
✔ Fix: Always do quick back-of-envelope math. Even saying "100M users, 10 requests/day each = ~12K QPS, so we'll need multiple servers behind a load balancer" shows strong signal.
❌ Mistake 3: Over-Engineering from the Start
The candidate adds Kafka, Redis, Elasticsearch, a service mesh, Kubernetes, and a ML pipeline before they've even drawn the basic happy path. This signals that they're pattern-matching from memorized designs rather than thinking from first principles.
✔ Fix: Start with the simplest architecture that works: client → server → database. Then identify bottlenecks and add components with clear justification: "Our read QPS is 50K, so we need a cache. I'd use Redis because..."
❌ Mistake 4: Ignoring Trade-offs
The candidate presents their design as the "right answer" without acknowledging alternatives. System design is fundamentally about trade-offs—there is no single correct answer.
✔ Fix: For every major decision, briefly state the alternative and why you chose differently: "We could use a NoSQL database for flexibility, but I'm choosing SQL here because we need ACID transactions for payment processing."
❌ Mistake 5: Not Communicating Your Thought Process
The candidate draws silently for 10 minutes, then presents a completed diagram. The interviewer can't evaluate how you think, only the end result—and they're judging the process more than the output.
✔ Fix: Think out loud continuously. Narrate your reasoning: "I'm thinking about the data model now. We need fast lookups by short URL, which suggests a hash-based key. Let me consider SQL vs NoSQL..."
❌ Mistake 6: Going Too Deep Too Early
The candidate spends 20 minutes on database schema design and never gets to the high-level architecture. The interviewer wanted to see the big picture first.
✔ Fix: Follow the framework. Broad strokes first (Step 3), then offer deep dives: "I've outlined the high-level architecture. Would you like me to deep dive into the database design, caching strategy, or API design?"
What’s Next
Now that you understand the fundamentals—the interview framework, the types of requirements, and how to estimate scale—you're ready to dive into the building blocks. In the next post, we'll explore Scaling: vertical vs horizontal scaling, stateless vs stateful architectures, and how to reason about when you need to scale. Until then, a few ways to practice:
- Practice the 4-step framework on 2–3 design problems this week
- Memorize the latency numbers table (or at least the order of magnitude)
- Do a back-of-envelope estimation for a system you use daily (Instagram, Gmail, Spotify)
- Practice explaining your designs out loud—even to yourself in a mirror