Introduction to System Design
System design interviews are the single biggest differentiator between junior and senior engineering candidates. While coding interviews test your ability to write correct algorithms, system design interviews test something far broader: can you take a vague, open-ended problem and architect a production-grade solution that serves millions of users?
This post is the opening chapter of a 70-part High Level Design series. We'll start from absolute fundamentals—what system design even is, the framework interviewers expect, and the math behind estimation—and build up to designing systems like YouTube, Uber, and Twitter from scratch. Whether you're preparing for FAANG interviews or simply want to become a better engineer, this series will give you a complete, structured foundation.
What is System Design?
System design is the process of defining the architecture, components, modules, interfaces, and data flows of a system to satisfy a set of specified requirements. In a professional context, it means answering: "Given these business needs, how do we build a system that works at scale, stays reliable, and can evolve over time?"
Why Tech Companies Ask System Design Questions
Coding interviews test a narrow skill: can you translate a well-defined problem into correct code? System design interviews test the skills you actually use every day as a senior engineer:
- Ambiguity resolution — Real products start with vague requirements. Can you ask the right clarifying questions?
- Technical breadth — Do you know when to use SQL vs NoSQL, a message queue vs a direct API call, a CDN vs an origin server?
- Trade-off analysis — Every design decision has a cost. Can you articulate why you chose consistency over availability, or horizontal scaling over vertical?
- Communication — Can you explain complex architectures clearly, draw diagrams on a whiteboard, and respond to feedback?
- Scale intuition — Do you understand the difference between serving 100 users and 100 million? Can you estimate whether a single machine suffices or you need a distributed cluster?
Junior vs Senior Expectations
| Aspect | Junior (0–3 yrs) | Senior (5+ yrs) |
|---|---|---|
| Scope | Focused on one service or component | End-to-end system with multiple services |
| Requirements | May need heavy guidance from interviewer | Independently gathers and prioritizes requirements |
| Estimation | Rough order-of-magnitude estimates | Detailed QPS, storage, bandwidth calculations |
| Trade-offs | Identifies one or two trade-offs | Discusses multiple alternatives with clear reasoning |
| Deep Dive | Basic knowledge of one area | Can deep-dive into any component (DB schema, API design, caching strategy) |
| Non-functional | Mentions availability and scalability | Addresses consistency models, fault tolerance, monitoring, security |
System Design vs Coding Interviews
These are fundamentally different skill sets:
| Dimension | Coding Interview | System Design Interview |
|---|---|---|
| Input | Well-defined problem statement | Vague, open-ended prompt |
| Output | Working code that passes test cases | Architecture diagram + verbal explanation |
| Skills | Algorithms, data structures, coding speed | Architecture, trade-offs, breadth of knowledge |
| Evaluation | Correctness + time/space complexity | Completeness, scalability, communication, trade-off reasoning |
| Preparation | LeetCode, HackerRank, competitive programming | Reading engineering blogs, designing on whiteboards, understanding infra |
HLD vs LLD
System design interviews come in two flavors: High Level Design (HLD) and Low Level Design (LLD). Understanding the distinction is critical because they require fundamentally different preparation strategies.
| Dimension | High Level Design (HLD) | Low Level Design (LLD) |
|---|---|---|
| Abstraction | 30,000-foot view: services, data stores, APIs | Code-level: classes, interfaces, methods, relationships |
| Output | Architecture diagrams, data flow diagrams | UML class diagrams, code structure |
| Components | Load balancers, databases, caches, queues, CDNs | Design patterns, SOLID principles, inheritance hierarchies |
| Example Question | "Design Twitter" / "Design a URL shortener" | "Design a parking lot system" / "Design a chess game" |
| Scale | Millions/billions of users, distributed systems | Single-process or single-service complexity |
| Key Skills | Distributed systems, databases, networking, estimation | OOP, design patterns, clean code, SOLID |
| Interview Level | Senior / Staff / Principal | Junior / Mid / Senior |
| Time | 35–60 minutes | 30–45 minutes |
What HLD Covers
In a high-level design round, you're expected to discuss:
- Services — Which microservices or monolith modules do you need? How do they communicate (REST, gRPC, async messaging)?
- Databases — SQL vs NoSQL? What schema? How do you partition data? Read replicas?
- APIs — What endpoints does your system expose? What does the request/response look like?
- Infrastructure — Load balancers, CDNs, caches (Redis/Memcached), message queues (Kafka/RabbitMQ), object storage (S3).
- Data Flow — How does a request travel from the user's browser through your system and back?
What LLD Covers
In a low-level design round, you're expected to discuss:
- Classes & Interfaces — What objects exist in the system? What are their responsibilities?
- Design Patterns — Strategy, Observer, Factory, Singleton, Command, etc.
- SOLID Principles — Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion.
- Relationships — Inheritance, composition, aggregation. When to use each.
- Extensibility — How easy is it to add new features without modifying existing code?
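To make the LLD side concrete, here is a minimal sketch of the Strategy pattern applied to the parking-lot example above. The names (`PricingStrategy`, `ParkingTicket`) are illustrative, not part of any canonical interview answer; the point is that pricing rules become swappable classes:

```python
from abc import ABC, abstractmethod


class PricingStrategy(ABC):
    """Strategy interface: each pricing rule lives in its own class."""

    @abstractmethod
    def fee(self, hours: int) -> float: ...


class HourlyPricing(PricingStrategy):
    def fee(self, hours: int) -> float:
        return 10.0 * hours


class FlatDailyPricing(PricingStrategy):
    def fee(self, hours: int) -> float:
        return 50.0  # flat fee regardless of hours parked


class ParkingTicket:
    """Context: delegates fee calculation to whichever strategy it holds."""

    def __init__(self, strategy: PricingStrategy):
        self.strategy = strategy

    def total(self, hours: int) -> float:
        return self.strategy.fee(hours)


# Adding a new pricing rule means adding a class, not editing
# ParkingTicket — the Open/Closed Principle in action.
print(ParkingTicket(HourlyPricing()).total(3))     # 30.0
print(ParkingTicket(FlatDailyPricing()).total(3))  # 50.0
```

Note how the design question is about responsibilities and relationships, not about scale: the same exercise in an HLD round would instead ask where the parking data lives and how many lots the system serves.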
The 4-Step Framework
Every system design interview should follow a structured approach. Interviewers expect this structure, and deviating from it is one of the most common reasons candidates fail. The framework below works for virtually every HLD question, from "Design a URL Shortener" to "Design YouTube."
Step 1: Understand Requirements & Scope (5–8 minutes)
Before drawing a single box, you must clarify what you're building. This is not optional—it's the most important step. Ask questions to nail down:
- Functional requirements — What features does the system support? (e.g., "Users can create short URLs and be redirected to the original URL")
- Non-functional requirements — What are the quality attributes? (e.g., "99.99% availability, <100ms redirect latency, 100M URLs created per day")
- Scope boundaries — What is explicitly out of scope? (e.g., "We won't handle analytics for now")
- User base — Who are the users? How many? What's the read-to-write ratio?
Write the requirements down visibly. This anchors your design and gives the interviewer confidence you're methodical.
Step 2: Back-of-Envelope Estimation (3–5 minutes)
Use rough math to quantify the scale of your system. This step informs critical design decisions:
- Traffic estimation — Queries per second (QPS) for reads and writes
- Storage estimation — How much data over 5 years?
- Bandwidth estimation — Incoming and outgoing data per second
- Memory estimation — Can we cache the hot data? How much RAM?
Don't aim for precision—aim for the right order of magnitude. If you estimate 10K QPS and the real number is 15K, it doesn't matter. If you estimate 10K and it's actually 10M, your architecture will be fundamentally different.
Step 3: High-Level Design (10–15 minutes)
Now draw the architecture. Start with the core flow and add components as needed:
- API design — Define the key endpoints (REST or gRPC)
- Service architecture — Draw boxes for each service/component
- Data model — Define the database schema and choose storage type
- Data flow — Trace a request from client to server and back with arrows
Start simple. A client, a web server, and a database is a perfectly valid first draft. Then ask yourself: "What breaks when this gets 1000x traffic?" and add load balancers, caches, queues, etc.
Step 4: Deep Dive & Trade-offs (10–15 minutes)
The interviewer will pick 1–2 areas to probe. Be ready to go deep on any component:
- Bottleneck identification — Where does the system fail under load?
- Scaling strategies — Horizontal vs vertical scaling, sharding, replication
- Caching — What to cache, eviction policies, cache invalidation
- Consistency vs availability — CAP theorem trade-offs
- Failure handling — What happens when a service goes down? Retry logic, circuit breakers, graceful degradation
- Monitoring & alerting — How do you know when something's wrong?
Always frame your answers as trade-offs: "We could use X, which gives us A but costs us B. Alternatively, Y gives us C but costs us D. Given our requirements, I'd choose X because..."
Functional vs Non-Functional Requirements
Every system has two kinds of requirements. Confusing them or forgetting one category is a classic interview mistake.
Functional Requirements (FRs)
Functional requirements describe what the system does—the features and behaviors visible to users.
- User can create an account and log in
- User can upload a photo
- User can search for other users
- System generates a short URL from a long URL
- System redirects short URLs to the original long URL
Non-Functional Requirements (NFRs)
Non-functional requirements describe how well the system performs—quality attributes that aren't features themselves but constrain the design.
| NFR | Definition | Typical Target | Design Implication |
|---|---|---|---|
| Scalability | System handles increasing load | 10x traffic in 2 years | Horizontal scaling, sharding, load balancing |
| Availability | System is operational when needed | 99.99% uptime (52 min downtime/year) | Redundancy, failover, multi-region deployment |
| Consistency | All users see the same data at the same time | Strong or eventual (depends on use case) | Consensus protocols, replication strategy |
| Latency | Time to process a single request | p99 < 200ms | Caching, CDN, efficient queries, proximity |
| Durability | Data is not lost once written | 99.999999999% (11 nines) | Replication, backups, write-ahead logs |
| Throughput | Requests processed per unit time | 50K QPS | Async processing, partitioning, batching |
| Security | Protection against unauthorized access | Varies by domain | Encryption, auth, rate limiting, input validation |
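The availability targets in the table are easy to sanity-check yourself. A quick sketch (the helper name is mine) that converts an availability percentage into allowed downtime per year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(f"{label}: ~{downtime_minutes_per_year(target):.1f} min/year")
```

This is where the "52 min downtime/year" figure for 99.99% comes from: 0.0001 × 525,600 ≈ 52.6 minutes. Each additional nine cuts the budget by 10x, which is why five nines is so expensive to achieve.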
Example: URL Shortener Requirements
Let's walk through a concrete example. Suppose the interviewer says: "Design a URL shortening service like bit.ly."
Functional requirements:

- Given a long URL, generate a unique short URL
- When a user visits the short URL, redirect to the original long URL
- Short URLs should expire after a configurable TTL (optional)
- Users can choose a custom alias (optional)

Non-functional requirements:

- Availability: 99.99% — Users should always be able to resolve short URLs
- Latency: Redirect in <50ms (read-heavy, latency-critical)
- Scalability: 100M new URLs/day, 10:1 read-to-write ratio = 1B redirects/day
- Durability: Once created, a URL mapping must never be lost
- Consistency: Eventual consistency is acceptable (a few seconds delay after creation is fine)
Notice how these requirements immediately drive design decisions. High availability + low latency + read-heavy workload = caching is essential. Eventual consistency = we can use async replication. 100M writes/day ≈ 1,160 QPS writes—a single beefy database can handle this, but we'll want replication for availability.
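One common approach to the "generate a unique short URL" requirement (not the only option, and not mandated by anything above) is to base62-encode a unique integer ID. A sketch, with names of my choosing:

```python
import string

# 62 URL-safe characters: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))


# 62^7 ≈ 3.5 trillion distinct codes, so 7-character short URLs
# comfortably cover the 182.5B URLs we estimate over 5 years.
print(encode_base62(123456789))
```

How to generate the unique integer IDs at scale (a single auto-increment counter vs distributed ID generation) is itself a design discussion, which later posts in this series will cover.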
Back-of-Envelope Estimation
Back-of-envelope estimation is a skill that separates strong candidates from weak ones. You don't need exact numbers—you need the ability to quickly compute whether your system needs 1 server or 1,000, 10 GB or 10 TB. The key tools are powers of 2, standard latency numbers, and a systematic calculation approach.
Powers of 2: Data Units
| Power | Exact Value | Approx. | Name | Short |
|---|---|---|---|---|
| 2^10 | 1,024 | ~1 Thousand | Kilobyte | 1 KB |
| 2^20 | 1,048,576 | ~1 Million | Megabyte | 1 MB |
| 2^30 | 1,073,741,824 | ~1 Billion | Gigabyte | 1 GB |
| 2^40 | 1,099,511,627,776 | ~1 Trillion | Terabyte | 1 TB |
| 2^50 | 1,125,899,906,842,624 | ~1 Quadrillion | Petabyte | 1 PB |
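The reason these approximations work: each factor of 2^10 is within about 13% of the matching power of 10, so treating 1 KB as "a thousand bytes" is fine for estimation. A quick check (the `units` mapping is just for this illustration):

```python
# Compare each power of 2 against the nearest power of 10.
units = {"KB": 10, "MB": 20, "GB": 30, "TB": 40, "PB": 50}

for name, power in units.items():
    exact = 2 ** power
    decimal_power = 3 * (power // 10)        # 10^3, 10^6, ...
    ratio = exact / 10 ** decimal_power
    print(f"1 {name} = {exact:,} bytes (~{ratio:.3f}x 10^{decimal_power})")
```

The drift grows slowly: 1.024x for KB, about 1.126x for PB. For back-of-envelope work, ignore it entirely.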
Latency Numbers Every Programmer Should Know
These are approximate numbers that help you reason about where time is spent in your system. They're based on Jeff Dean's famous list, updated for modern hardware (circa 2024):
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 0.5 ns | Fastest memory access possible |
| Branch mispredict | 5 ns | CPU pipeline flush penalty |
| L2 cache reference | 7 ns | ~14x L1 cache |
| Mutex lock/unlock | 25 ns | Contention can increase this dramatically |
| Main memory (RAM) reference | 100 ns | ~200x L1 cache |
| Compress 1 KB with Zippy | 3,000 ns (3 µs) | Snappy/LZ4 compression |
| Send 1 KB over 1 Gbps network | 10,000 ns (10 µs) | Within data center |
| Read 4 KB randomly from SSD | 150,000 ns (150 µs) | ~1,000x memory access |
| Read 1 MB sequentially from memory | 250,000 ns (250 µs) | Fast sequential access |
| Round trip within same datacenter | 500,000 ns (0.5 ms) | Network hop + processing |
| Read 1 MB sequentially from SSD | 1,000,000 ns (1 ms) | ~4x sequential memory read |
| HDD seek | 10,000,000 ns (10 ms) | Moving disk head is slow |
| Read 1 MB sequentially from HDD | 20,000,000 ns (20 ms) | ~20x SSD sequential |
| Send packet CA → NL → CA | 150,000,000 ns (150 ms) | Speed of light is the bottleneck |
The key takeaways from these numbers:
- Memory is ~1000x faster than SSD. This is why caching matters so much.
- SSD is ~100x faster than HDD. Use SSDs for databases whenever possible.
- Network within a datacenter (0.5ms) is ~300x faster than cross-continent (150ms). This is why CDNs and multi-region deployments matter.
- Compression is cheap (3µs for 1KB). Almost always compress data before sending it over the network.
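You can derive these takeaways directly from the table. A small sketch (the dictionary keys are my own shorthand; exact ratios vary with hardware, so treat them as orders of magnitude):

```python
NS = {  # approximate latencies from the table above, in nanoseconds
    "ram_ref": 100,
    "ssd_random_4k": 150_000,
    "ssd_seq_1mb": 1_000_000,
    "hdd_seq_1mb": 20_000_000,
    "dc_round_trip": 500_000,
    "cross_continent": 150_000_000,
}

print(f"SSD random read vs RAM:   {NS['ssd_random_4k'] / NS['ram_ref']:,.0f}x")
print(f"HDD vs SSD (1 MB seq):    {NS['hdd_seq_1mb'] / NS['ssd_seq_1mb']:.0f}x")
print(f"Cross-continent vs in-DC: {NS['cross_continent'] / NS['dc_round_trip']:.0f}x")
```

These ratios are exactly why caching layers, SSD-backed databases, and region-local serving appear in almost every design in this series.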
Traffic Estimation Walkthrough
Let's estimate the traffic for our URL shortener:
```
New URLs per day = 100M
Read:Write ratio = 10:1

// Write QPS (Queries Per Second)
Write QPS = 100M / (24 × 3600)
          = 100,000,000 / 86,400
          ≈ 1,160 writes/sec

// Read QPS
Read QPS = 1,160 × 10
         ≈ 11,600 reads/sec

// Peak QPS (assume 2x average)
Peak Write QPS ≈ 2,300 writes/sec
Peak Read QPS  ≈ 23,200 reads/sec
```
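The same arithmetic in Python, if you want to sanity-check the figures (variable names are mine; the walkthrough rounds 1,157 up to 1,160):

```python
SECONDS_PER_DAY = 24 * 3600  # 86,400

new_urls_per_day = 100_000_000
read_write_ratio = 10
peak_multiplier = 2  # assume peak traffic is 2x the average

write_qps = new_urls_per_day / SECONDS_PER_DAY
read_qps = write_qps * read_write_ratio

print(f"Write QPS:      ~{write_qps:,.0f}")   # ~1,157
print(f"Read QPS:       ~{read_qps:,.0f}")    # ~11,574
print(f"Peak write QPS: ~{write_qps * peak_multiplier:,.0f}")
print(f"Peak read QPS:  ~{read_qps * peak_multiplier:,.0f}")
```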
Storage Estimation Walkthrough
```
Short URL (7 chars) = 7 bytes
Long URL (avg)      = 200 bytes
Created timestamp   = 8 bytes
Expiry timestamp    = 8 bytes
User ID             = 8 bytes
Total per record    ≈ 250 bytes

// Storage for 5 years
URLs per year   = 100M × 365 = 36.5B
URLs in 5 years = 36.5B × 5 = 182.5B
Storage         = 182.5B × 250 bytes
                = 45.6 × 10^12 bytes
                ≈ 45.6 TB
```
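And the storage math in Python (variable names are mine; note the raw field sizes sum to 231 bytes, rounded up to 250 for overhead):

```python
raw_record_bytes = 7 + 200 + 8 + 8 + 8  # short URL + long URL + 2 timestamps + user ID
bytes_per_record = 250                  # round up for per-record overhead

urls_per_year = 100_000_000 * 365       # 36.5B
urls_5_years = urls_per_year * 5        # 182.5B
storage_bytes = urls_5_years * bytes_per_record

print(f"Raw record size:    {raw_record_bytes} bytes")
print(f"Records in 5 years: {urls_5_years / 1e9:.1f}B")
print(f"Storage:            ~{storage_bytes / 1e12:.1f} TB")  # ~45.6 TB
```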
Bandwidth Estimation Walkthrough
```
// Incoming (write) bandwidth
Write bandwidth = 1,160 writes/sec × 250 bytes
                = 290,000 bytes/sec
                ≈ 290 KB/s (trivial)

// Outgoing (read) bandwidth
Read bandwidth = 11,600 reads/sec × 250 bytes
               = 2,900,000 bytes/sec
               ≈ 2.9 MB/s (still modest)
```
QPS Calculation: A General Formula
```
QPS = (Daily Active Users × Avg. Requests per User) / 86,400

Peak QPS ≈ 2 × QPS     (for most applications)
Peak QPS ≈ 3–5 × QPS   (for bursty workloads like flash sales)
```
The number 86,400 (seconds in a day) is one you should memorize. A useful shortcut: 100K daily ≈ 1 QPS. So 100M daily events = ~1,000 QPS. This gives you a quick sanity check for any estimation.
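The formula and the shortcut fit in a tiny helper you can reuse for any estimation (the function name and signature are my own):

```python
SECONDS_PER_DAY = 86_400


def qps(daily_events: int, peak_multiplier: float = 2.0) -> tuple[float, float]:
    """Return (average QPS, peak QPS) for a given number of daily events."""
    avg = daily_events / SECONDS_PER_DAY
    return avg, avg * peak_multiplier


# Sanity check the shortcut: 100K daily events ≈ 1 QPS average.
avg, peak = qps(100_000)
print(f"100K/day -> avg ~{avg:.2f} QPS, peak ~{peak:.2f} QPS")

avg, peak = qps(100_000_000)
print(f"100M/day -> avg ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS")
```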
Key Concepts Preview
This series will cover dozens of topics in depth: load balancing, caching, replication, sharding, consistency models, and more. Each core concept gets its own dedicated post (or multiple posts). By the end, you'll understand not just what each concept is, but when and why to use it in a design.
Common Mistakes in System Design Interviews
Having conducted and observed hundreds of system design interviews, here are the patterns that consistently trip candidates up—and how to avoid each one.
❌ Mistake 1: Jumping Straight Into Design
The candidate immediately starts drawing boxes and arrows without understanding what they're building. They design a chat system when the interviewer wanted a notification system.
✔ Fix: Spend the first 5–8 minutes only asking clarifying questions and writing down requirements. Confirm with the interviewer: "So we're building X with these features, at this scale. Is that right?"
❌ Mistake 2: Skipping Estimation
The candidate says "we'll have a lot of users" without quantifying what "a lot" means. This leads to over- or under-engineering. You can't decide whether to use a single database or a sharded cluster without knowing the QPS.
✔ Fix: Always do quick back-of-envelope math. Even saying "100M users, 10 requests/day each = ~12K QPS, so we'll need multiple servers behind a load balancer" shows strong signal.
❌ Mistake 3: Over-Engineering from the Start
The candidate adds Kafka, Redis, Elasticsearch, a service mesh, Kubernetes, and a ML pipeline before they've even drawn the basic happy path. This signals that they're pattern-matching from memorized designs rather than thinking from first principles.
✔ Fix: Start with the simplest architecture that works: client → server → database. Then identify bottlenecks and add components with clear justification: "Our read QPS is 50K, so we need a cache. I'd use Redis because..."
❌ Mistake 4: Ignoring Trade-offs
The candidate presents their design as the "right answer" without acknowledging alternatives. System design is fundamentally about trade-offs—there is no single correct answer.
✔ Fix: For every major decision, briefly state the alternative and why you chose differently: "We could use a NoSQL database for flexibility, but I'm choosing SQL here because we need ACID transactions for payment processing."
❌ Mistake 5: Not Communicating Your Thought Process
The candidate draws silently for 10 minutes, then presents a completed diagram. The interviewer can't evaluate how you think, only the end result—and they're judging the process more than the output.
✔ Fix: Think out loud continuously. Narrate your reasoning: "I'm thinking about the data model now. We need fast lookups by short URL, which suggests a hash-based key. Let me consider SQL vs NoSQL..."
❌ Mistake 6: Going Too Deep Too Early
The candidate spends 20 minutes on database schema design and never gets to the high-level architecture. The interviewer wanted to see the big picture first.
✔ Fix: Follow the framework. Broad strokes first (Step 3), then offer deep dives: "I've outlined the high-level architecture. Would you like me to deep dive into the database design, caching strategy, or API design?"
What’s Next
Now that you understand the fundamentals—the interview framework, the types of requirements, and how to estimate scale—you're ready to dive into the building blocks. In the next post, we'll explore Scaling: vertical vs horizontal scaling, stateless vs stateful architectures, and how to reason about when you need to scale. Until then, a few ways to practice:
- Practice the 4-step framework on 2–3 design problems this week
- Memorize the latency numbers table (or at least the order of magnitude)
- Do a back-of-envelope estimation for a system you use daily (Instagram, Gmail, Spotify)
- Practice explaining your designs out loud—even to yourself in a mirror