High Level Design Series · Part 1

Introduction to System Design

System design interviews are the single biggest differentiator between junior and senior engineering candidates. While coding interviews test your ability to write correct algorithms, system design interviews test something far broader: can you take a vague, open-ended problem and architect a production-grade solution that serves millions of users?

This post is the opening chapter of a 70-part High Level Design series. We'll start from absolute fundamentals—what system design even is, the framework interviewers expect, and the math behind estimation—and build up to designing systems like YouTube, Uber, and Twitter from scratch. Whether you're preparing for FAANG interviews or simply want to become a better engineer, this series will give you a complete, structured foundation.

💡 Series Roadmap: This is Post 1 of 70. We'll cover foundations (scaling, databases, caching), building blocks (load balancers, message queues, CDNs), architecture patterns (microservices, event-driven), and finally real-world case studies (designing URL shorteners, chat systems, video streaming platforms, and more).

What is System Design?

System design is the process of defining the architecture, components, modules, interfaces, and data flows of a system to satisfy a set of specified requirements. In a professional context, it means answering: "Given these business needs, how do we build a system that works at scale, stays reliable, and can evolve over time?"

Why Tech Companies Ask System Design Questions

Coding interviews test a narrow skill: can you translate a well-defined problem into correct code? System design interviews test the skills you actually use every day as a senior engineer:

Junior vs Senior Expectations

| Aspect | Junior (0–3 yrs) | Senior (5+ yrs) |
| --- | --- | --- |
| Scope | Focused on one service or component | End-to-end system with multiple services |
| Requirements | May need heavy guidance from interviewer | Independently gathers and prioritizes requirements |
| Estimation | Rough order-of-magnitude estimates | Detailed QPS, storage, bandwidth calculations |
| Trade-offs | Identifies one or two trade-offs | Discusses multiple alternatives with clear reasoning |
| Deep Dive | Basic knowledge of one area | Can deep-dive into any component (DB schema, API design, caching strategy) |
| Non-functional | Mentions availability and scalability | Addresses consistency models, fault tolerance, monitoring, security |

System Design vs Coding Interviews

These are fundamentally different skill sets:

| | Coding Interview | System Design Interview |
| --- | --- | --- |
| Input | Well-defined problem statement | Vague, open-ended prompt |
| Output | Working code that passes test cases | Architecture diagram + verbal explanation |
| Skills | Algorithms, data structures, coding speed | Architecture, trade-offs, breadth of knowledge |
| Evaluation | Correctness + time/space complexity | Completeness, scalability, communication, trade-off reasoning |
| Preparation | LeetCode, HackerRank, competitive programming | Reading engineering blogs, designing on whiteboards, understanding infra |

HLD vs LLD

System design interviews come in two flavors: High Level Design (HLD) and Low Level Design (LLD). Understanding the distinction is critical because they require fundamentally different preparation strategies.

| Dimension | High Level Design (HLD) | Low Level Design (LLD) |
| --- | --- | --- |
| Abstraction | 30,000-foot view: services, data stores, APIs | Code-level: classes, interfaces, methods, relationships |
| Output | Architecture diagrams, data flow diagrams | UML class diagrams, code structure |
| Components | Load balancers, databases, caches, queues, CDNs | Design patterns, SOLID principles, inheritance hierarchies |
| Example Question | "Design Twitter" / "Design a URL shortener" | "Design a parking lot system" / "Design a chess game" |
| Scale | Millions/billions of users, distributed systems | Single-process or single-service complexity |
| Key Skills | Distributed systems, databases, networking, estimation | OOP, design patterns, clean code, SOLID |
| Interview Level | Senior / Staff / Principal | Junior / Mid / Senior |
| Time | 35–60 minutes | 30–45 minutes |

What HLD Covers

In a high-level design round, you're expected to discuss:

  • The overall service architecture: which components exist and how they talk to each other
  • Choice of data stores, caches, message queues, load balancers, and CDNs
  • API design, data models, and the end-to-end data flow
  • Scale: estimation, sharding, replication, and fault tolerance

What LLD Covers

In a low-level design round, you're expected to discuss:

  • Class and interface design, method signatures, and object relationships (often as UML)
  • Applying design patterns and SOLID principles
  • How objects interact within a single service or process
  • Clean, extensible, testable code structure

💡 Key Insight: This entire series focuses on HLD. If you're looking for LLD content (design patterns, OOP, SOLID), check out the separate Low Level Design series. Many companies ask both in the same interview loop, so you'll want to prepare for both.

The 4-Step Framework

Every system design interview should follow a structured approach. Interviewers expect this structure, and deviating from it is one of the most common reasons candidates fail. The framework below works for virtually every HLD question, from "Design a URL Shortener" to "Design YouTube."

Step 1: Understand Requirements & Scope (5–8 minutes)

Before drawing a single box, you must clarify what you're building. This is not optional—it's the most important step. Ask questions to nail down:

  • Functional requirements — What features does the system support? (e.g., "Users can create short URLs and be redirected to the original URL")
  • Non-functional requirements — What are the quality attributes? (e.g., "99.99% availability, <100ms redirect latency, 100M URLs created per day")
  • Scope boundaries — What is explicitly out of scope? (e.g., "We won't handle analytics for now")
  • User base — Who are the users? How many? What's the read-to-write ratio?

Write the requirements down visibly. This anchors your design and gives the interviewer confidence you're methodical.

Step 2: Back-of-Envelope Estimation (3–5 minutes)

Use rough math to quantify the scale of your system. This step informs critical design decisions:

  • Traffic estimation — Queries per second (QPS) for reads and writes
  • Storage estimation — How much data over 5 years?
  • Bandwidth estimation — Incoming and outgoing data per second
  • Memory estimation — Can we cache the hot data? How much RAM?

Don't aim for precision—aim for the right order of magnitude. If you estimate 10K QPS and the real number is 15K, it doesn't matter. If you estimate 10K and it's actually 10M, your architecture will be fundamentally different.

Step 3: High-Level Design (10–15 minutes)

Now draw the architecture. Start with the core flow and add components as needed:

  • API design — Define the key endpoints (REST or gRPC)
  • Service architecture — Draw boxes for each service/component
  • Data model — Define the database schema and choose storage type
  • Data flow — Trace a request from client to server and back with arrows

Start simple. A client, a web server, and a database make a perfectly valid first draft. Then ask yourself: "What breaks when this gets 1000x traffic?" and add load balancers, caches, queues, etc.
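To make Step 3 concrete, here's a minimal Python sketch of that first draft for a URL shortener: a hash-derived 7-character code and an in-memory dict standing in for the database. The function and table names are illustrative, and collision handling and persistence are deliberately omitted.

```python
import hashlib

# In-memory "database" standing in for a real data store (illustrative only).
url_table = {}

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shorten(long_url):
    """POST /urls -- derive a 7-char code from a hash of the URL and store it."""
    digest = int.from_bytes(hashlib.sha256(long_url.encode()).digest()[:8], "big")
    code = ""
    for _ in range(7):                 # 62**7 ~ 3.5 trillion possible codes
        digest, rem = divmod(digest, 62)
        code += BASE62[rem]
    url_table[code] = long_url
    return code

def resolve(code):
    """GET /{code} -- look up the original URL (a 301/302 redirect in practice)."""
    return url_table.get(code)
```

Once this draft is on the whiteboard, the 1000x question turns the dict into a real database, puts a cache in front of `resolve`, and moves `shorten` behind a load balancer.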

Step 4: Deep Dive & Trade-offs (10–15 minutes)

The interviewer will pick 1–2 areas to probe. Be ready to go deep on any component:

  • Bottleneck identification — Where does the system fail under load?
  • Scaling strategies — Horizontal vs vertical scaling, sharding, replication
  • Caching — What to cache, eviction policies, cache invalidation
  • Consistency vs availability — CAP theorem trade-offs
  • Failure handling — What happens when a service goes down? Retry logic, circuit breakers, graceful degradation
  • Monitoring & alerting — How do you know when something's wrong?

Always frame your answers as trade-offs: "We could use X, which gives us A but costs us B. Alternatively, Y gives us C but costs us D. Given our requirements, I'd choose X because..."

Functional vs Non-Functional Requirements

Every system has two kinds of requirements. Confusing them or forgetting one category is a classic interview mistake.

Functional Requirements (FRs)

Functional requirements describe what the system does—the features and behaviors visible to users.

Non-Functional Requirements (NFRs)

Non-functional requirements describe how well the system performs—quality attributes that aren't features themselves but constrain the design.

| NFR | Definition | Typical Target | Design Implication |
| --- | --- | --- | --- |
| Scalability | System handles increasing load | 10x traffic in 2 years | Horizontal scaling, sharding, load balancing |
| Availability | System is operational when needed | 99.99% uptime (52 min downtime/year) | Redundancy, failover, multi-region deployment |
| Consistency | All users see the same data at the same time | Strong or eventual (depends on use case) | Consensus protocols, replication strategy |
| Latency | Time to process a single request | p99 < 200ms | Caching, CDN, efficient queries, proximity |
| Durability | Data is not lost once written | 99.999999999% (11 nines) | Replication, backups, write-ahead logs |
| Throughput | Requests processed per unit time | 50K QPS | Async processing, partitioning, batching |
| Security | Protection against unauthorized access | Varies by domain | Encryption, auth, rate limiting, input validation |
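The availability targets in the table translate directly into downtime budgets. A quick sketch (assuming a 365-day year) shows why each extra "nine" is so demanding:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000

def downtime_per_year(availability):
    """Allowed downtime in minutes per year for a given availability target."""
    return SECONDS_PER_YEAR * (1 - availability) / 60

for target in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{target:.3%} availability -> {downtime_per_year(target):>8,.1f} min/year")
```

99.99% works out to roughly 52.6 minutes per year, matching the table; 99.999% leaves barely 5 minutes, which is why five-nines systems need automated failover rather than paged humans.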

Example: URL Shortener Requirements

Let's walk through a concrete example. Suppose the interviewer says: "Design a URL shortening service like bit.ly."

💡 Functional Requirements:
  • Given a long URL, generate a unique short URL
  • When a user visits the short URL, redirect to the original long URL
  • Short URLs should expire after a configurable TTL (optional)
  • Users can choose a custom alias (optional)
💡 Non-Functional Requirements:
  • Availability: 99.99% — Users should always be able to resolve short URLs
  • Latency: Redirect in <50ms (read-heavy, latency-critical)
  • Scalability: 100M new URLs/day, 10:1 read-to-write ratio = 1B redirects/day
  • Durability: Once created, a URL mapping must never be lost
  • Consistency: Eventual consistency is acceptable (a few seconds delay after creation is fine)

Notice how these requirements immediately drive design decisions. High availability + low latency + read-heavy workload = caching is essential. Eventual consistency = we can use async replication. 100M writes/day ≈ 1,160 QPS writes—a single beefy database can handle this, but we'll want replication for availability.

Back-of-Envelope Estimation

Back-of-envelope estimation is a skill that separates strong candidates from weak ones. You don't need exact numbers—you need the ability to quickly compute whether your system needs 1 server or 1,000, 10 GB or 10 TB. The key tools are powers of 2, standard latency numbers, and a systematic calculation approach.

Powers of 2: Data Units

| Power | Exact Value | Approx. | Name | Short |
| --- | --- | --- | --- | --- |
| 2^10 | 1,024 | ~1 Thousand | Kilobyte | 1 KB |
| 2^20 | 1,048,576 | ~1 Million | Megabyte | 1 MB |
| 2^30 | 1,073,741,824 | ~1 Billion | Gigabyte | 1 GB |
| 2^40 | 1,099,511,627,776 | ~1 Trillion | Terabyte | 1 TB |
| 2^50 | 1,125,899,906,842,624 | ~1 Quadrillion | Petabyte | 1 PB |

💡 Quick Conversions: 1 char = 1 byte (ASCII) or 2–4 bytes (UTF-8 extended). A typical URL ≈ 100 bytes. A typical JSON API response ≈ 1–10 KB. A photo ≈ 200 KB–5 MB. A 1-minute video ≈ 5–50 MB depending on resolution.
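The "Approx." column is the whole trick: every 2^10 step is close to a factor of 1,000, which lets you convert between powers of 2 and human-scale numbers in your head. A quick sketch of how far the approximation drifts:

```python
# Sanity-checking the "Approx." column: each 2**10 step is close to 10**3,
# and the gap widens slightly at each tier.
for power, short in [(10, "KB"), (20, "MB"), (30, "GB"), (40, "TB"), (50, "PB")]:
    exact = 2 ** power
    approx = 10 ** (3 * power // 10)
    drift = exact / approx - 1
    print(f"1 {short} = 2**{power} = {exact:>19,} (~{drift:.1%} above {approx:,})")
```

Even at petabyte scale the shortcut is only ~13% off, well within back-of-envelope tolerance.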

Latency Numbers Every Programmer Should Know

These are approximate numbers that help you reason about where time is spent in your system. They're based on Jeff Dean's famous list, updated for modern hardware (circa 2024):

| Operation | Latency | Notes |
| --- | --- | --- |
| L1 cache reference | 0.5 ns | Fastest memory access possible |
| Branch mispredict | 5 ns | CPU pipeline flush penalty |
| L2 cache reference | 7 ns | ~14x L1 cache |
| Mutex lock/unlock | 25 ns | Contention can increase this dramatically |
| Main memory (RAM) reference | 100 ns | ~200x L1 cache |
| Compress 1 KB with Zippy | 3,000 ns (3 µs) | Snappy/LZ4 compression |
| Send 1 KB over 1 Gbps network | 10,000 ns (10 µs) | Within data center |
| Read 4 KB randomly from SSD | 150,000 ns (150 µs) | ~1,500x main memory access |
| Read 1 MB sequentially from memory | 250,000 ns (250 µs) | Fast sequential access |
| Round trip within same datacenter | 500,000 ns (0.5 ms) | Network hop + processing |
| Read 1 MB sequentially from SSD | 1,000,000 ns (1 ms) | ~4x sequential memory read |
| HDD seek | 10,000,000 ns (10 ms) | Moving disk head is slow |
| Read 1 MB sequentially from HDD | 20,000,000 ns (20 ms) | ~20x SSD sequential |
| Send packet CA → NL → CA | 150,000,000 ns (150 ms) | Speed of light is the bottleneck |

The key takeaways from these numbers:

  • Memory is fast, disk is slow: a RAM reference (~100 ns) is roughly 1,500x faster than an SSD random read and 100,000x faster than an HDD seek
  • Sequential beats random: reading 1 MB sequentially from SSD (1 ms) costs about the same as just seven random 4 KB reads
  • The network inside a datacenter is cheap (0.5 ms round trip), but crossing continents (150 ms) dwarfs everything else, so keep data close to users
  • Serve latency-critical reads from RAM: caching exists precisely because of the gap between memory and disk
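One way to internalize these numbers is to add them up for competing designs. The toy model below (using the table's figures; the component choices are illustrative) compares a redirect served from an in-memory cache, from a database on SSD, and from a single region far from the user:

```python
# Latency numbers from the table above, in nanoseconds.
NS = {
    "ram_ref": 100,
    "ssd_random_read_4kb": 150_000,
    "dc_round_trip": 500_000,
    "hdd_seek": 10_000_000,
    "cross_continent_round_trip": 150_000_000,
}

def ms(ns):
    """Convert nanoseconds to milliseconds."""
    return ns / 1_000_000

# A toy model of where a redirect's time goes under three designs.
cache_hit = NS["dc_round_trip"] + NS["ram_ref"]               # LB -> cache in RAM
ssd_lookup = NS["dc_round_trip"] + NS["ssd_random_read_4kb"]  # LB -> DB on SSD
cross_region = NS["cross_continent_round_trip"]               # user far from the only region

print(f"cache hit:    {ms(cache_hit):.3f} ms")
print(f"SSD lookup:   {ms(ssd_lookup):.3f} ms")
print(f"cross-region: {ms(cross_region):.1f} ms")
```

The cache and the SSD lookup are both sub-millisecond; the cross-region hop alone costs 300x a full in-datacenter round trip, which is why CDNs and multi-region deployments exist.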

Traffic Estimation Walkthrough

Let's estimate the traffic for our URL shortener:

// Given
New URLs per day   = 100M
Read:Write ratio   = 10:1

// Write QPS (Queries Per Second)
Write QPS = 100M / (24 × 3600)
          = 100,000,000 / 86,400
          ≈ 1,160 writes/sec

// Read QPS
Read QPS  = 1,160 × 10
          ≈ 11,600 reads/sec

// Peak QPS (assume 2x average)
Peak Write QPS ≈ 2,300 writes/sec
Peak Read QPS  ≈ 23,200 reads/sec

Storage Estimation Walkthrough

// Each URL record
Short URL (7 chars)  = 7 bytes
Long URL (avg)       = 200 bytes
Created timestamp    = 8 bytes
Expiry timestamp     = 8 bytes
User ID              = 8 bytes
Total per record     ≈ 250 bytes (rounded up for padding/overhead)

// Storage for 5 years
URLs per year   = 100M × 365 = 36.5B
URLs in 5 years = 36.5B × 5  = 182.5B

Storage = 182.5B × 250 bytes
        ≈ 45.6 × 10^12 bytes
        ≈ 45.6 TB

Bandwidth Estimation Walkthrough

// Incoming (write) bandwidth
Write bandwidth = 1,160 writes/sec × 250 bytes
                = 290,000 bytes/sec
                ≈ 290 KB/s (trivial)

// Outgoing (read) bandwidth
Read bandwidth = 11,600 reads/sec × 250 bytes
               = 2,900,000 bytes/sec
               ≈ 2.9 MB/s (still modest)
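The three walkthroughs above condense into a few lines of Python, which is handy for checking your own arithmetic while practicing (all inputs are the assumptions stated earlier):

```python
# Back-of-envelope estimation for the URL shortener, end to end.
SECONDS_PER_DAY = 24 * 3600          # 86,400

new_urls_per_day = 100_000_000       # 100M
read_write_ratio = 10
bytes_per_record = 250
years = 5

write_qps = new_urls_per_day / SECONDS_PER_DAY
read_qps = write_qps * read_write_ratio
peak_write_qps = 2 * write_qps       # assume peak = 2x average

records_5y = new_urls_per_day * 365 * years
storage_bytes = records_5y * bytes_per_record

write_bw = write_qps * bytes_per_record   # bytes/sec in
read_bw = read_qps * bytes_per_record     # bytes/sec out

print(f"write QPS ~{write_qps:,.0f}, read QPS ~{read_qps:,.0f}")
print(f"storage over {years}y ~{storage_bytes / 1e12:.1f} TB")
print(f"bandwidth in ~{write_bw / 1e3:.0f} KB/s, out ~{read_bw / 1e6:.1f} MB/s")
```

Changing a single assumption, say a 100:1 read-to-write ratio, immediately re-derives every downstream number.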

QPS Calculation: A General Formula

💡 Formula: QPS = (Daily Active Users × Avg. Requests per User) / 86,400
Peak QPS ≈ 2 × QPS (for most applications)
Peak QPS ≈ 3–5 × QPS (for bursty workloads like flash sales)

The number 86,400 (seconds in a day) is one you should memorize. A useful shortcut: 100K events/day ≈ 1 QPS, so 100M events/day ≈ 1,000 QPS. This gives you a quick sanity check for any estimation.
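As a sketch, the formula and the shortcut in code (the function and parameter names are illustrative):

```python
SECONDS_PER_DAY = 86_400

def estimate_qps(dau, requests_per_user, peak_factor=2.0):
    """Average and peak QPS from daily active users and per-user request rate."""
    avg = dau * requests_per_user / SECONDS_PER_DAY
    return avg, avg * peak_factor

# 100M DAU x 10 requests/day each = 1B events/day
avg, peak = estimate_qps(100_000_000, 10)
print(f"avg ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS")
```

The shortcut predicts ~10,000 QPS for 1B daily events; the exact figure is ~11,600. Same order of magnitude, which is all estimation needs.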

Key Concepts Preview

This series will cover dozens of topics in depth. Here's a preview of the core concepts you'll master, with brief definitions so you have a mental map before we dive in:

Vertical Scaling Adding more CPU, RAM, or disk to a single machine. Simple but has hard limits.
Horizontal Scaling Adding more machines to a pool. Complex but virtually unlimited.
Load Balancing Distributing incoming requests across multiple servers to prevent any single server from being overwhelmed.
Caching Storing frequently accessed data in a fast layer (RAM) to avoid repeated expensive computations or database lookups.
Database Sharding Splitting a large database into smaller, faster, more manageable pieces called shards, each holding a subset of data.
Replication Keeping copies of data on multiple machines for fault tolerance and read performance.
CAP Theorem In a distributed system, you can guarantee at most 2 of 3: Consistency, Availability, Partition Tolerance.
Consistent Hashing A technique for distributing data across nodes that minimizes re-distribution when nodes are added/removed.
Message Queues Asynchronous communication between services via a broker (Kafka, RabbitMQ, SQS) for decoupling and reliability.
CDN Content Delivery Network—edge servers that cache static content close to users for low latency.
API Gateway Single entry point for all client requests that handles auth, rate limiting, routing, and load balancing.
Database Indexing Data structures (B-trees, hash indexes) that speed up queries at the cost of extra storage and slower writes.
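To give one of these concepts concrete shape before its dedicated post, here's a minimal, illustrative consistent-hash ring with virtual nodes (the class and parameter names are my own, not from any particular library):

```python
import bisect
import hashlib

def _hash(key):
    # Stable 64-bit hash; md5 is fine here since this is placement, not security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []                      # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0                          # wrap around the ring
        return self._ring[idx][1]
```

Removing a node only remaps the keys that lived on it; every other key keeps its old owner. That is exactly the property that makes this useful for cache clusters and sharded stores.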

Each of these topics gets its own dedicated post (or multiple posts) in this series. By the end, you'll understand not just what each concept is, but when and why to use it in a design.

Common Mistakes in System Design Interviews

Having conducted and observed hundreds of system design interviews, here are the patterns that consistently trip candidates up—and how to avoid each one.

❌ Mistake 1: Jumping Straight Into Design

The candidate immediately starts drawing boxes and arrows without understanding what they're building. They design a chat system when the interviewer wanted a notification system.

✔ Fix: Spend the first 5–8 minutes only asking clarifying questions and writing down requirements. Confirm with the interviewer: "So we're building X with these features, at this scale. Is that right?"

❌ Mistake 2: Skipping Estimation

The candidate says "we'll have a lot of users" without quantifying what "a lot" means. This leads to over- or under-engineering. You can't decide whether to use a single database or a sharded cluster without knowing the QPS.

✔ Fix: Always do quick back-of-envelope math. Even saying "100M users, 10 requests/day each = ~12K QPS, so we'll need multiple servers behind a load balancer" shows strong signal.

❌ Mistake 3: Over-Engineering from the Start

The candidate adds Kafka, Redis, Elasticsearch, a service mesh, Kubernetes, and a ML pipeline before they've even drawn the basic happy path. This signals that they're pattern-matching from memorized designs rather than thinking from first principles.

✔ Fix: Start with the simplest architecture that works: client → server → database. Then identify bottlenecks and add components with clear justification: "Our read QPS is 50K, so we need a cache. I'd use Redis because..."

❌ Mistake 4: Ignoring Trade-offs

The candidate presents their design as the "right answer" without acknowledging alternatives. System design is fundamentally about trade-offs—there is no single correct answer.

✔ Fix: For every major decision, briefly state the alternative and why you chose differently: "We could use a NoSQL database for flexibility, but I'm choosing SQL here because we need ACID transactions for payment processing."

❌ Mistake 5: Not Communicating Your Thought Process

The candidate draws silently for 10 minutes, then presents a completed diagram. The interviewer can't evaluate how you think, only the end result—and they're judging the process more than the output.

✔ Fix: Think out loud continuously. Narrate your reasoning: "I'm thinking about the data model now. We need fast lookups by short URL, which suggests a hash-based key. Let me consider SQL vs NoSQL..."

❌ Mistake 6: Going Too Deep Too Early

The candidate spends 20 minutes on database schema design and never gets to the high-level architecture. The interviewer wanted to see the big picture first.

✔ Fix: Follow the framework. Broad strokes first (Step 3), then offer deep dives: "I've outlined the high-level architecture. Would you like me to deep dive into the database design, caching strategy, or API design?"

💡 The Meta-Skill: System design interviews are as much about communication and structure as they are about technical knowledge. A candidate with moderate knowledge who communicates well will often outperform a deeply technical candidate who can't explain their thinking.


What’s Next

Now that you understand the fundamentals—the interview framework, the types of requirements, and how to estimate scale—you're ready to dive into the building blocks. In the next post, we'll explore Scaling: vertical vs horizontal scaling, stateless vs stateful architectures, and how to reason about when you need to scale.

💡 Action Items:
  • Practice the 4-step framework on 2–3 design problems this week
  • Memorize the latency numbers table (or at least the order of magnitude)
  • Do a back-of-envelope estimation for a system you use daily (Instagram, Gmail, Spotify)
  • Practice explaining your designs out loud—even to yourself in a mirror