Most system design content starts with load balancers and Kafka before you’ve understood what actually happens when you type a URL. That’s backwards. Before you can design a system for a million users, you need to truly understand what a system for one user looks like — and more importantly, why that simple setup breaks the moment other people get involved.
This guide follows a specific progression. We start at the very bottom — the physics of the internet — and work up to where real architectural decisions start mattering. By the end you’ll have a mental model that makes every subsequent system design concept click into place.
01 DNS AND THE INTERNET
When you type google.com into your browser, something subtle but important happens before any data arrives. Your browser doesn’t know where Google is. The internet doesn’t work by name — it works by number. Every server has an IP address like 142.250.80.46, and your browser needs that number before it can do anything.
DNS — the Domain Name System — is how names get translated into numbers. Think of it as a phone book, except instead of one phone book, there are thousands of them arranged in a hierarchy, each knowing a little slice of the answer.
The DNS resolution chain follows the same flow every time: Your device asks a resolver (usually run by your ISP or a service like 8.8.8.8), which then hunts down the answer by working through three layers of nameservers:
- Root DNS servers (13 named server addresses, each replicated worldwide via anycast)
- TLD servers (.com, .org, .io, etc.)
- Authoritative nameservers (your domain’s specific server)
An uncached lookup typically takes tens of milliseconds. After that, the result gets cached based on a TTL (time-to-live) value. Set TTL to 300 seconds and changes propagate in 5 minutes. This is why “DNS propagation” is a thing.
Key insight: DNS is a distributed phone book with thousands of copies. No single server holds all records. The hierarchy delegates: root tells you who handles .com, TLD tells you who handles google.com, and the authoritative server gives you the final answer.
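The caching behaviour above can be sketched as a tiny resolver-side cache. This is an illustrative toy: `lookup_upstream` is a hypothetical stand-in for the real root → TLD → authoritative chain, and the IP and TTL are made up.

```python
import time

def lookup_upstream(hostname):
    # Stand-in for walking root -> TLD -> authoritative servers.
    return "142.250.80.46", 300  # (ip, ttl_seconds) -- illustrative values

class DnsCache:
    def __init__(self):
        self._cache = {}  # hostname -> (ip, expires_at)

    def resolve(self, hostname, now=None):
        now = time.time() if now is None else now
        entry = self._cache.get(hostname)
        if entry and entry[1] > now:
            return entry[0]               # cache hit: no upstream traffic
        ip, ttl = lookup_upstream(hostname)
        self._cache[hostname] = (ip, now + ttl)
        return ip

cache = DnsCache()
ip1 = cache.resolve("google.com", now=0)    # miss: walks the hierarchy
ip2 = cache.resolve("google.com", now=100)  # hit: answered from cache
ip3 = cache.resolve("google.com", now=301)  # TTL expired: re-resolves
```

The `now` parameter just makes the TTL expiry easy to demonstrate; a real resolver would use the wall clock.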
02 REQUEST-RESPONSE
Every web interaction follows the same structure: a client asks, a server answers. Always. The client (your browser, a mobile app, a curl command) sends a request. The server processes it and sends back a response. The server doesn’t reach out unprompted — it waits.
HTTP itself is stateless. After the server sends its response, it forgets you existed. The next request is treated as brand new. This matters enormously for scaling — a stateless server can be replicated freely because no machine holds any memory of past interactions.
HTTP Methods:
- GET — Read a resource (idempotent)
- POST — Create a resource (not idempotent)
- PUT — Replace a resource (idempotent)
- PATCH — Partially update (not guaranteed idempotent)
- DELETE — Remove a resource (idempotent)
Idempotent means calling it twice produces the same result as calling it once. GET twice? Same response. POST twice? Two users created. This matters when you’re handling network retries.
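A toy in-memory API makes the distinction concrete (an illustrative sketch, not a real framework — `users` is just a dict):

```python
users = {}
next_id = [1]

def post_user(name):
    # POST: every call creates a NEW resource -> not idempotent
    uid = next_id[0]; next_id[0] += 1
    users[uid] = name
    return uid

def put_user(uid, name):
    # PUT: replaces the resource at a known ID -> idempotent
    users[uid] = name
    return uid

post_user("ada"); post_user("ada")          # retry => TWO users created
put_user(1, "grace"); put_user(1, "grace")  # retry => same final state
```

This is exactly why retry logic can blindly resend a GET or PUT, but must deduplicate a POST (e.g. with an idempotency key).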
The four server-side steps that happen for every HTTP request:
- Receive — parse HTTP request
- Authenticate — validate token/session
- Process — run business logic, query database
- Respond — send back HTTP response
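The four steps above can be sketched as one function. Everything here is an assumption for illustration: the `request` dict shape, the `VALID_TOKENS` store, and the hard-coded business logic.

```python
VALID_TOKENS = {"secret-token": "alice"}  # assumed token store

def handle_request(request):
    # 1. Receive: parse what we need out of the raw request
    method, path = request["method"], request["path"]
    # 2. Authenticate: validate the token against the store
    user = VALID_TOKENS.get(request.get("token"))
    if user is None:
        return {"status": 401, "body": "unauthorized"}
    # 3. Process: run business logic (stand-in for a database query)
    body = f"hello {user}, you asked for {method} {path}"
    # 4. Respond: send back an HTTP-shaped response
    return {"status": 200, "body": body}

ok = handle_request({"method": "GET", "path": "/me", "token": "secret-token"})
denied = handle_request({"method": "GET", "path": "/me", "token": "wrong"})
```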
03 HTTPS AND TLS
Up until now we’ve treated the HTTP request as something that just… travels. You send it, the server gets it. But the internet is a public network. Everything you send over HTTP travels as plain readable text. Anyone who can see the network traffic — your ISP, a router admin, someone on the same café Wi-Fi — can read it.
This is the problem HTTPS solves. It’s not a different protocol — it’s HTTP with a security layer wrapped around it. That layer is TLS (Transport Layer Security).
TLS does two distinct things:
- Authenticates the server (proves you’re talking to the real google.com and not an impostor)
- Encrypts the data (scrambles it so only the intended recipient can read it)
Both happen before the first byte of actual HTTP data is sent.
Symmetric vs Asymmetric Encryption:
- Symmetric: One key both encrypts and decrypts (fast, but how do you safely share it?)
- Asymmetric: Two keys — public key encrypts, private key decrypts (slower, but the private key never leaves the server)
TLS uses both strategically. The handshake uses asymmetric encryption to safely exchange a session key. Once both sides agree on a shared session key, they switch to symmetric encryption for the actual data transfer because it’s much faster.
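The hybrid flow can be sketched with toy stand-ins. XOR is NOT real encryption and the key-exchange step is a placeholder for RSA or Diffie–Hellman — this only shows the shape of the handshake-then-transfer pattern.

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher: XOR with a repeating key. NOT secure.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# -- Handshake phase (asymmetric, placeholder) --
# In real TLS the client would send this session key encrypted with the
# server's public key, or derive it via Diffie-Hellman. Omitted here.
session_key = secrets.token_bytes(16)

# -- Data transfer phase (symmetric) --
request = b"GET / HTTP/1.1"
ciphertext = xor_cipher(request, session_key)    # client encrypts
plaintext = xor_cipher(ciphertext, session_key)  # server decrypts
```

The design point survives the toy: the expensive asymmetric step happens once, then all bulk traffic uses the fast shared-key cipher.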
The SSL certificate is a small digital document signed by a trusted Certificate Authority (CA) — organisations like Let’s Encrypt, DigiCert, or Comodo that browsers inherently trust.
04 DESIGNING FOR ONE USER
For one user, everything lives on one machine. Your app server and database are on the same box. No network hop between app and database. No load balancer. No separate cache. This is what you’d run locally, or on a $5 VPS. It’s not a toy — it’s a perfectly correct architecture for its scale.
The absence of network hops between app and database is genuinely important. A local query takes microseconds. A query over a network connection takes milliseconds — a 1000× difference.
Scaling to 10 users — still fine: Ten concurrent users on the same single server? Completely manageable. A modern server handles hundreds of concurrent connections. The app server multiplexes them using threads or async I/O, and the database handles them through connection pooling.
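The multiplexing idea can be sketched with async I/O: one process interleaves many concurrent "requests" while each waits on simulated network or database time.

```python
import asyncio, time

async def handle(user_id):
    await asyncio.sleep(0.1)   # simulated database/network wait
    return f"response for user {user_id}"

async def main():
    start = time.monotonic()
    # Ten concurrent requests share one thread and one event loop.
    results = await asyncio.gather(*(handle(i) for i in range(10)))
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
# All ten finish in roughly 0.1s total, not 10 x 0.1s,
# because the event loop interleaves the waits.
```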
The premature scaling trap: Most engineers reach for load balancers and microservices too early. Shopify, Stack Overflow, and Basecamp ran monoliths at enormous scale. The right time to add complexity is when you have a concrete problem the complexity solves — not when you imagine you might have one someday.
05 APPLICATION LAYER vs DATA LAYER
Here is the concept that unlocks everything else in system design. Every web system has two fundamentally different kinds of components: ones that process data, and ones that store it. The distinction matters because these two types scale completely differently.
Application Layer (Stateless — easy to scale):
- Load balancer
- Web servers (nginx, Caddy)
- App servers (Node.js, Django, Rails)
- Job workers (async tasks)
Key principle: App layer is STATELESS — no user data lives here. Add/remove servers freely without losing anything.
Data Layer (Stateful — hard to scale):
- Primary database (source of truth)
- Read replicas (async copies for scaling reads)
- Cache layer (Redis — hot data in RAM)
- Object storage (S3/GCS for files and images)
Why the distinction matters: The app layer is stateless. Every request carries enough information to be processed independently. The data layer is stateful. It holds the source of truth. You can’t just clone a database server and call it a day — you need to decide which copy is authoritative for writes, how changes propagate, how conflicts resolve.
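A sketch of why statelessness makes the app layer replicable: two "servers" share one external store, so either can serve any request. The dict is a stand-in for Redis or the primary database.

```python
shared_store = {}  # stands in for the data layer (Redis / database)

class AppServer:
    """Stateless: holds no user data between requests."""
    def __init__(self, name):
        self.name = name

    def handle(self, user, action, value=None):
        if action == "set":
            shared_store[user] = value   # state lives in the data layer
            return "ok"
        return shared_store.get(user)    # any replica reads the same truth

server_a, server_b = AppServer("a"), AppServer("b")
server_a.handle("alice", "set", "cart=3 items")
# A different replica serves the next request with no lost state:
answer = server_b.handle("alice", "get")
```

Add a third `AppServer` or remove one and nothing breaks — which is exactly what cannot be done naively with the `shared_store` itself.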
06 THE SIX PROBLEMS AT SCALE
The problems that emerge with multiple users aren’t primarily about capacity. A single server can handle far more than 10 users. The problems are about correctness and resilience. Two users interacting with the same data at the same time produces behaviour that a single-user system never encounters.
Problem 1: Session Management HTTP is stateless. Who is logged in? Each user needs their own session stored server-side (Redis) or in a signed token (JWT). In-memory sessions on a single server work fine for one user but break the moment you add a second app server.
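A minimal signed-token sketch using stdlib HMAC — a simplified stand-in for JWT (no expiry, no base64 header), where `SECRET` is an assumed key shared by all app servers:

```python
import hmac, hashlib

SECRET = b"server-side-secret"  # assumption: shared by every app server

def sign(user_id: str) -> str:
    sig = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{sig}"

def verify(token: str):
    user_id, _, sig = token.partition(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return user_id if hmac.compare_digest(sig, expected) else None

token = sign("alice")
who = verify(token)                   # valid on ANY app server
forged = verify("alice.deadbeef")     # tampered token -> rejected
```

Because verification needs only the shared secret, no server has to remember the session — which is what makes the second app server safe to add.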
Problem 2: Concurrent Writes User A and B both read balance $100, both withdraw $80, both write $20 back. Balance becomes $20 instead of rejecting one. Database transactions with appropriate isolation levels prevent this.
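The lost update can be prevented in a few lines; here a lock stands in for a database transaction with an appropriate isolation level, serialising the read-modify-write so the second withdrawal sees the updated balance and is rejected.

```python
import threading

balance = 100
lock = threading.Lock()
rejections = []

def withdraw(amount):
    global balance
    with lock:                         # serialise read-modify-write
        if balance >= amount:
            balance -= amount          # one withdrawal succeeds
        else:
            rejections.append(amount)  # the other is rejected

threads = [threading.Thread(target=withdraw, args=(80,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
# balance ends at 20 and exactly one withdrawal was rejected --
# without the lock, both could read 100 and both could succeed.
```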
Problem 3: File Storage User uploads a photo to Server 1 local disk. User then hits Server 2 — photo not found. Local disk doesn’t share. Fix: centralised object storage (S3, Google Cloud Storage, Cloudflare R2).
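The fix in miniature: uploads go to one shared store instead of each server's local disk. The dict stands in for S3/GCS/R2.

```python
object_store = {}  # stands in for S3 / Google Cloud Storage / R2

class AppServer:
    def __init__(self):
        self.local_disk = {}  # what NOT to use for user uploads

    def upload(self, key, data):
        object_store[key] = data   # centralised: visible to every server

    def download(self, key):
        return object_store.get(key)

server1, server2 = AppServer(), AppServer()
server1.upload("photos/cat.jpg", b"jpeg-bytes")
# The request lands on a different server, and the photo is still found:
photo = server2.download("photos/cat.jpg")
```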
Problem 4: Single Point of Failure One server means one point of failure. When it goes down, everyone is down. Fix: redundancy — at least two app servers behind a load balancer with health checks.
Problem 5: Resource Contention A report generation endpoint that takes 10 seconds of CPU monopolises the event loop, slowing everyone else. Fix: offload to background job queues (BullMQ, Celery, Sidekiq).
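The offloading pattern in miniature: the request handler enqueues the slow job and returns immediately, while a worker thread grinds through it. This is a stdlib stand-in for BullMQ/Celery/Sidekiq; the sleep stands in for the 10-second report.

```python
import queue, threading, time

jobs = queue.Queue()
completed = []

def worker():
    while True:
        job = jobs.get()
        time.sleep(0.05)          # stand-in for slow report generation
        completed.append(job)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_report_request(report_id):
    jobs.put(report_id)           # enqueue the slow work...
    return {"status": 202, "body": "report queued"}  # ...respond instantly

resp = handle_report_request("monthly-sales")
jobs.join()   # demo only: wait for the worker so we can inspect the result
```

The handler returns `202 Accepted` in microseconds regardless of how long the job takes, so other requests are never starved.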
Problem 6: N+1 Query Problem Fetching 10 users, then making one query per user to get their posts, is 11 queries instead of 2. For 1 user it’s invisible. For 10 it’s noticeable. For 100 it’s catastrophic. Fix: JOIN queries or batch loading.
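Both shapes of the query pattern, side by side with an in-memory SQLite database (table names and sample data are made up for the demo):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
""")
db.executemany("INSERT INTO users (name) VALUES (?)",
               [(f"u{i}",) for i in range(10)])
db.executemany("INSERT INTO posts (user_id, title) VALUES (?, ?)",
               [(i + 1, f"post by u{i}") for i in range(10)])

# N+1: one query for users + one query PER user = 11 round trips
users = db.execute("SELECT id, name FROM users").fetchall()
for uid, _ in users:
    db.execute("SELECT title FROM posts WHERE user_id = ?", (uid,)).fetchall()

# Fix: 2 queries total -- fetch users, then batch-load all their posts
ids = [uid for uid, _ in users]
placeholders = ",".join("?" * len(ids))
posts = db.execute(
    f"SELECT user_id, title FROM posts WHERE user_id IN ({placeholders})", ids
).fetchall()
```

With a local in-memory database the difference is invisible; over a network, each extra round trip adds milliseconds, which is where the "catastrophic at 100 users" claim comes from.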
THE FULL PICTURE
Here is everything from Phase 1 in summary:
- DNS resolves the domain name to an IP address
- Browser sends HTTP request to that IP
- Request passes through load balancer to stateless app servers
- App servers read/write from stateful database
- Redis caches hot data for speed
- S3 stores files outside the app server
- Response travels back to browser encrypted with TLS
This architecture handles tens of thousands of users without fundamental changes. The app servers scale horizontally (just add more behind the load balancer). The database scales vertically first (bigger machine, more RAM), then with read replicas.
When this architecture stops being enough — when writes saturate the primary database, when a single monolith becomes too large for one team — that’s Phase 2: horizontal scaling, replication, sharding, and eventually the question of microservices.
Next: Phase 2 tackles the problems that emerge at real scale: horizontal vs vertical scaling decisions, database replication and sharding strategies, the CAP theorem in practice, caching patterns, and how to identify and eliminate single points of failure.