Question 1

How would you design a URL shortener like bit.ly?

Accepted Answer

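Key components: URL generation (base62-encode a unique ID, or hash the URL and truncate the hash), storage (a key-value cache such as Redis for hot URLs, backed by MySQL for persistence), redirection (HTTP 301 for permanent redirects, which browsers may cache so repeat clicks never reach the server; HTTP 302 when every click must be tracked), and analytics (count clicks asynchronously via a message queue). Scale: partition ID generation across nodes (for example with consistent hashing or pre-allocated ID ranges) so that no single counter becomes a bottleneck, use a CDN for edge redirects, and add DB read replicas. Handle hash collisions by retrying with a random suffix.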
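
The base62 step can be sketched as follows. This is a minimal illustration, assuming the unique ID is a non-negative integer (for example a database auto-increment value); the alphabet order and the function names are arbitrary choices for the example, not part of any particular service.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an assumption for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Turn a non-negative integer ID into a short base62 code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Recover the integer ID from a base62 code."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Seven base62 characters cover 62^7 (about 3.5 trillion) distinct IDs, which is why short codes of that length are usually enough.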

Question 2

What is the difference between horizontal and vertical scaling?

Accepted Answer

Vertical scaling (scaling up) means adding more resources to a single server (more CPU, RAM). It's simple but has limits and is a single point of failure. Horizontal scaling (scaling out) means adding more servers to distribute load. It requires a load balancer, stateless services (or shared session storage), and distributed data. Most modern architectures prefer horizontal scaling for high availability and cost efficiency.

Question 3

What is a load balancer and what algorithms does it use?

Accepted Answer

A load balancer distributes incoming traffic across multiple backend servers for high availability and scalability. Algorithms: Round Robin (cyclic distribution), Weighted Round Robin (based on server capacity), Least Connections (route to server with fewest active connections), IP Hash (consistent routing for a client to the same server — useful for session affinity), and Random. L4 (TCP) vs L7 (HTTP-aware) load balancers differ in what they can inspect.
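
Two of the algorithms above, Round Robin and Least Connections, can be sketched in a few lines. The class and method names are hypothetical, and the servers are plain strings; a real load balancer would also handle health checks and concurrency.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed cyclic order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Routes each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server that was listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes so the count stays accurate.
        self.active[server] -= 1
```

Round Robin needs no per-request state, while Least Connections has to be told when a request completes, which is the price of adapting to uneven request durations.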

Question 4

What is caching and what are common caching strategies?

Accepted Answer

Caching stores frequently accessed data in fast storage (Redis, Memcached) to reduce database load and latency. Strategies: Cache-aside (the app checks the cache, on a miss reads from the DB and populates the cache; the most common pattern), Write-through (write to cache and DB synchronously), Write-behind (write to cache, then asynchronously to the DB), Read-through (the cache sits in front of the DB and loads missing entries itself). Cache invalidation (deciding when to evict or refresh stale data) is usually the hardest part.
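
A minimal cache-aside sketch, assuming plain dictionaries stand in for both the cache (Redis or Memcached in practice) and the database. The class name and the db_reads counter are hypothetical and exist only to make the behaviour visible.

```python
class CacheAsideStore:
    """Cache-aside: the application, not the cache, talks to the database."""

    def __init__(self, db):
        self.db = db        # stand-in for the database (a dict)
        self.cache = {}     # stand-in for Redis/Memcached
        self.db_reads = 0   # counts how often we fall back to the DB

    def get(self, key):
        if key in self.cache:
            return self.cache[key]       # cache hit
        self.db_reads += 1               # cache miss: read from the DB
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value      # populate the cache for next time
        return value

    def update(self, key, value):
        self.db[key] = value
        # Invalidate rather than overwrite, so the next read reloads from the DB.
        self.cache.pop(key, None)
```

Deleting the cached entry on update is one simple invalidation policy; a production setup would typically also set a TTL on each entry as a safety net.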

Question 5

What is the CAP theorem?

Accepted Answer

CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (every read returns the most recent write), Availability (every request gets a response, even if not the latest data), and Partition Tolerance (system continues despite network partitions). Since partitions are inevitable in distributed systems, you choose between CP (consistent but may be unavailable during partitions — e.g., HBase) and AP (always available but may return stale data — e.g., DynamoDB, Cassandra).

Question 6

What is the difference between SQL and NoSQL databases and when do you choose each?

Accepted Answer

SQL (PostgreSQL, MySQL) enforces a schema, supports complex joins and ACID transactions — ideal when data relationships are complex and consistency is critical (banking, e-commerce orders). NoSQL: Document stores (MongoDB) for flexible schemas; Key-value (Redis) for caching and sessions; Wide-column (Cassandra) for write-heavy time-series at massive scale; Graph (Neo4j) for relationship-heavy queries. Most large systems use both (polyglot persistence).

Question 7

What is database replication and sharding?

Accepted Answer

Replication copies data to multiple servers: a primary handles writes and read replicas handle reads. With asynchronous replication the replicas can lag behind the primary, so reads from them are eventually consistent. This improves read throughput and availability. Sharding (horizontal partitioning) splits data across multiple databases based on a shard key (for example, user_id % n). Sharding improves write throughput and storage capacity but complicates cross-shard queries and transactions. Replication is primarily for availability; sharding is primarily for scale.
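
The shard-key routing and the cost of cross-shard queries can be illustrated with a small sketch. It assumes four shards, each modelled as an in-memory dict, and hypothetical helper names; a real system would route to separate database instances.

```python
NUM_SHARDS = 4
# Each dict stands in for one database shard.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: int) -> int:
    """Route by shard key: user_id % n."""
    return user_id % NUM_SHARDS


def put_user(user_id: int, record: dict) -> None:
    shards[shard_for(user_id)][user_id] = record


def get_user(user_id: int):
    # A lookup by shard key touches exactly one shard.
    return shards[shard_for(user_id)].get(user_id)


def count_users_in_country(country: str) -> int:
    # A query on a non-key attribute must scatter to every shard and gather.
    return sum(
        1
        for shard in shards
        for record in shard.values()
        if record["country"] == country
    )
```

Lookups by user_id stay cheap, while count_users_in_country has to visit all shards, which is the cross-shard complication mentioned above.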

Question 8

What is a message queue and when would you use one?

Accepted Answer

A message queue (Kafka, RabbitMQ, SQS) decouples producers from consumers, enabling async processing. Use cases: smooth traffic spikes by buffering requests, background jobs (email sending, report generation), event-driven microservices communication, guaranteed delivery with retries. Kafka specifically is used for high-throughput event streaming and maintaining an ordered log of events.

Question 9

What is a CDN and how does it work?

Accepted Answer

A CDN (Content Delivery Network) is a globally distributed network of edge servers that cache static assets (images, JS, CSS) close to users, reducing latency. When a user requests a file, the CDN serves it from the nearest edge node. On a cache miss, the edge fetches from the origin server and caches the response. CDNs also absorb DDoS traffic and offload origin server bandwidth.

Question 10

What is microservices architecture vs monolith?

Accepted Answer

A monolith is a single deployable unit containing all functionality — simpler to develop, test, and debug but hard to scale and deploy individual parts. Microservices decompose the system into small, independently deployable services each owning its data. Benefits: independent scaling, independent deployment, technology flexibility. Challenges: distributed system complexity (network calls, distributed transactions, service discovery, observability).

Question 11

How do microservices communicate?

Accepted Answer

Synchronous: REST over HTTP (simple, widely understood) or gRPC (binary, faster, typed contracts via Protocol Buffers — preferred for internal service-to-service). Asynchronous: message brokers (Kafka, RabbitMQ) for fire-and-forget or event-driven patterns. Async decouples services and improves resilience. Use sync for low-latency client-facing requests; async for background processing, high-throughput, and when strong decoupling matters.

Question 12

What is a rate limiter and how would you design one?

Accepted Answer

A rate limiter restricts the number of requests a client can make in a time window. Algorithms: Fixed Window (simple, but allows up to twice the limit in a burst that straddles a window boundary), Sliding Window Log (accurate but memory-intensive), Token Bucket (tokens accumulate at rate r up to a maximum bucket size, and each request consumes one, which allows short bursts), Leaky Bucket (requests queue and drain at a fixed rate). For distributed systems, store state in Redis and update it atomically, with a Lua script or, for fixed windows, the INCR+EXPIRE pattern.
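
A single-process Token Bucket sketch, assuming a refill rate in tokens per second and a fixed bucket capacity. The class name and the injectable clock parameter are choices made for this example (the clock makes the limiter testable without sleeping); a distributed version would keep the same state in Redis.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily based on elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request consumes one token
            return True
        return False
```

Refilling lazily inside allow() avoids a background timer: the bucket only needs the timestamp of the previous call.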

Question 13

What is consistent hashing?

Accepted Answer

Consistent hashing maps both keys and nodes onto a hash ring (positions 0 to 2^32 - 1). A key is assigned to the first node found moving clockwise from the key's position. When a node is added or removed, only the keys in that node's segment are redistributed (about 1/N of all keys on average for N nodes), unlike naive modulo hashing, where almost all keys move. It is used in distributed caches and databases (Cassandra, DynamoDB) to minimise data movement during scaling.
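
A minimal hash ring sketch. Several details are assumptions made for the example rather than anything a specific database does: MD5 truncated to 32 bits as the hash function, 100 virtual nodes per physical node to even out the segments, and the class and method names.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Hash a string to a ring position in 0 .. 2**32 - 1."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._positions = []   # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            if pos in self._owner:
                continue       # rare collision: keep the existing owner
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First node position at or after the key, moving clockwise.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._positions):
            idx = 0            # wrap around the ring
        return self._owner[self._positions[idx]]
```

When a node is added, any key that changes owner can only move to the new node; keys never shuffle between the existing nodes, which is what keeps data movement small.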

Question 14

How would you design a notification system?

Accepted Answer

Components: notification service that receives events, a fan-out service to determine recipients, channel handlers (push, email, SMS, in-app), a message queue between them for async delivery, a user preferences service to filter channels, and a dedupe layer. At scale, Kafka fans out events to per-channel workers. Store notification history in a DB for in-app retrieval. Handle retries with exponential backoff and dead-letter queues.
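
The retry step can be sketched as below. This is an illustration under stated assumptions: `send` is any callable that raises on failure, the dead-letter queue is a plain list, and the random jitter on each delay is a common addition that the answer above does not mention. All names and default values are hypothetical.

```python
import random
import time


def deliver_with_retries(send, message, dead_letters,
                         max_attempts=5, base_delay=0.5, max_delay=30.0,
                         sleep=time.sleep):
    """Try to deliver a message, backing off exponentially between attempts.

    Returns True on success. After the final failed attempt the message is
    appended to `dead_letters` for later inspection and False is returned.
    """
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except Exception:
            # A sketch catches everything; real code would only retry
            # errors known to be transient.
            if attempt == max_attempts - 1:
                break
            # The delay ceiling doubles each attempt: 0.5s, 1s, 2s, ... capped.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so workers do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    dead_letters.append(message)
    return False
```

Passing `sleep` as a parameter keeps the function testable, and the dead-letter list ensures a message that keeps failing is set aside rather than lost or retried forever.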

Question 15

What is an API gateway?

Accepted Answer

An API gateway is a single entry point for all client requests to backend services. It handles cross-cutting concerns: authentication/authorisation, rate limiting, SSL termination, request routing, protocol translation, response caching, and logging. Examples: AWS API Gateway, Kong, NGINX. It simplifies clients (one endpoint instead of many) and moves infrastructure concerns out of individual services.

Question 16

What is eventual consistency vs strong consistency?

Accepted Answer

Strong consistency guarantees that after a write completes, all subsequent reads return the updated value. This requires coordination across replicas, which can increase latency and reduce availability. Eventual consistency guarantees that, given enough time without new updates, all replicas converge to the same value; availability is higher, but reads may return stale data. Systems like DynamoDB and Cassandra let you choose the consistency level per request, anywhere between the two extremes.
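
One common way to reason about tunable consistency in quorum-replicated stores such as Cassandra: with N replicas, a write acknowledged by W of them and a read that consults R of them must share at least one replica whenever R + W > N. A minimal sketch of that check follows; the function name is hypothetical, and it ignores failure-handling details that can weaken the guarantee in practice.

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True when every read quorum must intersect every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must each be between 1 and n")
    return r + w > n
```

With N = 3, choosing W = 2 and R = 2 gives overlapping quorums, while W = 1 and R = 1 favours latency and availability at the cost of possibly stale reads.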

Top 40 System Design Interview Questions & Answers (2025)