Free Study Guide · 2025

Top 10 System Design Interview Questions & Answers (2025)

System design interviews are often the deciding round for SDE-2 and higher roles at product companies. Questions on load balancing, caching, databases, and microservices come up at Flipkart, Amazon, Google, Razorpay, and CRED for senior engineering positions.

10 questions
Detailed answers
100% free
1. How would you design a URL shortener like bit.ly?
Key components: URL generation (base62-encode a unique ID or hash the URL), storage (a key-value store like Redis for hot URLs plus MySQL for persistence), redirection (HTTP 301 for permanent redirects, 302 when you want to track clicks), and analytics (count clicks asynchronously via a message queue). Scale: use consistent hashing to distribute ID generation, a CDN for edge redirects, and DB read replicas. Handle hash collisions by appending a random suffix or retrying with a new ID.
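The base62 step above can be sketched in a few lines. This is a minimal illustration (the alphabet ordering and function names are our own choices, not a specific library's API):

```python
# Base62 alphabet: digits, lowercase, then uppercase (62 symbols total).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a short base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most-significant digit first

def decode_base62(s: str) -> int:
    """Recover the integer ID from its base62 form."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

A 7-character base62 code covers 62^7 ≈ 3.5 trillion IDs, which is why shorteners rarely need longer slugs.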
2. What is the difference between horizontal and vertical scaling?
Vertical scaling (scaling up) means adding more resources to a single server (more CPU, RAM). It's simple but has limits and is a single point of failure. Horizontal scaling (scaling out) means adding more servers to distribute load. It requires a load balancer, stateless services (or shared session storage), and distributed data. Most modern architectures prefer horizontal scaling for high availability and cost efficiency.
3. What is a load balancer and what algorithms does it use?
A load balancer distributes incoming traffic across multiple backend servers for high availability and scalability. Algorithms: Round Robin (cyclic distribution), Weighted Round Robin (based on server capacity), Least Connections (route to server with fewest active connections), IP Hash (consistent routing for a client to the same server — useful for session affinity), and Random. L4 (TCP) vs L7 (HTTP-aware) load balancers differ in what they can inspect.
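Two of the algorithms above can be sketched as toy classes (an illustration of the routing logic only, not a real load balancer — the class and method names are hypothetical):

```python
import itertools

class RoundRobinBalancer:
    """Cycle through servers in order, one request at a time."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Route each request to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1   # connection opened
        return server

    def release(self, server):
        self.active[server] -= 1   # connection closed
```

IP hash is even shorter: `servers[hash(client_ip) % len(servers)]`, which keeps a given client pinned to the same backend.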
4. What is caching and what are common caching strategies?
Caching stores frequently-accessed data in fast storage (Redis, Memcached) to reduce database load and latency. Strategies: Cache-aside (app checks cache, falls back to DB, writes to cache — most common), Write-through (write to cache and DB synchronously), Write-behind (write to cache, async to DB), Read-through (cache sits in front of DB). Cache invalidation (when to evict stale data) is the hardest problem.
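The cache-aside pattern is short enough to show end to end. Here plain dicts stand in for Redis and the database (both stores and the key format are hypothetical placeholders):

```python
cache = {}                               # stands in for Redis
db = {"user:1": {"name": "Asha"}}        # stands in for MySQL/Postgres

def get_user(key):
    if key in cache:                     # 1. check the cache first
        return cache[key]
    value = db.get(key)                  # 2. on a miss, read from the DB
    if value is not None:
        cache[key] = value               # 3. populate the cache for next time
    return value

def update_user(key, value):
    db[key] = value                      # write to the source of truth...
    cache.pop(key, None)                 # ...and invalidate the stale entry
```

Invalidating on write (rather than updating the cache in place) avoids one class of race condition, but stale reads are still possible between the DB write and the eviction — exactly the invalidation difficulty the answer mentions.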
5. What is the CAP theorem?
CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (every read returns the most recent write), Availability (every request gets a response, even if not the latest data), and Partition Tolerance (system continues despite network partitions). Since partitions are inevitable in distributed systems, you choose between CP (consistent but may be unavailable during partitions — e.g., HBase) and AP (always available but may return stale data — e.g., DynamoDB, Cassandra).
6. What is the difference between SQL and NoSQL databases and when do you choose each?
SQL (PostgreSQL, MySQL) enforces a schema, supports complex joins and ACID transactions — ideal when data relationships are complex and consistency is critical (banking, e-commerce orders). NoSQL: Document stores (MongoDB) for flexible schemas; Key-value (Redis) for caching and sessions; Wide-column (Cassandra) for write-heavy time-series at massive scale; Graph (Neo4j) for relationship-heavy queries. Most large systems use both (polyglot persistence).
7. What is database replication and sharding?
Replication copies data to multiple servers: a primary handles writes; read replicas handle reads (eventual consistency). This improves read throughput and availability. Sharding (horizontal partitioning) splits data across multiple databases based on a shard key (user_id % n). Sharding improves write throughput and storage but complicates cross-shard queries and transactions. Replication is for availability; sharding is for scale.
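The modulo shard-key routing above is a one-liner (shard naming here is a hypothetical convention for illustration):

```python
def shard_for(user_id: int, num_shards: int = 4) -> str:
    """Route a user to a shard via user_id % n (modulo sharding)."""
    return f"db_shard_{user_id % num_shards}"
```

Note the trade-off: changing `num_shards` remaps almost every key, which is why resharding a modulo scheme is expensive and why consistent hashing (from question 1) is often preferred when the shard count must grow.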
8. What is a message queue and when would you use one?
A message queue (Kafka, RabbitMQ, SQS) decouples producers from consumers, enabling async processing. Use cases: smooth traffic spikes by buffering requests, background jobs (email sending, report generation), event-driven microservices communication, guaranteed delivery with retries. Kafka specifically is used for high-throughput event streaming and maintaining an ordered log of events.
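The producer/consumer decoupling can be demonstrated in-process with Python's standard `queue` module standing in for Kafka or SQS (the job format and "send email" work are invented for the sketch):

```python
import queue
import threading

jobs = queue.Queue()   # stands in for the broker

def producer():
    """Enqueue background jobs without waiting for them to run."""
    for user in ("a@example.com", "b@example.com"):
        jobs.put({"type": "send_email", "to": user})

def consumer(results):
    """Drain jobs until a None sentinel signals shutdown."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(f"emailed {job['to']}")  # do the work
        jobs.task_done()

results = []
worker = threading.Thread(target=consumer, args=(results,))
worker.start()
producer()
jobs.put(None)          # sentinel: no more jobs
worker.join()
```

The producer returns immediately after enqueuing — it never blocks on email delivery — which is the decoupling and spike-smoothing property the answer describes.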
9. What is a CDN and how does it work?
A CDN (Content Delivery Network) is a globally distributed network of edge servers that cache static assets (images, JS, CSS) close to users, reducing latency. When a user requests a file, the CDN serves it from the nearest edge node. On a cache miss, the edge fetches from the origin server and caches the response. CDNs also absorb DDoS traffic and offload origin server bandwidth.
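The edge-node miss/fill behavior can be sketched as a TTL cache in front of an origin store (both stores and the 60-second TTL are illustrative assumptions):

```python
import time

ORIGIN = {"/logo.png": b"<png bytes>"}   # stands in for the origin server
edge_cache = {}                          # path -> (body, expires_at)
TTL = 60.0                               # seconds an edge copy stays fresh

def edge_get(path, now=None):
    """Serve from the edge cache if fresh; otherwise fetch and cache."""
    now = time.time() if now is None else now
    entry = edge_cache.get(path)
    if entry is not None and entry[1] > now:
        return entry[0]                  # cache hit: serve from the edge
    body = ORIGIN.get(path)              # cache miss: go to the origin
    if body is not None:
        edge_cache[path] = (body, now + TTL)
    return body
```

Real CDNs derive the TTL from the origin's `Cache-Control` headers rather than a fixed constant, but the hit/miss/fill flow is the same.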
10. What is microservices architecture vs monolith?
A monolith is a single deployable unit containing all functionality — simpler to develop, test, and debug but hard to scale and deploy individual parts. Microservices decompose the system into small, independently deployable services each owning its data. Benefits: independent scaling, independent deployment, technology flexibility. Challenges: distributed system complexity (network calls, distributed transactions, service discovery, observability).
See all 16 System Design questions →
Level up your prep
Get company-specific System Design questions
Upload your resume → get questions tailored to Google, Amazon, TCS, and 50+ companies.
Try AI Interview Prep →
© 2025 CareerLens · Home · Interview Questions · Pricing