Caching is the single highest-leverage performance technique in distributed systems. One Redis key can absorb 100,000 database reads per second. Here's exactly how to use it — and how to avoid the failure modes that take down production systems.
What's actually happening here?
A cache is a faster storage layer that sits between your application and your database. When your app needs data, it checks the cache first. If the data is there (cache hit), it returns immediately — typically in under 1ms. If not (cache miss), it fetches from the database, stores a copy in the cache, and returns it. Every subsequent request for the same data skips the database entirely.
The speed difference is the whole story: Redis returns data in ~0.1ms. Postgres returns data in ~5–50ms depending on query complexity and whether the page is in its buffer pool. At 10,000 requests per second, that difference is the line between a system that copes and one that melts.
The problem this solves
Databases are expensive to scale — adding capacity means bigger machines, more replicas, or complex sharding. Caches are cheap to scale — Redis is in-memory, handles 1 million operations per second on a single node, and costs a fraction of a comparable database instance. The fundamental insight: most real-world traffic is not uniformly distributed. A small percentage of your data gets the overwhelming majority of reads. Caching exactly that small percentage eliminates the load on everything behind it.
How it really works (step by step)

The cache-aside pattern — the most common in production:
Request arrives — user asks for restaurant menu ID 4821.
App checks cache first —
GET menu:4821in Redis. Takes ~0.1ms.Cache hit — Redis has it. Return immediately. Database never touched.
Cache miss — Redis doesn't have it. App queries the database. Takes ~10ms.
Store in cache — app writes the result to Redis with a TTL:
SET menu:4821 {data} EX 300(expires in 300 seconds).Return to user — same response as a cache hit, just 10ms slower this once.
All subsequent requests — hit the cache for the next 300 seconds. Database sees one query, not thousands.
The part most tutorials skip
Cache stampede is the failure mode that takes down production systems at peak traffic. Here's the scenario: a popular cache key expires at exactly the moment 10,000 users are requesting it simultaneously. All 10,000 requests see a cache miss at the same moment. All 10,000 hit the database simultaneously. The database — designed to handle perhaps 500 concurrent queries — falls over. The cache was supposed to protect it, but the expiry created a thundering herd.
Three production-grade fixes:
Probabilistic early expiration — instead of expiring at exactly T=300s, each request has a small random chance of refreshing the cache slightly before it expires. The first request that rolls "refresh" rebuilds the cache while everyone else still gets the cached value. No stampede.
Cache locking — when a miss is detected, the first request acquires a Redis lock and rebuilds the cache. All other concurrent requests wait briefly and then get the freshly populated value. Only one database query fires.
Background refresh — a separate worker refreshes popular cache keys before they expire, so they're never actually missing. Used by Netflix, Facebook, and Cloudflare for their highest-traffic data.
Real company doing this right now

Meta's Memcached fleet handles over 1 billion cache operations per second globally. Their cache hierarchy has three layers: an in-process L1 cache inside each web server (microsecond reads, no network), a regional Memcached cluster (sub-millisecond reads), and MySQL at the bottom. The L1 cache alone absorbs the majority of all traffic — most requests never even reach Memcached. The key insight Meta published: the working set for a social network feed is extremely small — a few thousand "celebrity" accounts generate the majority of all read traffic. Caching their posts aggressively eliminates the vast majority of database load. For everyone else, the cache is still helpful but less critical.
What breaks at scale?
The hot key problem. A single cache key receiving millions of requests per second overloads the one Redis shard that owns it — even though every other shard is idle. This happens during breaking news (one article), IPL scores (one live match), or a celebrity posting (one user's timeline). The fix is local replication: copy the hot key to every Redis node in the cluster, then route hot-key reads to a random node. Reads spread across all nodes; writes still go to the owner and propagate. Instagram and Twitter both implemented versions of this for celebrity accounts during major events.
The "aha" moment
A cache doesn't make your database faster — it makes your database irrelevant for the data that gets read the most. The database still runs at the same speed. You just stop asking it the same question 10,000 times.
Your practical takeaway
Add a TTL to every cache key without exception — a cache with no TTL eventually serves stale data forever. Even for "stable" data like restaurant menus or product catalogues, set a TTL of at least 24 hours. This forces a periodic freshness check and prevents stale data accumulating silently.
Use cache-aside for read-heavy data, write-through for data that must stay consistent — cache-aside is simpler and works for 90% of cases. Write-through (writing to cache and DB simultaneously on every write) is only worth the complexity when you can't tolerate even brief stale reads.
Monitor your cache hit rate before optimising anything else — a hit rate below 90% means your TTLs are too short, your key space is too large, or you're caching the wrong data. Fix the hit rate first; it's the leading indicator of whether your cache is actually helping.
Lesson 08 · Stage 2 — Storage Architecture · System Design Made Easy
