This section is the map for the rest of the book. The five stages introduced in the 1.1 chapter overview (parse, analyze/rewrite, plan, portal, execute) are traced here through the actual code: which functions implement each stage, and in what order they get called. The mechanics of each of the five stages are unpacked in later chapters. Here, only the skeleton matters: how a backend starts up, ho
PostgreSQL Internals · Chapter 1 Query Processing

Suppose a client sends SELECT * FROM users WHERE id = 1. The path that single line travels before coming back as a result row is longer than you might expect. Inside the PostgreSQL backend, that SQL goes through a five-stage pipeline.

Backend entry and dispatch. The backend receives the message from the client and decides which processing path it s
I shipped gni-compression to npm two days ago. One of the first questions I got (from myself, running benchmarks at midnight): does it work on anything other than chat data? Short answer: not yet. Long answer: I found out exactly why, and it led me somewhere more interesting than I expected. After the npm launch I ran GN against Silesia — the standard general text compression benchmark suite. Dick
Introduction

Picture two doctors updating the same patient record at the same time - one in São Paulo, the other in London. Both are offline. When connectivity returns, whose changes prevail? This is not a hypothetical. It is the everyday reality of distributed systems: multiple nodes, no shared clock, no guaranteed network. The conventional answer has long been locking - one node waits while an
I keep seeing the same argument about AI making us dumber. It's the same argument people had about search engines, and before that books. The usual response is to point at history and say "every generation panics, every generation was wrong, relax." I think that response is half right, and the wrong half is what bothers me. Tools change what we bother to remember. The people who'd trained their wh
A few years ago I solved 200 LeetCode problems and still froze on Mediums I hadn't seen. The breakthrough wasn't another hundred problems. It was a different loop. A problem asks for the longest substring with at most K distinct characters. You've solved sliding window before. Maximum sum subarray of size K, done. Longest substring without repeating characters, done. This third one stalls you. Twe
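For reference, here is one common way to write the third problem (longest substring with at most K distinct characters) as a variable-size sliding window. This is a minimal sketch, not necessarily the solution the post goes on to develop; the function name `longest_substring_at_most_k_distinct` is made up for illustration.

```python
from collections import defaultdict


def longest_substring_at_most_k_distinct(s: str, k: int) -> int:
    """Length of the longest substring of s with at most k distinct characters.

    Hypothetical helper for illustration: grow the window on the right,
    shrink it from the left whenever it holds more than k distinct characters.
    """
    if k <= 0:
        return 0

    counts = defaultdict(int)  # character -> occurrences inside the window
    left = 0
    best = 0

    for right, ch in enumerate(s):
        counts[ch] += 1

        # Too many distinct characters: move the left edge until valid again.
        while len(counts) > k:
            left_ch = s[left]
            counts[left_ch] -= 1
            if counts[left_ch] == 0:
                del counts[left_ch]
            left += 1

        best = max(best, right - left + 1)

    return best
```

Unlike the fixed-size "maximum sum subarray of size K" window, the window here changes length: the right edge always advances by one, and the left edge advances only as far as needed to restore the "at most K distinct" condition. Each character enters and leaves the window at most once, so the loop is linear in the length of the string.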
Introduction

Some code works. Some code lasts. The difference rarely comes down to typing speed, syntax mastery, or how many nights you're willing to push through. It comes down to how you think about a problem before you write a single line. Big-O notation is a mathematical framework that describes how an algorithm performs as its input grows. In plain terms, it answers one question:
The first time I implemented Vamana from the DiskANN paper, my approximate nearest neighbor index was slower than brute force. On tiny test fixtures, brute force took 0.27 ms per query. My Vamana implementation took 22.98 ms. That sounds absurd. ANN exists to skip work. The problem was not the algorithm. It was how I mapped the paper's abstractions to actual data structures. The DiskANN pseudocode