Fixed-length chunking requires no external services, yet semantic chunking absolutely needs an Embedding API. Why? The core idea of semantic chunking is to split text at semantic boundaries, and deciding whether two pieces of text belong to the same topic requires converting each into a vector and computing their similarity. That is exactly what the Embedding API does.

Dimension Fixed-Length / Recur
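To make that dependency concrete, here is a minimal sketch of semantic chunking in Python. The `embed()` function is a placeholder for whatever Embedding API you actually call; the naive sentence splitter and the 0.75 similarity threshold are illustrative assumptions, not part of any particular library.

```python
import re
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a real Embedding API call (normally an HTTP request).
    Returns a deterministic random unit vector just so the sketch runs."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_chunks(text: str, threshold: float = 0.75) -> list[str]:
    # Naive sentence split; production code would use a proper tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    vectors = [embed(s) for s in sentences]          # one API call per sentence
    chunks, current = [], [sentences[0]]
    for prev, cur, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:            # similarity drop = topic boundary
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

Fixed-length chunking needs none of this: it just counts characters or tokens, which is why it runs offline for free.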
RAG stands for Retrieval-Augmented Generation. Why do we even need RAG? To answer this, let's take a look at what LLMs and SLMs are. LLM (Large Language Model): data from several categories (generalized) is given as input, and from that, a model is created. What is a model? To understand this, let's take the mathematical equation of a straight line, y = mx + c. Let's take the x values to be 1, 2, 3, ... a
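To make the straight-line analogy runnable (the y values below are an assumed toy example, generated from the line y = 2x + 1): "training" here just means finding the m and c that best fit the data, and that pair of numbers is the model.

```python
import numpy as np

# Toy data: x values from the text; y values are an assumed example
# produced by y = 2x + 1, so the fit should recover m = 2, c = 1.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

# Least-squares fit of y = m*x + c: the "model" is just the pair (m, c).
A = np.stack([x, np.ones_like(x)], axis=1)
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"m = {m:.2f}, c = {c:.2f}")       # m = 2.00, c = 1.00
print("prediction at x = 4:", m * 4 + c)  # 9.00
```

An LLM is the same idea at enormous scale: instead of two parameters fit to three points, billions of parameters fit to internet-scale text.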
Why Do We Need Specialized Vector Databases?

In the first five articles, we figured out how to chunk documents and generate embeddings. Now, where do these vectors live, and how are they retrieved efficiently? You might wonder: "Can't I just store vectors in Redis or PostgreSQL?" No: traditional databases are designed for exact queries (e.g., WHERE id = 123), while vector retrieval is Approximate Nearest Neighbor (ANN) search.
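To see why an ordinary table falls short, here is the brute-force version of vector retrieval, which is what storing vectors as plain rows forces on you. The corpus size and the top-5 cutoff are illustrative; real vector databases avoid this linear scan by building ANN indexes such as HNSW or IVF.

```python
import numpy as np

# Brute-force (exact) nearest-neighbor search over normalized embeddings.
# Fine for thousands of vectors, too slow for millions: every query
# touches all n vectors, O(n * d) work per query.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 384))           # 10k stored embeddings
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

query = rng.normal(size=384)
query /= np.linalg.norm(query)

scores = vectors @ query                           # cosine similarity to every vector
top5 = np.argsort(scores)[-5:][::-1]               # ids of the 5 closest vectors
print(top5, scores[top5])
```

An ANN index trades a little recall for a dramatic speedup: instead of scoring all n vectors, it navigates a precomputed graph or cluster structure to a handful of likely neighbors.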
I shipped gni-compression to npm two days ago. One of the first questions I got (from myself, running benchmarks at midnight): does it work on anything other than chat data? Short answer: not yet. Long answer: I found out exactly why, and it led me somewhere more interesting than I expected. After the npm launch I ran GN against Silesia, the standard general-purpose compression benchmark corpus. Dick
Introduction

Picture two doctors updating the same patient record at the same time, one in São Paulo, the other in London. Both are offline. When connectivity returns, whose changes prevail? This is not a hypothetical. It is the everyday reality of distributed systems: multiple nodes, no shared clock, no guaranteed network. The conventional answer has long been locking: one node waits while an
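One lock-free alternative, sketched here as a point of contrast rather than as this article's eventual answer: let every node write locally and merge deterministically later. Below is a minimal last-writer-wins register in Python; the timestamp-plus-node-id tiebreak is an assumed convention that makes merges commutative, so replicas converge regardless of merge order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register: a value tagged with (timestamp, node_id).
    Merging two replicas keeps the tag that compares greater, so both
    doctors' nodes converge to the same value without any locking."""
    value: str
    timestamp: float
    node_id: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Deterministic: ties on timestamp are broken by node_id.
        a = (self.timestamp, self.node_id)
        b = (other.timestamp, other.node_id)
        return self if a >= b else other

# Two offline edits to the same record:
sao_paulo = LWWRegister("allergy: penicillin", timestamp=1700.0, node_id="SP")
london    = LWWRegister("allergy: none",       timestamp=1703.5, node_id="LDN")

# When connectivity returns, merge order does not matter:
assert sao_paulo.merge(london) == london.merge(sao_paulo)
print(sao_paulo.merge(london).value)   # "allergy: none" (later write wins)
```

The trade-off is visible in the example itself: the earlier edit is silently discarded, which is exactly why richer conflict-resolution strategies exist for data where both writes matter.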
I keep seeing the same argument about AI making us dumber. It's the same argument people had about search engines, and before that books. The usual response is to point at history and say "every generation panics, every generation was wrong, relax." I think that response is half right, and the wrong half is what bothers me. Tools change what we bother to remember. The people who'd trained their wh
A few years ago I solved 200 LeetCode problems and still froze on Mediums I hadn't seen. The breakthrough wasn't another hundred problems. It was a different loop. A problem asks for the longest substring with at most K distinct characters. You've solved sliding window before. Maximum sum subarray of size K, done. Longest substring without repeating characters, done. This third one stalls you. Twe
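For reference, the standard sliding-window solution to that third problem looks like this (a textbook sketch, not an answer key from the article): grow the window on the right, shrink it from the left whenever more than K distinct characters are inside.

```python
from collections import defaultdict

def longest_substring_k_distinct(s: str, k: int) -> int:
    """Length of the longest substring of s with at most k distinct chars.
    Classic sliding window: O(n) time, O(k) extra space."""
    if k == 0:
        return 0
    counts: dict[str, int] = defaultdict(int)
    left = best = 0
    for right, ch in enumerate(s):
        counts[ch] += 1
        while len(counts) > k:          # too many distinct chars: shrink
            counts[s[left]] -= 1
            if counts[s[left]] == 0:
                del counts[s[left]]
            left += 1
        best = max(best, right - left + 1)
    return best

print(longest_substring_k_distinct("eceba", 2))  # 3 ("ece")
```

Knowing this code is not the point; recognizing that "at most K distinct" is the same invariant-maintenance move as the two problems you already solved is.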
Introduction

Some code works. Some code lasts. The difference rarely comes down to typing speed, syntax mastery, or how many nights you're willing to push through. It comes down to how you think about a problem before you write a single line. Big-O notation is a mathematical framework that describes how an algorithm performs as its input grows. In plain terms, it answers one question: as the input gets bigger, how much more work does the algorithm have to do?
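As a quick illustration (my own example, with an arbitrary input size): two ways to ask whether a list contains a duplicate, one quadratic and one linear, with the gap between them widening as the input grows.

```python
def has_duplicate_quadratic(items: list[int]) -> bool:
    # O(n^2): compares every pair; doubling n quadruples the work.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_linear(items: list[int]) -> bool:
    # O(n): one pass with a set; doubling n roughly doubles the work.
    seen: set[int] = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False

data = list(range(100_000))            # no duplicates: the worst case for both
print(has_duplicate_linear(data))      # finishes almost instantly
# has_duplicate_quadratic(data) would do ~5 billion comparisons here.
```

Both functions are correct; Big-O is what tells you, before running anything, which one survives contact with real input sizes.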