Traditional search engines match keywords. If you search for "dog shelters around Gurgaon" and the indexed page says "animal shelters near Delhi," you get no results: the words do not overlap. Semantic search fixes this by converting text into vectors, so similar ideas end up close together in vector space even when the words differ. An embedding model takes a word or sentence and produces a high-dimensional vector that encodes its meaning.
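The geometry is easy to see with cosine similarity. Here is a minimal sketch with hand-made 4-dimensional vectors (real embedding models output hundreds of dimensions; these numbers are invented purely to illustrate the idea that related texts point in similar directions):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- values are made up for illustration only.
query = [0.9, 0.8, 0.1, 0.0]   # "dog shelters around Gurgaon"
doc_a = [0.8, 0.9, 0.2, 0.1]   # "animal shelters near Delhi"
doc_b = [0.1, 0.0, 0.9, 0.8]   # an unrelated document

print(cosine_similarity(query, doc_a))  # high: same idea, different words
print(cosine_similarity(query, doc_b))  # low: different idea
```

No keyword in the query appears in `doc_a`, yet the similarity score ranks it first. That is the entire trick behind semantic search.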
The first time I implemented Vamana from the DiskANN paper, my approximate nearest neighbor index was slower than brute force. On tiny test fixtures, brute force took 0.27 ms per query. My Vamana implementation took 22.98 ms. That sounds absurd: ANN exists to skip work. The problem was not the algorithm. It was how I mapped the paper's abstractions to actual data structures. The DiskANN pseudocode leaves those choices to the implementer.
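For context, this is roughly what the brute-force baseline looks like, a linear scan with no index at all (a sketch, not the post's actual benchmark code):

```python
import random

def brute_force_nn(query, points):
    # Exact nearest neighbor: scan every point, keep the closest.
    # O(n * d) per query -- no graph, no pruning, no indirection.
    best_idx, best_dist = -1, float("inf")
    for i, p in enumerate(points):
        d = sum((q - x) ** 2 for q, x in zip(query, p))  # squared L2
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx

random.seed(0)
points = [[random.random() for _ in range(16)] for _ in range(1000)]
query = points[42]
print(brute_force_nn(query, points))  # 42: the query is its own nearest neighbor
```

On a thousand points this loop is trivially fast and extremely cache-friendly, which is why a graph index with per-hop pointer chasing and bookkeeping overhead can lose badly at small scale even when implemented correctly.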
When you have 5 unrelated questions, should you pack them into one message to the LLM, or send 5 requests simultaneously? Which is faster? Splitting into multiple independent parallel requests is almost always faster. This isn't a gut feeling; it's determined by the underlying inference mechanism of LLMs. Let's walk through the reasoning from first principles. To understand this problem, you first need to understand how LLM inference actually works.
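The intuition can be sketched with a toy latency model. Assume prompt prefill is roughly constant and output tokens are decoded one at a time, so latency is about prefill plus tokens-times-per-token cost. The constants below (`T_PREFILL`, `T_TOKEN`, `TOKENS_PER_ANSWER`) are invented for illustration, not measurements:

```python
# Toy latency model: decode is sequential per output token,
# so latency ~= prefill_time + n_output_tokens * time_per_token.
T_PREFILL = 0.2        # seconds to process the prompt (assumed constant)
T_TOKEN = 0.03         # seconds per generated token (assumed)
TOKENS_PER_ANSWER = 100

def latency(n_output_tokens, prefill=T_PREFILL):
    return prefill + n_output_tokens * T_TOKEN

# One packed message: the model decodes all 5 answers back to back.
packed = latency(5 * TOKENS_PER_ANSWER)

# Five parallel requests: each decodes only its own answer. If the server
# has capacity, they overlap, so wall-clock time is the max, not the sum.
parallel = max(latency(TOKENS_PER_ANSWER) for _ in range(5))

print(f"packed:   {packed:.2f}s")    # 0.2 + 500 * 0.03 = 15.20s
print(f"parallel: {parallel:.2f}s")  # 0.2 + 100 * 0.03 = 3.20s
```

The packed request pays for 500 sequential decode steps; each parallel request pays for only 100, and they run concurrently. That asymmetry is the whole argument, and the rest of the post fills in why decode is sequential in the first place.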