Fixed-length chunking requires no external services, yet semantic chunking absolutely needs an Embedding API — why? The core idea of semantic chunking is to split text at semantic boundaries. Determining whether "two pieces of text belong to the same topic" requires converting text into vectors and computing similarity — that's exactly what the Embedding API does. Dimension Fixed-Length / Recur
When you have 5 unrelated questions, should you pack them into one message to the LLM, or send 5 requests simultaneously? Which is faster? Splitting into multiple independent parallel requests is almost always faster. This isn't a gut feeling — it's determined by the underlying inference mechanism of LLMs. Let's walk through the reasoning from first principles. To understand this problem, you firs