User types "android" into your search box. That's 7 API calls if you wired it the way I did the first time. A few months later I shipped a pagination bug where every infinite-scroll fetch flashed a full-screen loader for half a second. And then there was the Settings screen that needed to refresh the Dashboard when the user changed theme — but Settings couldn't import Dashboard without a circular
When you have 5 unrelated questions, should you pack them into one message to the LLM, or send 5 requests simultaneously? Which is faster? Splitting into multiple independent parallel requests is almost always faster. This isn't a gut feeling — it's determined by the underlying inference mechanism of LLMs. Let's walk through the reasoning from first principles. To understand this problem, you firs