There's a moment every developer knows. You need to generate a PDF. It looks simple. You've done harder things. Three hours later, you're reading a Stack Overflow thread from 2016 that ends with "works on my machine." This post is about that moment: the actual options, what breaks in each, and where I landed after years of hitting this in production.

It uses a stripped-down WebKit engine and conv
When you have 5 unrelated questions, should you pack them into one message to the LLM, or send 5 requests simultaneously? Which is faster?

Splitting into multiple independent parallel requests is almost always faster. This isn't a gut feeling; it's determined by the underlying inference mechanism of LLMs. Let's walk through the reasoning from first principles.

To understand this problem, you firs