[Day 1] DGX Spark Came Home: I Made It Draw a Cat

So... what is "local LLM" again?

Honestly, I'm still figuring out what "local LLM" even means. But somehow, through a series of decisions I won't fully justify here, I ended up buying an NVIDIA DGX Spark, and now it's sitting in my house.

DGX Spark: NVIDIA's "supercomputer for the home", a small but seriously expensive box with the
When you have 5 unrelated questions, should you pack them into one message to the LLM, or send 5 requests simultaneously? Which is faster?

Splitting them into independent parallel requests is almost always faster. This isn't a gut feeling; it follows from the underlying inference mechanism of LLMs. Let's walk through the reasoning from first principles.

To understand this problem, you firs