Local LLM Bench: Best Model for Coding Swarms
In Part 1, we established the baseline: MoE delivers 168 tok/s on a single RTX 3090, 4.1x faster than Dense. Clean single-request numbers. One prompt in, one response out. That’s not how swarms work. An orchestrator like Claude Code dispatches four coding tasks simultaneously. The local model serves all four. Under concurrency, memory bandwidth saturates, per-task throughput drops, and the architecture of the model — not the GPU, the model — determines whether you get useful parallelism or just contention. ...
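To make "useful parallelism versus contention" concrete, here is a minimal sketch of the arithmetic involved. The only figures taken from the text are the 168 tok/s single-request baseline and the 4.1x ratio. The `TaskResult` type, the function names, and the four-task numbers are hypothetical placeholders for illustration, not measurements from this benchmark.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    tokens: int      # tokens generated by this task
    seconds: float   # wall-clock time this task took


def per_task_tps(results):
    """Throughput each individual task experienced, in tok/s."""
    return [r.tokens / r.seconds for r in results]


def aggregate_tps(results, wall_seconds):
    """Total tokens produced by all tasks divided by total wall-clock time."""
    return sum(r.tokens for r in results) / wall_seconds


def scaling_efficiency(aggregate, single_request_tps, concurrency):
    """1.0 means perfect parallelism; 1/concurrency means pure contention,
    i.e. the concurrent run produced no more tokens per second than one request."""
    return aggregate / (single_request_tps * concurrency)


# Figures from the text: MoE single-request throughput and its ratio over Dense.
MOE_SINGLE_TPS = 168.0
DENSE_SINGLE_TPS = MOE_SINGLE_TPS / 4.1  # implied Dense baseline, about 41 tok/s

# HYPOTHETICAL run, made up for illustration: four tasks, 420 tokens each,
# all finishing in the same 10-second window.
results = [TaskResult(tokens=420, seconds=10.0) for _ in range(4)]
agg = aggregate_tps(results, wall_seconds=10.0)
eff = scaling_efficiency(agg, MOE_SINGLE_TPS, concurrency=4)

print(f"implied Dense baseline: {DENSE_SINGLE_TPS:.1f} tok/s")
print(f"per-task: {per_task_tps(results)} tok/s")
print(f"aggregate: {agg:.1f} tok/s, efficiency: {eff:.2f}")
```

In this made-up run the aggregate equals the single-request baseline, so the efficiency is 0.25: four tasks share the throughput that one task had alone, which is the pure-contention case. An efficiency closer to 1.0 would indicate that concurrency is buying real parallelism.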