Articles - GPU Fleet Ops & Resilience Notes


Local LLM Bench: Best Model for Coding Swarms

In Part 1, we established the baseline: MoE delivers 168 tok/s on a single RTX 3090, 4.1x faster than Dense. Clean single-request numbers. One prompt in, one response out. That’s not how swarms work. An orchestrator like Claude Code dispatches four coding tasks simultaneously. The local model serves all four. Under concurrency, memory bandwidth saturates, per-task throughput drops, and the architecture of the model — not the GPU, the model — determines whether you get useful parallelism or just contention. ...
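The two baseline figures quoted above imply a Dense throughput that the excerpt does not state directly. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the even-split calculation is an illustrative assumption (aggregate throughput held flat at the single-request rate), not a measurement from the article.

```python
# Figures taken from the excerpt: MoE single-request throughput and its speedup over Dense.
MOE_TOK_S = 168.0
MOE_SPEEDUP_OVER_DENSE = 4.1


def implied_dense_tok_s(moe_tok_s: float, speedup: float) -> float:
    """Dense throughput implied by the MoE figure and the stated speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return moe_tok_s / speedup


def even_split_tok_s(aggregate_tok_s: float, concurrent_tasks: int) -> float:
    """Per-task throughput if a fixed aggregate rate were shared evenly.

    ASSUMPTION: aggregate throughput stays flat under concurrency. Real
    inference engines batch requests, so measured numbers will differ.
    """
    if concurrent_tasks <= 0:
        raise ValueError("concurrent_tasks must be positive")
    return aggregate_tok_s / concurrent_tasks


if __name__ == "__main__":
    dense = implied_dense_tok_s(MOE_TOK_S, MOE_SPEEDUP_OVER_DENSE)
    per_task = even_split_tok_s(MOE_TOK_S, 4)
    print(f"Implied Dense baseline: {dense:.1f} tok/s")
    print(f"MoE per task, four tasks, flat aggregate (assumed): {per_task:.1f} tok/s")
```

Under that flat-aggregate assumption, four concurrent tasks on the MoE model would each run at roughly the implied single-request Dense rate, which is why the concurrency measurements matter more than the single-request numbers.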

The Heat Nobody Counts - PUE Ends at the Meter

Meta’s Prometheus data center in New Albany, Ohio is scaling to 1.2 GW. To get there, they’re building behind-the-meter natural gas turbines — two 200 MW Socrates generation facilities, supplied by dedicated gas pipelines, isolated from the grid. In Virginia, the same story plays out with diesel generators, enough of them that it became the top legislative concern entering the 2026 session. The industry talks about PUE as if it were a verdict on environmental efficiency. It isn’t. PUE measures one envelope — the data center facility. Total facility power divided by IT equipment power. A PUE of 1.3 means 30% overhead for cooling, lighting, and support systems. That’s the metric everyone optimizes, the number that shows up in sustainability reports, the figure that earns applause at conferences. ...
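The PUE definition in the excerpt (total facility power divided by IT equipment power) can be sketched in a few lines. The function names and the 1,300 kW / 1,000 kW sample load are hypothetical values chosen to reproduce the 1.3 figure; they are not data from the article.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_fraction(pue_value: float) -> float:
    """Non-IT overhead (cooling, lighting, support) as a fraction of IT power."""
    return pue_value - 1.0


if __name__ == "__main__":
    # Hypothetical facility: 1,000 kW of IT load, 1,300 kW measured at the facility boundary.
    value = pue(total_facility_kw=1300.0, it_equipment_kw=1000.0)
    print(f"PUE = {value:.2f}, overhead = {overhead_fraction(value):.0%} of IT load")
```

Both inputs are measured inside the facility envelope, so nothing upstream of the meter, such as losses in behind-the-meter generation, appears in the ratio.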

Local LLM Bench: MoE vs Dense on One RTX 3090

I went looking for sustained-load benchmarks comparing MoE and Dense coding models on consumer GPUs. Not demo bursts on a Mac Mini — sustained autoregressive generation on real coding tasks, where architecture and interconnect are the only variables. I found plenty of one-shot numbers. Nobody had published the comparison that matters: same hardware, same quantization, same inference engine, MoE versus Dense, across GPU configurations. Methodology visible. Numbers verifiable. So I ran the tests. Dual RTX 3090s with NVLink, custom liquid cooling, a 6 kW isolation transformer feeding a double-conversion UPS. Not elegant, but thermally and electrically honest — sustained inference loads without throttling, no measurement fiction. The hardware details are below. ...

Kudos to Anthropic - Governments Bury Ecosystems

Last Friday, the White House ordered every federal agency to stop using Anthropic products within six months. The Defense Secretary designated the company a “supply chain risk to national security” — a label normally reserved for foreign adversaries like Huawei or Kaspersky. Anthropic’s crime: they refused to remove two safety guardrails from Claude before deploying it on classified Pentagon networks. No AI for mass domestic surveillance of American citizens. No fully autonomous weapons without human oversight. ...

Everybody Spies: Sovereignty and the AI Land Grab

In Brazil, when advising a customer on endpoint security, there was a mental model we never said out loud. The technical discussion would cover detection rates, false positives, memory footprint — the usual. But underneath it ran a question that never made it into the RFP: who do you want knowing what you’re doing? Russians or Americans? Kaspersky was the default for most of the market — and not because of ideology. Norton and Symantec had spent years earning their reputation for turning Windows machines into molasses, and McAfee was McAfee. Kaspersky worked. It was lighter, faster, cheaper. The fact that its telemetry flowed to Moscow rather than Langley was a feature, not a bug, depending on which side of the table you sat on. ...

The Concorde Problem in AI Infrastructure

The Concorde burned one ton of fuel per passenger to cross the Atlantic. One hundred seats. Three and a half hours. Mach 2. The most advanced commercial aircraft ever built — and every engineer who saw it wanted to believe it was the future. The 747 did the same crossing in seven hours. Four hundred seats. A quarter of the fuel per passenger. No afterburners. No sonic boom. No government subsidies keeping it alive. ...

Building Trust in Security: Part 3

This is the third and final part of a series based on a real-world engagement: a company that scaled from $40M to $1B in annual revenue in just five years, and the security program that had to grow with it. This is a story about building high-performance operating systems where security, standards, architecture, and performance act as enablers rather than constraints. Part 1: Earning credibility before you’ve earned authority. Part 2: Blurring the lines — Security at the SRE and Operations level. Part 3: Wrapping the gift — Transparency and agency. The Quality That Can’t Be Purchased I’ve been writing around this idea for a while — in Cold Aisle Trenches, in why standards fail when you try to impose them, in how defense in depth actually works at scale. The thread is always the same: security can’t be bought. You can’t swipe a credit card and receive “secure” in a box. It’s a quality that emerges — like the lights-out data center you don’t chase but eventually arrive at, because every other piece fell into place first. ...

Building Trust in Security: Part 2

This is the second of a three-part series based on a real-world engagement: a company that scaled from $40M to $1B in annual revenue in just five years, and the security program that had to grow with it. This is a story about building high-performance operating systems where security, standards, architecture, and performance act as enablers rather than constraints. Part 1: Earning credibility before you’ve earned authority. Part 2: Blurring the lines - Security at the SRE and Operations level. Part 3: Wrapping the gift — Transparency and agency. From Trust to Reliance ...

Building Trust in Security: Part 1

This is the first of a three-part series based on a real-world engagement: a company that scaled from $40M to $1B in annual revenue in just five years, and the security program that had to grow with it. This is a story about building high-performance operating systems where security, standards, architecture, and performance act as enablers rather than constraints. Part 1: Earning credibility before you’ve earned authority. Part 2: Blurring the lines - Security at the SRE and Operations level. Part 3: Wrapping the gift — Transparency and agency. The Inflection Point A few years back, AMTI was at the heart of a fascinating corporate challenge. I was serving as a fractional CISO and advisor for a company standing at a critical inflection point. ...

Why Foreign AI Specialists Keep Failing

Context got commoditized. Translation is next. When my company’s acquisition closed in 2024, I thought about pursuing a psychology degree in the US. The impulse was the same one that drives URE: wanting to understand how things are wired under the hood. My wife shut it down—“Really? You know that’s not going to work”—and she was right, though neither of us fully understood why at the time. What I was actually chasing wasn’t psychology. It was context. ...

Cold Aisle Trenches: When Theory Hits the Asphalt

A bricked storage array, a 2+4 SLA that technically performed, and a technician asking about lunch while executives circled. We learned that risk transfer is an illusion when your blood is on the floor. January 2026 · Stefano Schotten The contract was honored. The business still bled. My case manager called me from the customer site. I could hear the tension before he said a word. “The VPs are pacing. Four of them, maybe five. They’re all just… standing around IT, watching.” ...

Cold Aisle Trenches: You Don't Chase Lights-Out

It was 2017. We had just deployed an additional ScaleIO cluster to handle the onboarding of a new customer with hundreds of VMs. Eight nodes, each with 40 Gbps at the backend. Beautiful. Efficient. The whole rack was a work of art: Dell R740s with MD1220 expansions, bezels removed so you could see all those drives blinking in perfect synchronization. The cluster had been deployed less than two weeks earlier. I told the customer to “burn it.” ...