Maria Pennacchi Schotten's Rubik's Cube wolf mosaic, more than a thousand cubes

Applied AI Is Human Augmentation, Not Replacement

Since 2023, I’ve been studying applied AI almost exclusively. I don’t pretend to be a data scientist or ML engineer. Honestly, I don’t think giving up more than twenty years of infrastructure, performance, and security engineering would be smart. I’d end up like a duck: swims, flies, and walks, but doesn’t outperform at any of them. It’s impossible not to get caught up in the vibe-coding thing. I’m not here to criticize anyone shipping and prototyping. A few months back, I heard one of the smartest things anyone’s said about AI, from Naval Ravikant. I’ve been listening to him for a few years now, and his takes are consistently good. I don’t remember the exact words, and I’m not going to chase videos or quotes to nail them down, but it was close to this: “There is no disruption caused by AI. The novelty we’re seeing is the abstraction and conversion of human language into computing language.” Brilliant. ...

GPU Fleet AIOps: The Augmented Operator

Two in the morning, eighteen hours into the run. Seven LLM backends processing the same stream of GPU cluster anomalies. Same thermal cascades, same NVLink errors, same KV cache evictions. I’m watching the scoring dashboard update in real time and the numbers are breaking my assumptions faster than I can take notes. The $32-per-day model is getting the diagnosis wrong more often than a free one running on my workstation. ...

292x: Why Batch Inference Breaks on API Pricing

292x. That’s not a rounding error. That’s the cost multiplier between running a batch inference job on a rented B200 GPU and sending the same workload through Claude Opus 4.6’s API. The job was straightforward: generate one or two contextual sentences for each of a million documents, extracted JSON from the corporate PDF archive I’ve been building a RAG pipeline around. Those sentences get prepended to each chunk before embedding into Qdrant’s 768-dimensional vectors with BM25 sparse indexing. It’s the contextual layer that makes retrieval actually work, the step I described in the previous article about why a million PDFs won’t organize themselves. ...

Local LLM Bench: Scaling Swarms Beyond Four

Part 2 ended with a promise: find the cliff. Run the MoE model from four concurrent agents upward until the physics says stop. We scaled to eight. The cliff never came. This is Part 3 of the Local LLM Bench series. Part 1 covers the single-request baseline. Part 2 established the MoE advantage under concurrent load. The model: Qwen3-Coder-30B-A3B — a Mixture-of-Experts architecture that activates only 3.3B of its 30B parameters per token. On consumer GPUs, that sparse activation leaves ~90% of memory bandwidth idle at batch size 1, creating headroom that concurrent agents fill. Dense models activate all 32B parameters on every token — already at the bandwidth ceiling before the second agent connects. Part 1 explains why these specific models were chosen (best in class for each architecture); Part 2 conclusively eliminated Dense under concurrent load. This benchmark tests MoE only. ...

Local LLM Bench: Best Model for Coding Swarms

In Part 1, we established the baseline: MoE delivers 168 tok/s on a single RTX 3090, 4.1x faster than Dense. Clean single-request numbers. One prompt in, one response out. That’s not how swarms work. An orchestrator like Claude Code dispatches four coding tasks simultaneously. The local model serves all four. Under concurrency, memory bandwidth saturates, per-task throughput drops, and the architecture of the model — not the GPU, the model — determines whether you get useful parallelism or just contention. ...

The Heat Nobody Counts - PUE Ends at the Meter

Meta’s Prometheus data center in New Albany, Ohio is scaling to 1.2 GW. To get there, they’re building behind-the-meter natural gas turbines — two 200 MW Socrates generation facilities, supplied by dedicated gas pipelines, isolated from the grid. In Virginia, the same story plays out with diesel generators, enough of them that it became the top legislative concern entering the 2026 session. The industry talks about PUE as if it were a verdict on environmental efficiency. It isn’t. PUE measures one envelope — the data center facility. Total facility power divided by IT equipment power. A PUE of 1.3 means 30% overhead for cooling, lighting, and support systems. That’s the metric everyone optimizes, the number that shows up in sustainability reports, the figure that earns applause at conferences. ...

Local LLM Bench: MoE vs Dense on One RTX 3090

I went looking for sustained-load benchmarks comparing MoE and Dense coding models on consumer GPUs. Not demo bursts on a Mac Mini — sustained autoregressive generation on real coding tasks, where architecture and interconnect are the only variables. I found plenty of one-shot numbers. Nobody had published the comparison that matters: same hardware, same quantization, same inference engine, MoE versus Dense, across GPU configurations. Methodology visible. Numbers verifiable. So I ran the tests. Dual RTX 3090s with NVLink, custom liquid cooling, a 6 kW isolation transformer feeding a double-conversion UPS. Not elegant, but thermally and electrically honest — sustained inference loads without throttling, no measurement fiction. The hardware details are below. ...

The Concorde Problem in AI Infrastructure

The Concorde burned one ton of fuel per passenger to cross the Atlantic. One hundred seats. Three and a half hours. Mach 2. The most advanced commercial aircraft ever built — and every engineer who saw it wanted to believe it was the future. The 747 did the same crossing in seven hours. Four hundred seats. A quarter of the fuel per passenger. No afterburners. No sonic boom. No government subsidies keeping it alive. ...

Building Trust in Security: Part 3

This is the third and final part of a series based on a real-world engagement: a company that scaled from $40M to $1B in annual revenue in just five years, and the security program that had to grow with it. This is a story about building high-performance operating systems where security, standards, architecture, and performance act as enablers rather than constraints. Part 1: Earning credibility before you’ve earned authority. Part 2: Blurring the lines — Security at the SRE and Operations level. Part 3: Wrapping the gift — Transparency and agency. The Quality That Can’t Be Purchased I’ve been writing around this idea for a while — in Cold Aisle Trenches, in why standards fail when you try to impose them, in how defense in depth actually works at scale. The thread is always the same: security can’t be bought. You can’t swipe a credit card and receive “secure” in a box. It’s a quality that emerges — like the lights-out data center you don’t chase but eventually arrive at, because every other piece fell into place first. ...

Building Trust in Security: Part 2

This is the second of a three-part series based on a real-world engagement: a company that scaled from $40M to $1B in annual revenue in just five years, and the security program that had to grow with it. This is a story about building high-performance operating systems where security, standards, architecture, and performance act as enablers rather than constraints. Part 1: Earning credibility before you’ve earned authority. Part 2: Blurring the lines - Security at the SRE and Operations level. Part 3: Wrapping the gift — Transparency and agency. From Trust to Reliance ...

Building Trust in Security: Part 1

This is the first of a three-part series based on a real-world engagement: a company that scaled from $40M to $1B in annual revenue in just five years, and the security program that had to grow with it. This is a story about building high-performance operating systems where security, standards, architecture, and performance act as enablers rather than constraints. Part 1: Earning credibility before you’ve earned authority. Part 2: Blurring the lines - Security at the SRE and Operations level. Part 3: Wrapping the gift — Transparency and agency. The Inflection Point A few years back, AMTI was at the heart of a fascinating corporate challenge. I was serving as a fractional CISO and advisor for a company standing at a critical inflection point. ...

The Entropy of Sovereign AI: Map vs. Territory

A few years ago, I was having dinner with the Americas VP of a European energy supermajor — one of those companies that extracts oil from war zones, negotiates with regimes that don’t appear on polite lists, and operates in places where “political risk” means your assets might get nationalized or your personnel kidnapped. Seventy-plus countries. Active operations in Libya, Nigeria, Angola, Myanmar, Yemen. The kinds of places where security briefings come before breakfast. ...