The Lone Wolf Starves First

A few months ago I read Project Hail Mary and found myself thinking about observation and agency. Einstein didn’t “invent” time dilation; he created the conditions to perceive it. Without the means to observe, you’re just touching walls in complete darkness. Trial and error, yes, but you never truly know the depth of what you’re sensing. Saturday mornings I take my son to flag football. He’s been in martial arts for half his life, and his coach loves his resilience. But something surfaced in team sports that doesn’t appear on the mat. ...

It Took a Pandemic to Learn Why Standards Failed

In 2015, I did what seemed like the mature thing to do. I created a Production Engineering department. My college foundation was production engineering. I was a true believer: if we formalized standards and assigned a dedicated group to own operational rigor, the organization would naturally converge toward consistency. The mandate: Create SOPs. Define standards. Reduce variance. Improve reliability. On paper, it was textbook. In practice, it was a slow-motion collision with reality. ...

When Lack of Guardrails Hurt the Business

Every company says security is a core value. Few embed it as a design constraint. The difference shows up when things break. I get a call from a co-founder I’ve known for years. His company just raised a $400M+ Series D. His voice is flat: “We have a problem.” Same day, we’re on a call. He’s a skilled engineer, and he’s personally devastated. They leaked over 2 million user records. Home addresses. Phone numbers. The full profile. The data had been publicly accessible for three weeks before anyone noticed. ...

When the Constraint Isn’t Capacity

A few years ago, as Field CTO for an enterprise customer, I was pulled into a rescue effort that started the way these stories usually start: pain, urgency, and a narrative that felt convenient. The application hit a bootstorm—150,000+ users slamming it in a short window—and then the predictable second-order effect: every day after that, more tickets piled up. Instability. Session timeouts. Intermittent failures. The kind of symptoms that turn a service into a rumor. ...

Security Assurance - URE Case - 1/5 - The Inception

This is the first of five short posts on Security Assurance Engineering. The goal is simple: separate security intent from security proof, and show what “assurance” looks like when you treat a system as real: owned, changing, and measurable. I’ll use URE as the working surface. URE is the platform where I publish research notes and operating practice generated in my lab, work that started as a few shared threads with friends and peers and eventually became worth “productizing” into something durable and navigable. ...

Security Assurance - URE Case - 2/5 - Trust Boundaries

In mature environments, we don’t start with implementation. We start with boundaries and ownership. Before anyone spins up “a simple website/blog,” we make three things explicit: (1) What is the system? (scope and components); (2) Who can change it? (identities and permissions); (3) What must always remain true? (invariants + guardrails). Security should be intentional. The goal is to create guardrails the rest of the team can rely on, so delivery is fast and the system stays trustworthy under change. ...

Security Assurance - URE Case - 3/5 - The Design

Design is where “a simple website” becomes a real system. Not because the pages are complex, but because the moment you publish, you inherit real dependencies: DNS, build pipelines, third parties, telemetry, and the drift that comes with change. So before we build anything, we do one unglamorous thing: ...

Security Assurance - URE Case - 4/5 - Enabler

Security enables the business when it shows up with agency: not just identifying risk, but carrying enough context to propose solutions that preserve the mission. That requires a maturity shift. When security arrives late, it often speaks in “non-English.” It blocks because the system is already committed to choices no one can defend. ...

Security Assurance - URE Case - 5/5 - Conclusion

Security Assurance Engineering is not a side quest. It’s not a compliance ritual. And it’s not a “security team thing.” It’s what turns security from intent into proof, in systems that are owned, changing, and measurable. Across these chapters, the arc is consistent:
Part 1/5 (Inception): Architecture sets the invariants. Assurance proves they still hold under change.
Part 2/5 (Trust Boundaries): If the boundary isn’t explicit, you don’t have a system; you have assumptions.
Part 3/5 (Design): The tedious questions aren’t bureaucracy; they are how you prevent accidental scope and irreversible drift.
Part 4/5 (Security as Enabler): Done well, security doesn’t slow delivery; it restores optionality and keeps the mission intact under real pressure.
The takeaway is simple: ...

Business Resiliency Through Security Assurance

Every company says security is a priority. Every company also ships under pressure. The gap between those two statements is where businesses bleed. I’ve watched organizations with excellent engineers and serious budgets still get humbled by the same pattern: teams optimize locally (features, velocity, “my backlog”), while the system pays globally (incidents, outages, churn, reputational drag). When things go south, it rarely takes a cinematic attacker or a once-in-a-decade failure. ...

Why GPU Fleet Control Starts with a Map

I’m currently working on the design of a framework for GPU fleet management. We’re living in a crowded data center reality where everybody wants “hero” compute — dense GPUs, fast networking, and delivery that’s closer to the edge. We’re in a land-grab phase where every business wants to be everywhere, but most teams are discovering the same thing: buying GPUs is the easy part. Operating them as a coherent fleet is the hard part. ...

Tail Latency Killed My Beowulf Cluster in 2006

Right now, I’m working on an InfiniBand topology design for a GPU cluster. The math keeps pointing to the same conclusion: scale-out only makes sense when scale-in has topped out. It’s not about CUDA cores. It’s not about tensor throughput. It’s about tail latency. NVLink keeps GPU-to-GPU communication on-package or over short copper links — no NIC, no PCIe host traversal, no protocol stack. For small messages, that means sub-microsecond latency in the hundreds-of-nanoseconds range. InfiniBand NDR switches add sub-microsecond port-to-port latency, but once you include the full path — PCIe to the NIC, driver overhead, fabric hops, and back — real-world GPU-to-GPU latency across nodes often lands in the 3-10μs range depending on message size and topology. ...
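
To put the figures above side by side, here is a rough back-of-the-envelope sketch, not a benchmark. It assumes 0.5 μs as a stand-in for "hundreds of nanoseconds" on NVLink and takes the 3-10 μs cross-node range as quoted; the function names and the idea of a chain of strictly dependent small messages are illustrative assumptions, and the model covers typical latency only, not the tail.

```python
# Back-of-the-envelope comparison of latency-bound communication cost.
# All constants are assumptions for illustration, not measurements.

NVLINK_LATENCY_US = 0.5      # assumed stand-in for "hundreds of nanoseconds"
IB_LATENCY_LOW_US = 3.0      # lower end of the quoted cross-node range
IB_LATENCY_HIGH_US = 10.0    # upper end of the quoted cross-node range


def latency_bound_time_us(dependent_messages: int, per_message_latency_us: float) -> float:
    """Time spent on latency alone for a chain of small messages,
    where each message must wait for the previous one to arrive."""
    return dependent_messages * per_message_latency_us


def slowdown_range(
    nvlink_us: float = NVLINK_LATENCY_US,
    ib_low_us: float = IB_LATENCY_LOW_US,
    ib_high_us: float = IB_LATENCY_HIGH_US,
) -> tuple[float, float]:
    """How many times slower a cross-node hop is than an in-node hop."""
    return ib_low_us / nvlink_us, ib_high_us / nvlink_us


if __name__ == "__main__":
    low, high = slowdown_range()
    print(f"Cross-node hop is roughly {low:.0f}x to {high:.0f}x slower per small message")

    # Hypothetical workload: 1,000 small messages that cannot overlap.
    n = 1000
    print(f"In-node:    {latency_bound_time_us(n, NVLINK_LATENCY_US):.0f} us")
    print(f"Cross-node: {latency_bound_time_us(n, IB_LATENCY_LOW_US):.0f} to "
          f"{latency_bound_time_us(n, IB_LATENCY_HIGH_US):.0f} us")
```

Under these assumptions, every dependent small message that leaves the node costs several times more than one that stays inside it, which is the arithmetic behind exhausting scale-in before scaling out.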