In 2015, I did what seemed like the mature thing to do. I created a Production Engineering department.

My college foundation was production engineering. I was a true believer: if we formalized standards and assigned a dedicated group to own operational rigor, the organization would naturally converge toward consistency.

The mandate: Create SOPs. Define standards. Reduce variance. Improve reliability.

On paper, it was textbook. In practice, it was a slow-motion collision with reality.


The Flaw of “Outside-In” Documentation

Most of the documentation produced during those years had the same fatal flaw: it wasn’t battle-tested.

SOPs were either written too early—before the operation stabilized—or they drifted silently into obsolescence. A firewall upgrade, a UI redesign, or a minor workflow tweak would turn a “Standard” into a lie. Suddenly the SOP was almost right—which is worse than wrong, because it creates a false sense of security.

Engineers stopped trusting the documents. To fix one, they had to open a ticket, wait for a compliance team, and explain the problem to someone who wasn’t on-call.

What was meant to be a quality mechanism became operational friction at the exact moment we needed velocity: during incident recovery.


Two Factions, One Company

We inadvertently created a “Shared Fate” crisis.

Production Engineering felt responsible for standards but lacked real-time exposure. Operations carried the pager but felt constrained by stale rules. One side saw a lack of discipline; the other saw a disconnection from reality.

We didn’t say it out loud, but there were two factions—each convinced the other wasn’t playing on the same team.

That’s when I learned the hard truth: SOPs owned by a separate group are perceived as external control, not shared infrastructure.


The Pandemic Stress Test

For sixteen years, my leadership was built on presence.

Not meetings. Not authority. Physical presence. The ability to sit next to someone, take a look, mediate between teams without scheduling anything. I didn’t realize how much that presence was masking.

When COVID-19 hit, my move wasn’t strategic. Borders closed. Flights vanished. I boarded a plane thinking I’d be back in a few weeks. Locked my room. Told everyone: “See you soon.”

That was the last time I operated the company from the same continent.

The “Safety Net” of my physical presence vanished overnight. The cracks showed instantly.


What Distance Taught Me About Scalability

The days became full of a familiar thought: this meeting could have been an email.

Synchronous presence is the enemy of scale. We didn’t have a resilient system; we had a “Hero Culture” that required me to be in the room.

We had to pivot for survival. We moved from Enforced Standards to Generated Standards.

SOPs as Execution Artifacts: We embedded documentation directly into the workflow—at one-click distance, accessible at decision time.

Decentralized Ownership: We removed the “Compliance Gate.” If a standard failed contact with reality, the engineer who felt the pain had the authority—and the toolset—to rewrite it immediately.

Governance through Guardrails: We stopped using tickets to control quality and started using systems.

The result? 99.99% uptime. Over 2,000 days without a critical failure. We scaled 60x while reducing operational costs by 40-60%.


The “Thunder” of AI Infrastructure

This is why I started the URE project. We are seeing the 2015-era “SOP Failure” repeat itself in AI Cloud billing.

Most companies give engineers an “unlimited credit card” (the Cloud API) to ship fast. The engineers care about the API; they don’t see the bill. Then, six months later, the “Thunder” hits. Leadership sees a massive dent in the net result and triggers a “Hero Mode” cleanup—freezing hires and killing projects to save $100k/month.

That isn’t heroism. It’s a systemic failure of accountability at the point of creation.

The lesson took me six years and a pandemic to internalize: You cannot separate those who create from those who bear the consequences of creation.

Not with SOPs. Not with FinOps dashboards. Accountability must live at the moment of provisioning. No Owner, No Create.


Where the Seams Cross

What emerged on the other side wasn’t just operational improvement. It was a different kind of organization.

Standards became living artifacts—trusted because they were forged in production by the hands that felt the consequences. Resilience stopped being a metric and became a byproduct of ownership. Security strengthened where the seams crossed between teams, because transparency replaced territorial gatekeeping.

Empowerment and guardrails turned out not to be opposites. Guardrails created the safety for autonomy to exist. Accountability wasn’t enforced from above—it emerged from within.

The result was decentralized leadership. Not by design. By infrastructure.

And something else happened that we didn’t expect: people stayed. Average employee tenure rose to over six years. When engineers own the standards they operate under, when they have the authority to improve what’s broken, when accountability is shared rather than assigned—they don’t just perform better.

They belong.