Improving Business Resiliency Through Security Assurance

Improving Business Resiliency Through Security Assurance Every company says security is a priority. Every company also ships under pressure. The gap between those two statements is where businesses bleed. I’ve watched organizations with excellent engineers and serious budgets still get humbled by the same pattern: teams optimize locally (features, velocity, “my backlog”), while the system pays globally (incidents, outages, churn, reputational drag). When things go south, it rarely takes a cinematic attacker or a once-in-a-decade failure. ...

January 13, 2026 · Stefano Schotten

Telemetry That Lies: Why GPU Thermal Monitoring Is Harder Than It Looks

The “Everything Is Green” Problem Here’s a realistic scenario I’ve seen in different forms across fleets (this is a composite, not a single true story with exact numbers): A training run is supposed to take ~3–4 weeks. Two weeks in, someone notices the timeline slipping. Not a crash. Not a failure. Just… slow. The job is running 10–30% behind plan, and nobody can point to a smoking gun. The dashboards look perfect: ...

December 27, 2025 · Stefano Schotten

Predictive Power Conditioning for GPU Clusters

GPU clusters don’t fail from sustained load. They fail on transitions. A pod idling at 20 kW can step toward 300 kW quickly when training begins. The peak matters, but the killer is the step: the dP/dt that forces every layer of the electrical path to react at once. Thermals matter too—but they’re secondary and collateral. Power transients can push protection and control behavior in cycles. Thermal consequences show up later as throttling, efficiency loss, and “mysteriously slower training” that looks like a software problem until you instrument the facility. ...

December 18, 2025 · Stefano Schotten