<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Platform Automation &amp; Fleet Operations on URE</title><link>https://ure.us/pillars/platform-automation--fleet-operations/</link><description>Recent content in Platform Automation &amp; Fleet Operations on URE</description><generator>Hugo -- 0.161.1</generator><language>en-us</language><lastBuildDate>Tue, 07 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ure.us/pillars/platform-automation--fleet-operations/index.xml" rel="self" type="application/rss+xml"/><item><title>GPU Fleet AIOps: 7 LLM Backends, 6 Failure Scenarios</title><link>https://ure.us/articles/gpu-fleet-aiops-llm-backend-benchmark/</link><pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate><guid>https://ure.us/articles/gpu-fleet-aiops-llm-backend-benchmark/</guid><description>Benchmarking seven LLM backends as autonomous operators for an 8,000-GPU cluster with six realistic failure scenarios and deterministic checklist scoring.</description></item><item><title>GPU Fleet AIOps: The Augmented Operator</title><link>https://ure.us/articles/gpu-fleet-aiops-the-augmented-operator/</link><pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate><guid>https://ure.us/articles/gpu-fleet-aiops-the-augmented-operator/</guid><description>Seven LLM backends competed to run an 8,000-GPU cluster. The free local model matched frontier accuracy at one-fifth the latency. The $32 model scored worst.</description></item><item><title>Context Drift Kills AI Agents Before Latency Does</title><link>https://ure.us/articles/context-drift-kills-agents-before-latency/</link><pubDate>Wed, 11 Mar 2026 00:00:00 +0000</pubDate><guid>https://ure.us/articles/context-drift-kills-agents-before-latency/</guid><description>LLM agents on remote hosts drown in unfiltered SSH output. Context drift -- not latency, not cost -- is what kills autonomous fleet operations at scale.</description></item><item><title>Cold Aisle Trenches: You Don't Chase Lights-Out</title><link>https://ure.us/articles/cold-aisle-trenches-you-dont-chase-lights-out-you-earn-it/</link><pubDate>Thu, 29 Jan 2026 00:00:00 +0000</pubDate><guid>https://ure.us/articles/cold-aisle-trenches-you-dont-chase-lights-out-you-earn-it/</guid><description>A real outage story showing why lights-out operations require guardrails: ticketed access, intent-based authorization, OOB management, and safe rebuild limits.</description></item><item><title>From Security to Resilience: Defense in Depth</title><link>https://ure.us/articles/from-security-to-resilience-defense-in-depth/</link><pubDate>Thu, 22 Jan 2026 00:00:00 +0000</pubDate><guid>https://ure.us/articles/from-security-to-resilience-defense-in-depth/</guid><description>Multi-tenant cloud security is resilience: detect, contain, and recover faster than adversaries can escalate, without violating tenant privacy.</description></item><item><title>Why GPU Fleet Control Starts with a Map</title><link>https://ure.us/articles/why-gpu-fleet-control-starts-with-a-map/</link><pubDate>Wed, 07 Jan 2026 00:00:00 +0000</pubDate><guid>https://ure.us/articles/why-gpu-fleet-control-starts-with-a-map/</guid><description>GPU operations starts with footprint truth: a living map of where compute really is, across sites, standards, and drift.</description></item><item><title>Telemetry That Lies: GPU Thermal Monitoring</title><link>https://ure.us/articles/telemetry-that-lies-gpu-thermal-monitoring/</link><pubDate>Sat, 27 Dec 2025 00:00:00 +0000</pubDate><guid>https://ure.us/articles/telemetry-that-lies-gpu-thermal-monitoring/</guid><description>Your GPUs report 100% utilization while running slower. Temperatures look fine while racks drift hot. Thermal telemetry is easy to collect and hard to trust.</description></item></channel></rss>