A single faulty channel file took down roughly 8.5 million Windows machines, grounded flights and froze trading desks. Under DORA, the question is no longer "whose bug?" but "where was your simulation?"
On 19 July 2024, a faulty content update in a widely deployed endpoint security agent rendered roughly 8.5 million Windows systems unbootable within hours. Airlines grounded fleets, hospitals reverted to paper, and financial institutions discovered that a single third-party file could do what no attacker had managed. There was no adversary, no malware, no breach. That is precisely what makes it the cleanest resilience case study of the decade.
The incident
One vendor, one channel file, one global push. The update reached every online machine running the agent at effectively the same moment, and the failure mode was total: blue screen, boot loop, and hands-on-keyboard recovery for each device. Scale turned a software defect into infrastructure weather.
The regulatory reading
The EU Digital Operational Resilience Act draws no comfort from the absence of an attacker. Its core demand of financial entities is that ICT disruption, malicious or not, be anticipated, withstood and recovered from. Three of its pillars are squarely engaged: ICT third-party risk (the failing component sat deep in nearly every institution's supply chain, often below contract-level visibility), digital operational resilience testing (scenario testing must cover severe but plausible disruptions; a simultaneous endpoint-agent failure was plausible, severe and almost nowhere simulated), and incident reporting (institutions had hours, not days, to understand and classify their own exposure).
What computation would have changed
The dependency was knowable: a complete asset inventory shows the same agent on every endpoint, a single point of correlated failure. The blast radius was computable: a Monte Carlo simulation over that inventory, the kind DORA-MAST runs for financial entities and cVaR runs for any industry, prices the scenario "trusted agent fails everywhere at once" in financial-loss terms, turning a vague worry into a board-grade number. And recovery is faster with evidence at hand: institutions that knew exactly which machines ran which agent version recovered in hours; those reconstructing their estate from spreadsheets took days. Automated evidence collection keeps that answer current.
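To make the two computations above concrete, here is a minimal Python sketch, not a description of how DORA-MAST or cVaR work internally. It measures how concentrated a toy inventory is on one agent, then runs a Monte Carlo simulation of the "trusted agent fails everywhere at once" scenario. Everything in it is a hypothetical assumption chosen for illustration: the Endpoint record, the agent names, the 50-per-hour loss figure, the 90% chance that a machine is hit, and the triangular recovery-time range of 2 to 72 hours.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    agent: str          # endpoint agent installed on the machine
    hourly_loss: float  # assumed business loss per hour of downtime


def agent_concentration(inventory):
    """Return the share of endpoints running each agent."""
    counts = Counter(e.agent for e in inventory)
    return {agent: n / len(inventory) for agent, n in counts.items()}


def simulate_agent_outage(inventory, agent, trials=2000, seed=0,
                          p_affected=0.9, recovery_hours=(2.0, 8.0, 72.0)):
    """Monte Carlo losses for one agent failing on all its hosts at once."""
    rng = random.Random(seed)
    low, mode, high = recovery_hours
    exposed = [e for e in inventory if e.agent == agent]
    losses = []
    for _ in range(trials):
        # Simplification: one recovery time per trial, shared by every
        # affected machine, so the failure is correlated across the estate.
        hours = rng.triangular(low, high, mode)
        loss = sum(e.hourly_loss * hours
                   for e in exposed if rng.random() < p_affected)
        losses.append(loss)
    return sorted(losses)


def summarize(losses, tail=0.95):
    """Mean loss, value at risk and expected shortfall of sorted losses."""
    cut = int(tail * len(losses))
    tail_losses = losses[cut:]
    return {
        "mean": sum(losses) / len(losses),
        "var": losses[cut],
        "expected_shortfall": sum(tail_losses) / len(tail_losses),
    }


# Toy inventory: 190 workstations on one agent, 10 servers on another.
inventory = (
    [Endpoint(f"ws-{i:03d}", "agent-A", 50.0) for i in range(190)]
    + [Endpoint(f"srv-{i:03d}", "agent-B", 50.0) for i in range(10)]
)
shares = agent_concentration(inventory)
losses = simulate_agent_outage(inventory, "agent-A")
summary = summarize(losses)
print(shares)
print({name: round(value) for name, value in summary.items()})
```

A share close to 1.0 for a single agent is the correlated-failure signal, and the tail figures (value at risk and expected shortfall) are what a board would read as the cost of the scenario. A realistic model would draw loss rates and recovery times from the institution's own data rather than from fixed guesses.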
Predict, simulate, remediate in advance: none of it required clairvoyance. It required an inventory, a model and the will to compute the unhappy path before living it.