First defined in Outpaced by AI by Waydell D. Carvalho.
Outcome: The system fails faster than the people running it can react.
A complex automated system that works inside its normal bounds is not the same as one that recovers when those bounds are exceeded. The first is engineered for normal. The second is engineered for failure. Cascading Failures is what happens when accumulated state, the old code and legacy settings no one is watching, collides with a new change and propagates faster than anyone can step in.
Knight Capital is the defining case. On August 1, 2012, the firm pushed a routine software update to its trading servers, but one of eight servers did not get the new code, and an old, dormant function from years earlier woke up on the live market. In about 45 minutes, the system executed millions of unintended trades and lost about $460 million. A firm that had been one of the largest market makers in US equities was insolvent before lunch.
The accumulated state was the trap. A feature written years earlier had been retired but never removed, and a reused configuration flag brought it back to life when the new code went out unevenly. The validation process checked that the deployment was marked complete. It did not check that the code actually running on every server was the code intended. The gap between "recorded as done" and "actually true" was where the cascade started.
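The difference between the two checks can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of Knight's tooling: the `Server` record, the server names, the code strings, and both function names are hypothetical, and a SHA-256 fingerprint stands in for whatever mechanism a real deployment would use to identify the code that is running.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    running_code: bytes     # hypothetical: the code actually present on the server
    marked_complete: bool   # what the deployment record claims


def fingerprint(code: bytes) -> str:
    """Identify a code artifact by its SHA-256 digest."""
    return hashlib.sha256(code).hexdigest()


def recorded_as_done(servers: list[Server]) -> bool:
    """The weak check: is every server marked complete in the record?"""
    return all(s.marked_complete for s in servers)


def mismatched_servers(servers: list[Server], intended_code: bytes) -> list[str]:
    """The strong check: which servers are not running the intended code?"""
    expected = fingerprint(intended_code)
    return [s.name for s in servers if fingerprint(s.running_code) != expected]


# Illustrative data: seven servers received the update, one did not,
# yet all eight are marked complete in the deployment record.
new_code = b"order router, new version"
old_code = b"order router, old version with dormant function"
servers = [Server(f"server-{i}", new_code, True) for i in range(1, 8)]
servers.append(Server("server-8", old_code, True))

print(recorded_as_done(servers))              # True
print(mismatched_servers(servers, new_code))  # ['server-8']
```

The first check passes while the second names the one server that was missed. A validation step that stops at the first check confirms the paperwork, not the system.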
Once it started, speed did the rest. The failure ran at machine pace across the system's footprint while the humans were still diagnosing it, and even the rollback made things worse: it removed the new code from the seven servers that had it, so the reused flag began triggering the dormant function on those servers as well as on the one that had never been updated. Recovery procedures built for normal operation did not fit the failure that was actually happening. The system outran the people who owned it.
AI systems carry exactly this kind of accumulated state, in the form of old model versions, stale configurations, deprecated features, and training assumptions no one revisits. A new change can interact with any of it in ways the change-validation never anticipated, and the interaction can propagate at the speed the system runs at, which is faster than a human response. The danger is not the new change alone. It is the new change meeting the forgotten thing, and nobody being fast enough to catch it.
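One way to picture "the new change meeting the forgotten thing" is a check that compares the flags a change sets against an inventory of retired components that still read them. The sketch below is illustrative only. The inventory, the flag names, and the component names are all hypothetical, and it assumes such an inventory exists, which in practice is the hard part.

```python
# Hypothetical inventory: flag name -> retired or dormant components
# that still read that flag. All names here are made up for illustration.
LEGACY_FLAG_READERS = {
    "mode_b": ["retired_order_feature"],
    "use_v1_tokenizer": ["model_v1_fallback"],
}


def flag_collisions(change_flags, legacy_readers):
    """Map each flag the change sets to the legacy components that still read it."""
    return {
        flag: legacy_readers[flag]
        for flag in sorted(change_flags)
        if flag in legacy_readers
    }


# A new change that reuses one old flag and introduces one new flag.
change_flags = {"mode_b", "enable_new_router"}
collisions = flag_collisions(change_flags, LEGACY_FLAG_READERS)
print(collisions)  # {'mode_b': ['retired_order_feature']}
```

A non-empty result means the change touches something that forgotten code is still listening to, which is a reason to stop and look before the change reaches a system that runs faster than its owners can respond.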