When Probabilistic Safety Fails: Lessons from the January Waymo Crash

In January 2026, a Waymo vehicle operating in Santa Monica was involved in a collision with a child, and the incident is now under investigation by the NTSB. While the full findings are still pending, the event highlights a deeper issue that only a few have been raising for years: the limits of probabilistic safety claims.

For a long time, companies like Waymo have emphasized metrics such as “miles driven per incident” to demonstrate safety. On the surface, these numbers appear compelling. But they rely on a critical assumption: the system, its environment, and its operational constraints remain relatively stable over time. Once that assumption breaks, so does the validity of the claim.
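The stability assumption can be made concrete with a small sketch. The numbers below are hypothetical, chosen only for illustration, and are not Waymo data; the `Period` type and `miles_per_incident` helper are likewise illustrative names. The sketch compares a pooled “miles driven per incident” figure against the rate observed after a structural change.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Miles driven and incidents recorded in one operating period."""
    miles: float
    incidents: int


def miles_per_incident(periods):
    """Pooled miles per incident; only meaningful if the rate is stable."""
    total_miles = sum(p.miles for p in periods)
    total_incidents = sum(p.incidents for p in periods)
    if total_incidents == 0:
        return float("inf")
    return total_miles / total_incidents


# Hypothetical numbers for illustration only; not real fleet data.
history = [Period(miles=50_000_000, incidents=10)]  # long, stable record
recent = [Period(miles=5_000_000, incidents=10)]    # after a structural change

before = miles_per_incident(history)
after = miles_per_incident(recent)
pooled = miles_per_incident(history + recent)

print(f"before change: {before:,.0f} miles per incident")
print(f"after change:  {after:,.0f} miles per incident")
print(f"pooled record: {pooled:,.0f} miles per incident")
```

With these made-up inputs, the pooled figure works out to 2.75 million miles per incident even though the post-change rate is one incident per 500,000 miles, ten times worse than before. The accumulated history dilutes the signal that the system has changed.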

This is exactly what we are beginning to see.

Over the past four months, observable performance trends have raised serious concerns about Waymo’s operations:

Waymo drives through a police standoff and active crime scene in Los Angeles (November 28, 2025)
Waymo drives across 6th Street and blocks traffic, driving like an unlicensed driving student (December 2, 2025; shown on this page)
Waymo reports a software recall for illegally driving past school buses (December 8, 2025)
Waymo strikes a 9-year-old pedestrian near a school zone (January 23, 2026)
Waymo blocks an ambulance after a mass shooting (March 1, 2026)
Waymo again passes a stopped school bus with its sign flashing (March 3, 2026)
Waymo stops in a train crossing as a train narrowly misses it (March 10, 2026)

Multiple Waymo AVs struggle to merge back onto 6th Street. Local service industry workers reported no such issues in prior months.

The March 1 blocked-ambulance incident has prompted five Austin City Council members to invite Waymo to the “April 29 city committee meeting to explain the failure and discuss how to improve coordination with public safety agencies.”

These incidents are not just outliers; they are signals that the underlying system has changed in ways that probabilistic models cannot adequately capture. In functional safety, we call this underlying foundation “systematic safety.” Functional safety relies on both “systematic” and “probabilistic” measures working together.

When companies introduce structural changes through software updates, operational scaling, or cost-cutting measures, the historical safety record stops being a relevant measure, because the conditions that produced it have changed. Worse, the record can do active harm by enabling otherwise dangerous and unsafe decisions: “We haven’t had any issues so far.” “We’re safer than a human driver.” “The risk is shown to be extremely small.”

This isn’t a new concern. Four years ago, I warned in an interview that cost pressures would eventually force tradeoffs at companies, like Waymo and many others, that have invested heavily in a probabilistic argument for safety that depends on millions of accumulated miles.

As companies push toward profitability in later stages, there is an inherent risk that a metaphorical “load-bearing wall” is quietly and inadvertently removed, allowing riskier behaviors and exposures to creep in and undermine the historical safety performance.

This may not be readily apparent through any single decision, but a series of optimizations can, collectively, erode the built-in resilience.
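As a rough illustration of how small changes add up, the sketch below compounds a series of per-change risk increases. It assumes, purely for the sake of the example, that each optimization raises risk by an independent multiplicative factor; the 5% figure and the `cumulative_risk_multiplier` helper are hypothetical and are not drawn from any real program.

```python
import math


def cumulative_risk_multiplier(relative_increases):
    """Compound per-change relative risk increases.

    Assumes each change scales risk independently and multiplicatively,
    which is a simplification made only for this illustration.
    """
    return math.prod(1.0 + r for r in relative_increases)


# Hypothetical: ten optimizations, each judged to add "only" 5% risk.
changes = [0.05] * 10
multiplier = cumulative_risk_multiplier(changes)

print(f"largest single change: +{max(changes):.0%}")
print(f"combined effect:       {multiplier:.2f}x baseline risk")
```

Under these assumptions, no single change exceeds 5%, yet together they raise risk to roughly 1.63 times the baseline. A review that looks at each decision in isolation would not see that.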

Once that erosion is discovered, building safety back in is nearly impossible.

To be fair, Waymo has more recently pushed for systematic claims in its Safety Case (Favarò et al. 2026), which “details a systematic risk assessment process” to identify behavioral hazards among several classes of hazards. These behaviors are further divided into “Conflict Avoidance” and “Collision Avoidance” competencies under “Drivership.” Waymo’s legal team has also been advancing the development of “behavioral competencies” outlined in the global standards group UN GRVA (GRVA 2024). And in December 2024, Waymo chose to be independently assessed by TÜV SÜD regarding its First Responder Interaction compliance, an activity that will surely be beneficial in its presentation to the City of Austin next month. Not many companies have that safety defense built in yet.

Undoubtedly, Waymo will be one of the leaders in defining safe or reasonable driving as a systematic measure, with more updates to come later this year through groups like SAE’s AVSC. We hope these efforts prove fruitful in time to correct the rash of incidents over the past few months.

What we are seeing now suggests a disconnect between stated safety goals and real-world outcomes. Waymo has publicly committed to transparency and road safety, but commitment alone is not enough. Safety must be continuously demonstrated, especially as systems evolve. This requires assessment before, during, and after development changes. And it starts with defining what “safe” means: “reasonable” driving behavior.

Please follow our blog and our LinkedIn page, as our next installment will focus on how “reasonable” the Waymo AV was in the incident where the 9-year-old pedestrian was struck. Waymo claims a human driver would also have struck the child, and at a much higher speed. The key point in presenting this next study is that almost all the probabilistic claims go out the window. What matters is the behavior.

If anything, this moment reinforces a key principle: safety is not a static metric. It is a dynamic property of a system, shaped by design decisions, operational pressures, and organizational priorities.

When those priorities shift—particularly toward cost efficiency—probabilistic safety claims alone cannot carry the weight. The solution is a systematic definition of safe behaviors that can be demonstrated and defended.

Michael Woon, March 19, 2026