“How many miles did you say?”

The autonomous driving community has spent many months now reviewing and discussing the so-called “Rand Report”, which summarizes the mathematical underpinnings of mileage-based validation for autonomous vehicle safety (see reference [1]).

Rand’s report seeks to answer the basic question, “How many miles would autonomous vehicles have to be driven without failure to demonstrate that their failure rate is below some benchmark?” That’s the question on the minds of autonomous engineers everywhere.  Rand’s authors did a nice job of parsing out the numbers.  As you would expect, the science is sound and the mathematics are rigorous.

But even within this important analysis, much depends on your point of view.  For example, the possible outcomes in Rand’s analysis range from as low as roughly 100,000 miles all the way up to around 500 million miles, depending on the assumptions.  Where within that range should we set the target for autonomous vehicle validation?  That depends on the values we choose for the confidence level, the correct human accident-rate benchmark, and other factors.  These are judgment calls. While the Rand Report gives us a tantalizing insight into how to make this calculation, it doesn’t spend much time making or justifying these judgment calls.  Still, there is a basis for making such judgments.  It’s worth developing a few of these assumptions, for parameters such as:

    • Confidence Level – While the Rand Report provides examples using an assumed 95% confidence level, other engineering sources in this space use lower values. For example, the ISO 26262 standard’s “Proven in Use” clause sets a target of 70% confidence to perform a roughly similar calculation (see reference [2]).  As another example, most suppliers of automotive components report failure rates using confidence levels of 60% or 90%. It’s likely these lower levels would be used to develop final targets, rather than the 95% or higher confidence levels cited in the Rand Report.
    • Human-Behavior Target: Accident vs. Fatality – The Rand Report performs example calculations assuming we will prove fatalities are lower in autonomous vehicles than in human-driven vehicles. But this comparison is impossible in practice.  In reality, human-driver accident data will be compared against intervention data from autonomous vehicles (that is, the rate at which a supervisory safety driver intervenes to prevent an impending accident during field test and validation).  There is no fatality in these safety-driver takeovers.  Indeed, it’s next to impossible to estimate whether a fatality would have occurred had the takeover not been performed.  Therefore, it’s not reasonable to compare rates of actual human-driver fatalities against rates of takeover events in a monitored autonomous vehicle.  A more reasonable (yet still imperfect) comparison might be autonomous-vehicle interventions against human-driver reported accidents.
    • Human-Behavior Target: All Accidents vs. Some Accidents – When setting a target based on human-driver accident rates, it’s not clear that “as good as average humans in all situations” is good enough. After all, some humans have terrible behaviors on the road (for example, driving while intoxicated), which pulls the human “average” accident rate up.  Also, some vehicles are targeted only for city use, and city driving has different accident statistics than, for example, highway trucking.  Choosing the “right” target based on human accidents is a difficult analysis that is worthy of future research.  In the end, I’m guessing a reasonable target would be something along the lines of “Better than a decent human driver, operating within the same operational design domain (ODD).”
    • Regression and Change Control – The Rand analysis, like most analyses of this type, assumes a stable system over time. But in fact, the developers of autonomous systems are constantly tinkering, fixing, and updating their autonomous software, even as they accumulate miles to validate it.  Unfortunately, there is no iron-clad way to prove that such code has not “regressed” in some hazardous way, which might lead to more accidents in some situations.  The Rand analysis implicitly assumes either no change in software, or perfect regression tests to verify such changes.  In practice, however, this will not be the case, and the calculation may be impacted accordingly.
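The sensitivity of the mileage target to the confidence-level judgment call can be sketched numerically. The snippet below uses the standard zero-failure bound (if a fleet drives m miles failure-free, we can claim the true rate is below λ with confidence C once e^(−λm) ≤ 1 − C, i.e., m ≥ −ln(1 − C)/λ); this is the textbook form of the calculation, not necessarily Rand’s exact derivation, and the benchmark rate of one fatality per 100 million miles is an illustrative assumption, not a figure from the report.

```python
import math

def miles_required(confidence: float, benchmark_rate: float) -> float:
    """Failure-free miles needed to claim, at the given confidence level,
    that the true failure rate is below benchmark_rate (failures per mile).

    Derived from the zero-failure bound: exp(-rate * miles) <= 1 - confidence.
    """
    return -math.log(1.0 - confidence) / benchmark_rate

# Sensitivity to the confidence-level assumption, using an assumed
# benchmark of 1 fatality per 100 million miles (1e-8 per mile):
for c in (0.60, 0.70, 0.90, 0.95):
    print(f"{c:.0%} confidence: {miles_required(c, 1e-8) / 1e6:,.0f} million miles")
```

Dropping the confidence level from 95% to 70% cuts the required failure-free mileage by more than half, which is why this particular judgment call moves the answer so dramatically.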

[1] Kalra, N. and Paddock, S., “Driving to Safety: How Many Miles of Driving Would it Take to Demonstrate Autonomous Vehicle Reliability?” Rand Corporation Report posted 2016; available at https://www.rand.org/pubs/research_reports/RR1478.html

[2] ISO 26262, Road Vehicles – Functional Safety, 1st Edition, 2011.  See Part 8, Clause 14, “Proven in use argument.”
