Simpler is Safer: Occam's Safety Razor

This is the fourth in a series of articles I’m calling ‘Opening the Starsky Kimono,’ where I reveal the Starsky insights we previously considered top secret. You can read about Starsky’s approach to safety here, business model here, and thoughts on AI here.

When it comes to autonomous safety, simpler is almost always better.

That might seem strange.  The consequences of an autonomous car breaking are huge - death, injury, damage, and the likelihood that the entire company will go under as a result.  You might assume, then, that in order to make such a system safe you need to use every tool at your disposal and then chip away at those that don’t seem necessary.  Like how a sculptor might start with a slab of marble which they chisel down into a masterpiece.

A prototype Zoox vehicle covered in omni-directional sensors

While incredibly sophisticated, this early Zoox prototype would be orders of magnitude harder to prove safe than a few sensors and drive-by-wire on a pre-existing car. SCREENSHOT FROM BLOOMBERG (YOUTUBE)

The more parts there are in a given system, the more possible combinations of slight failures that might result in a big one.  Too many sensors pointed at the vehicle in front of you and the autonomous agent might jump between which one to follow, creating strange and potentially unsafe driving behavior.  You could solve this by only following one or two of those sensors, which would negate the value of those sensors for that task while maintaining their ability to make other subsystems act strangely.
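
To make the sensor-jumping failure concrete, here’s a minimal sketch (in Python, with invented sensor characteristics - nothing here is from a real AV stack) of how naively trusting whichever redundant sensor reports the closest target each frame can make the followed vehicle flicker between tracks:

```python
# Hypothetical example: three sensors each estimate the range to the
# lead vehicle with their own bias and noise. Picking the raw minimum
# every frame means the "followed" target can change frame to frame
# even though nothing in the world has moved.
import random

random.seed(0)
TRUE_RANGE_M = 30.0

def sensor_reading(bias_m: float, noise_m: float) -> float:
    """One sensor's range estimate: truth plus fixed bias plus noise."""
    return TRUE_RANGE_M + bias_m + random.gauss(0.0, noise_m)

# (bias, noise) pairs are invented for illustration
sensors = {"radar": (0.3, 0.4), "lidar": (-0.2, 0.1), "camera": (0.0, 0.8)}

for frame in range(10):
    readings = {name: sensor_reading(bias, noise)
                for name, (bias, noise) in sensors.items()}
    chosen = min(readings, key=readings.get)  # naive: trust the closest report
    print(f"frame {frame}: following {chosen} at {readings[chosen]:.2f} m")
# Adding more sensors adds more ways for this selection to flicker,
# which a following controller turns into jerky speed commands.
```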

The more complex a system, the harder it is to understand.  The less understandable, the fewer the people who will be able to raise valid safety concerns.  That can split an engineering organization between those who understand safety and those who understand the system - meaning that many of the most critical safety concerns are never raised.  To empower everyone to make a system safe you need one that everyone can understand.

Flying shuttle for a Jacquard loom, c. 1800s

While a flying shuttle is faster than a needle and thread, it was also more likely to cause weavers to lose eyes when shuttles were deflected and shot clear of the machine. SSPL VIA GETTY IMAGES

I think of it as Occam’s Safety Razor, from the principle that the simplest solution is most likely the right one.  The simpler a system is, the easier it is to make safe; the more complicated a system, the harder it is to make safe.

To be clear, however, that isn’t to say that simple systems are always the safest, just that they’re the easiest to make safe.  A taut belt around a steering wheel and a brick on the accelerator do not make for a safe robotaxi - even if you would save a lot of money on roboticists.

That isn’t to say that a belt & brick unmanned car can’t be safe.  If you were on a closed track and pointed it at a concrete wall 1km away in the opposite direction from you and your team, you could be reasonably sure that it wouldn’t hurt anyone.

Your system could also be safe if, instead, you hacked the car’s drive-by-wire system and ordered it to go straight at 45mph towards that wall.  To make it safe, though, you’d have to do a lot more work.  Does your command to drive straight actually work, or is it offset by 15° and going to come back around at you?  Is any of the code telling the transmission to switch into reverse?  Those are all things you can check, but doing so is far harder than the safety checks for the Belt & Brick system.
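
Here’s a minimal sketch (with an entirely hypothetical vehicle interface - no real drive-by-wire API is being described) of the kind of pre-run checks that paragraph implies, and that the Belt & Brick system gets for free:

```python
# Hypothetical pre-run checks for a hacked drive-by-wire straight-line test.
# `vehicle` and its methods are invented for illustration.
def preflight_checks(vehicle) -> list[str]:
    problems = []

    # Does "drive straight" actually hold the wheel straight,
    # or is it offset and going to arc back toward the team?
    vehicle.command_steering_angle_deg(0.0)
    measured = vehicle.measured_steering_angle_deg()
    if abs(measured) > 0.5:  # assumed tolerance, in degrees
        problems.append(f"steering offset of {measured:.1f} degrees")

    # Is any code commanding the transmission into reverse?
    if vehicle.commanded_gear() != "DRIVE":
        problems.append(f"unexpected gear command: {vehicle.commanded_gear()}")

    return problems  # empty list means the checks passed
```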

All of this is counterintuitive - we’re taught to correlate complexity with sophistication: autonomy is hard, so a better and safer system must be more complex.  That just isn’t the way engineering works - the safest bridge isn’t a futuristic suspension bridge but a causeway.

And when it comes to autonomous vehicles the safest are those that are the most understandable.


This post was originally published in Forbes

Throwing Wrenches At Autonomous Vehicles

If you need to quickly figure out whether or not an AV is safe, you might want to throw a wrench at it.  If it doesn’t crash, it’s worthy of further review.  If it does, case closed.

Of course, you don’t necessarily throw a literal wrench at the AV - you just throw a proverbial wrench into its plans by adding some forced entropy to an otherwise uneventful test drive.  Demand a last minute turn (if it’s reasonable that users might do so), push a shopping cart into its lane (if it might ever encounter one), pour black paint over some lane lines.

DiDi Autonomous Driving Robotaxi In Shanghai

 VCG VIA GETTY IMAGES

Few industries receive as much speculative scrutiny as autonomous vehicles (AVs).  The infamous trolley problem is a case in point - a Google search for “trolley problem” + “autonomous” yields 4000 results (probably 40x the total number of vehicles that have ever driven unmanned on public roads).  When it comes to regulating autonomous vehicles there are intensive conversations about which sensors should be required, which technologies banned, whether or not photorealistic simulation is required, et cetera, et cetera.

Out of all of these conversations the consistent outcome tends to be something akin to a complex driving test: a closed driving course with pre-set obstacles designed to pop out at predetermined times.  Since these tests would be much more thorough than what we’d give a student driver, the thinking goes, passing them will help prove an AV is safe.

The problem is that AVs are really, really good at cramming for tests.

Almost every AV uses a fair amount of machine learning (ML), which works by looking at a set of relevant data and then determining whether or not current conditions look like that data.  These lane lines look like those it was trained on, so it recognizes them, etc.  From a testing perspective a problem emerges - what if you train a model just on different attempts at that test?
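
Here’s a toy illustration (hypothetical scenario names, not any real AV system) of what that kind of cramming looks like - a model that has effectively memorized the test track aces the test and falls over on anything it hasn’t seen:

```python
# A model that "recognizes" only what appeared in its training data.
track_training_data = {
    "straight_away",
    "gentle_left",
    "pedestrian_pops_out_at_t30s",  # the pre-set obstacle, seen thousands of times
}

def memorizing_model(scenario: str) -> str:
    return "handled" if scenario in track_training_data else "confused"

print(memorizing_model("pedestrian_pops_out_at_t30s"))    # handled -> test passed
print(memorizing_model("shopping_cart_rolls_into_lane"))  # confused -> wrench thrown
```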

You could imagine an essay writing AI, trained on 100,000 essays on Romeo & Juliet (as well as the original text and grade scores).  Such an AI would be incredibly good at writing an essay on Romeo & Juliet, but lackluster at a question on Hamlet and terrible on a Ulysses question.  For the AV - you could have a system that got top marks on the track but was utterly unable to drive anywhere else in the world.

The solution is actually something that humans do naturally and without advanced technical degrees: adding forced entropy, or throwing the odd wrench in the pre-scheduled test.  What regulators really care about is that these systems can be deployed fairly safely, and a relatively small amount of entropy can break all but the most robust AI systems.  

Asking for a last minute left turn during a ridealong, for example, is the easiest way to get a sense of whether the AV can move in the free world or is actually following an invisible track.  Pushing a shopping cart into its path (when you haven’t previously told the AV developer) is a great way to tell whether the AV system can deal with weird obstacles.

What’s useful about this approach is that it tests the AV system being deployed.  Depending on use case and proposed stage of deployment, that system might include a teleoperator and/or a safety driver; and a successful test is one where the AV doesn’t cause an avoidable accident.

High-res simulation, third-party review of design docs, reading VSSAs (Voluntary Safety Self-Assessments) - all of those steps are still useful in evaluating the safety of an AV, but they’re only worth the long process of evaluation after an AV passes the wrench test.

And when it comes to unmanned deployment, my guess is that most AV companies worth over $50m wouldn’t be able to dodge a wrench.

This was originally posted in Forbes.

Making Teleop Safe

An autonomous truck navigating a winding, hilly road

A Starsky autonomous truck driving in the hills of the San Francisco Peninsula STARSKY ROBOTICS

(This is the fifth in a series of articles I’m calling ‘Opening the Starsky Kimono,’ where I reveal the Starsky insights we previously considered top secret. You can read about Starsky’s approach to safety here and here, business model here, and thoughts on AI here.)

A truck glides down the interstate. Drivers speeding by look up and are shocked – there’s no driver in that autonomous truck! Up ahead a deer jumps into the truck’s lane, and hundreds of miles away a teleoperator is asked to take control of the vehicle. But they aren’t able to in time – either the deer jumped too quickly, or the teleoperator couldn’t get situationally aware fast enough, or worse yet: the cellular connectivity wasn’t good enough!

Such was the situation painted to me time after time after time as CEO of Starsky Robotics, whose remote-assisted autonomous trucks were supposed to face exactly such a scenario. And yet, it was an entirely false scenario.

As I’ve written about before, safety doesn’t mean that everything always works perfectly - in fact, it’s quite the opposite.  To make a system safe is to intimately understand where, when, and how it will break, and to make sure that those failures are acceptable.

Just as you can constantly test whether or not your radar is working, you can do the same with a cellular connection.  You can whitelist the routes that have sufficient connectivity, so that if you’re ever in a dead zone you know something must be going wrong and that pulling over is appropriate.  Finally, you can design the entire teleop product around the limitations of human situational awareness - at Starsky we figured that it would take 10 seconds to get situationally aware of a truck, and that any emergency within that window needed to be solvable by the truck itself.
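
A minimal sketch of that logic, assuming hypothetical helpers (`measure_latency_ms` and `pull_over` aren’t Starsky’s actual interfaces, and the thresholds are invented): connectivity is tested continuously like any other sensor, and on a whitelisted route a sustained dead zone is treated as a failure that triggers a pull-over.

```python
import time

MAX_LATENCY_MS = 250     # assumed spec; a real limit comes from safety analysis
MAX_BAD_WINDOW_S = 2.0   # how long connectivity may fail before the truck reacts

def connectivity_ok(measure_latency_ms) -> bool:
    """One connectivity test: the link is up and within its latency spec."""
    latency = measure_latency_ms()
    return latency is not None and latency <= MAX_LATENCY_MS

def supervise_link(measure_latency_ms, pull_over) -> None:
    """Continuously test the link; a sustained failure means pull over.
    On a whitelisted route, a dead zone implies something is wrong."""
    first_bad = None
    while True:
        if connectivity_ok(measure_latency_ms):
            first_bad = None
        else:
            if first_bad is None:
                first_bad = time.monotonic()
            elif time.monotonic() - first_bad > MAX_BAD_WINDOW_S:
                pull_over()
                return
        time.sleep(0.1)
```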

Much of the condemnation of teleop safety stems from the availability heuristic - the notion that if something can be recalled it must be important or at least more important than things that can’t be as easily recalled.  Almost everyone has experienced poor cellular connectivity first hand, and it’s always particularly frustrating.  On the other hand, few have been nearly as annoyed at the failures of radars, LIDARs, GPUs, or high grade cameras.  

A man is frustrated while looking at his cellphone

Everyone has experienced bad cell reception, which is part of why it's so easy to dismiss technologies that rely upon it, like teleop, as unreliable. GETTY

Of course the connectivity of a teleoperated vehicle will have problems, just as LIDARs will fail and machine learning models will spit out bad results.  Safety critical systems are never perfect, and to that end teleop is unexceptional - you also need to build a safety case around it.

That safety case is far less difficult than most expect.  The two most frequently failing components of a teleop system are the connectivity and the remote driver.  For the former you can easily write specs on the required level of connectivity, tests to verify them, and then the logic that if the system starts failing tests it pulls over.  You can make all of that a lot easier if you’re able to control where your system operates, so as to give yourself a fighting chance of good connectivity.

A truck driver sits behind a series of screens and remotely drives a truck

In certain settings, like the access-restricted yard above, even direct-drive teleop can be safe. STARSKY ROBOTICS

When planning around teleoperators the primary issue comes from product design.  As my Starsky co-founder pointed out here, latency becomes a larger issue for direct teleoperation at higher speeds (because with constant latency, more distance passes between a command being ordered and carried out).  At Starsky we solved this by designing what we called “Supervised Autonomy,” where a remote driver would choose from a finite list of discrete behaviors (like left lane change, slow down 5mph, etc.).  Not only was this approach far less latency intensive, it also meant that every time our remote driver ordered a command we would create low-dimensional decision-making data that we could eventually use to train a higher-level AI (if there was ROI).
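
Here’s a sketch of that command structure (the behavior list and interfaces are illustrative - this is the shape of the idea, not Starsky’s code): only a tiny discrete command crosses the high-latency link, while the latency-critical control loop stays on the truck.

```python
from enum import Enum, auto

class Behavior(Enum):
    """A finite menu of discrete behaviors the remote driver picks from."""
    LEFT_LANE_CHANGE = auto()
    RIGHT_LANE_CHANGE = auto()
    SLOW_DOWN_5_MPH = auto()
    MAINTAIN = auto()
    PULL_OVER = auto()

def handle_remote_command(cmd: Behavior, planner) -> Behavior:
    """Onboard side: validate, then execute. The vehicle, not the human,
    closes the latency-critical control loop."""
    if cmd is Behavior.LEFT_LANE_CHANGE and not planner.left_lane_clear():
        cmd = Behavior.MAINTAIN  # unsafe commands are rejected locally
    planner.execute(cmd)
    # Every accepted command is also a labeled, low-dimensional training
    # example: (scene at the time of the command, behavior chosen).
    return cmd
```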

I was once having a candid discussion with a peer who told me that there were two types of autonomy companies: those who weren’t serious about unmanned and those who use teleop.  

As the trough of AI disillusionment continues to deepen, I reckon there will be more and more who realize that not only is teleop useful for taking the person out of the vehicle, it’s safe.

This was originally published in Forbes

Safety Engineering in the Time of COVID


A Cruise robot car on a test drive

(This is the third in a series of articles I’m calling ‘Opening the Starsky Kimono,’ where I reveal Starsky Robotics’ key insights we previously kept top-secret. The first covers the end of Starsky and the limitations of AI and can be found here. The second (here) covers the business use case for AV trucks and the commercial irrelevance of true AI to that aim.)

A few months ago I wrote that part of the challenge of deploying an autonomous truck was that people didn’t really value, or understand, statistical safety.

How things have changed.

In the last two months everyone has developed a qualified opinion on statistical safety.  At least, that is, when it comes to public health in response to Covid-19. VCs who told me they thought the risk of unmanned trucks was too great are now tweeting that we should accept a higher death rate so as to re-open the American economy.

The statistical arguments that underpin proposed responses to Covid-19 aren’t that different from the models we used at Starsky to perform our public unmanned test.  The entire Covid-19 crisis, in fact, presents a surprising parallel to explain what safety engineering really is.

Safety is not the absence of risk, but the absence of unacceptable risk.  Just as every public health policy will lead to some number of fatal Covid-19 cases, any deployed AV will have a greater-than-zero fatality rate.  Making that system safe is a matter of understanding how, why, and when it will hurt people and ensuring that those reasons are acceptable.  

People sit in pre-drawn circles in a park in Toronto

These social distance circles at a park mitigate the risk of a full-on reopening. Much like having backup drivers ready and able to jump into an emergency-stopped unmanned truck. TORONTO STAR VIA GETTY IMAGES

It is unacceptable to deploy a system that regularly hurts people while it’s working as expected in normal driving conditions.  On the other hand, it can be acceptable to deploy one that might hurt people while failing in rare ways in uncommon driving situations.  As long as you know the exact risks you’re taking.

Think of it this way - if you walk through a Covid-19 ward you won’t necessarily die.  To die you’d first need to be in contact with the virus, catch it, have a particularly bad case, and ultimately succumb to the illness.  If you need to walk through that ward, you can mitigate those circumstances by taking precautions while walking (6’ apart, masks, hand-washing), responding quickly to potential exposure (testing and going on early treatments), and quickly going on a full course of treatment.  While the chance of fatality is still greater than zero, it’s significantly lower.

A doctor in multiple pieces of PPE performs a body temperature test

This Mumbai doctor is taking multiple independent steps which each reduce his likelihood of exposure. AFP VIA GETTY IMAGES

For AVs you can also break the problem down.  A failed system or freak incident doesn’t necessitate a fatality.  The freak incident might happen when the AV isn’t nearby, or the system failure might occur when the AV isn’t near a person; the AV system’s onboard diagnostics then have the opportunity to catch the failure, and assuming they do, the system has the opportunity to avoid an incident.

Through a decent amount of work you can figure out the statistical likelihood of each of those steps.  For some you can look at road safety data, your design team can conduct FMEAs (failure mode and effects analyses) to understand which failures pose the risk of harm and their causes, and you can do an incredible amount of real world testing — I’d estimate we drove on the same 8mi stretch of road 1500-2500 times for our unmanned run.
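
As a worked toy example (every number below is invented for illustration - these are not Starsky’s figures), the per-run fatality risk is the product of the per-step probabilities in that failure chain, which is why each independent mitigation buys so much:

```python
# Each step in the failure chain must go wrong for a fatality to occur.
p_system_failure = 1e-2   # some component fails during the run
p_person_nearby  = 1e-1   # the failure happens near a person
p_diag_miss      = 1e-2   # onboard diagnostics don't catch it in time
p_cannot_avoid   = 1e-1   # the system can't reach a safe stop anyway

p_fatality_per_run = (p_system_failure * p_person_nearby
                      * p_diag_miss * p_cannot_avoid)
print(f"per-run fatality risk: {p_fatality_per_run:.0e}")  # 1e-06
# Better diagnostics, controlled routes, or a follow car each shrink one
# factor, and the product shrinks with it.
```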

One of the surprise MVPs of the entire Unmanned program was diagnostics.  At Starsky we were able to build a highly modular and measurable system.  Each node was only supposed to do very specific and measurable things.  The front normal camera, for example, was supposed to spit out an image every so many milliseconds.  If it failed for a few milliseconds we would log a failure, and if that failure continued we would go to a minimal risk condition.  The lane detection model similarly was supposed to spit out lane lines every few milliseconds, and those lane lines should look fairly similar to the previous set (give or take a few radians).  If that failed for too long we would pull over.  In the two months before our unmanned run, every safety-driver disengagement was predicted by the diagnostic system.
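
A minimal sketch of that kind of per-node diagnostic (names and thresholds are illustrative, not Starsky’s code): each node has a contract to emit a plausible output on schedule, and a node that stays silent or implausible past its grace period escalates to a minimal risk condition.

```python
import time

class NodeMonitor:
    """Watches one node's contract: 'emit a plausible output every period_s'."""
    def __init__(self, name: str, period_s: float, grace_s: float):
        self.name = name
        self.deadline_s = period_s + grace_s  # longest acceptable silence
        self.last_good = time.monotonic()

    def on_output(self, plausible: bool) -> None:
        # `plausible` encodes the sanity check, e.g. lane lines that look
        # roughly like the previous frame's.
        if plausible:
            self.last_good = time.monotonic()

    def failing(self) -> bool:
        return time.monotonic() - self.last_good > self.deadline_s

def diagnostics_tick(monitors, go_to_minimal_risk) -> None:
    failed = [m.name for m in monitors if m.failing()]
    if failed:
        print(f"diagnostic failure in: {failed}")  # log the failure...
        go_to_minimal_risk()                       # ...and pull over safely
```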

That is to say, if the safety driver hadn’t been in the vehicle, we wouldn’t have crashed.  We would have come to a safe stop.

For some branches in the failure tree we didn’t like our odds.  For example - even if we successfully avoided the accident there was a measurable likelihood that someone would rear-end our truck while it was pulled over on the side of the road.  We could, however, meaningfully mitigate that risk by having a safety driver in a follow-car who was rated as able to get in and start the truck in under 60 seconds.

A Starsky unmanned truck on a closed road

In May 2019 Starsky did an unmanned practice run on a closed highway to drill the team in procedure and truck recovery. Knowing that a driver could get a stopped truck moving in 60s mitigated the risk of a sitting duck accident. STARSKY ROBOTICS

Doing an unmanned run is a matter of certainty - we needed to be statistically confident that we wouldn’t need a safety driver for the test that didn’t have one.  To stretch the parallel - we needed to be incredibly sure that we knew that our precautions would make us unlikely to catch Covid-19 if we walked through that ward as the first step towards a broader re-opening.

Our simple high level metric was the number of consecutive zero-disengagement runs we had completed.  A zero-disengagement run is a run where the safety driver wasn’t needed from the beginning of the test to the very end.  

When we did our first zero-disengagement run, back in Aug’17, it was a matter of luck.  We had been trying for 3 days nonstop and everything finally worked as planned.  That would have been the first time we could have taken the person out of the truck, but we would have truly been rolling the dice.  As a metric, consecutive zero-disengagement runs are useful because if you haven’t needed a safety driver for 1,000 consecutive tests it’s highly likely that you won’t on your 1,001st test and could therefore take the safety driver out.

You can then do additional work to lower that number of consecutive tests.  By doing an incredible amount of documentation to understand how the system worked and make sure it was as safe as intended, by building rigorous diagnostics which allowed the system to know if it was failing, by controlling the conditions we drove in, and through countless smaller mitigations, we were able to model out that 80 zero-disengagement runs in a row would indicate a 1-in-a-million chance of a fatal accident were we to take the driver out on the 81st test.
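
To illustrate the statistics (this is a generic confidence bound, not Starsky’s actual model - our model layered in the mitigations above): if the true per-run failure probability were p, the chance of N consecutive clean runs is (1 - p)^N, so a long streak caps how large p can plausibly be.

```python
def max_plausible_failure_rate(n_clean_runs: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on per-run failure probability after
    n consecutive failure-free runs: solve (1 - p)**n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_clean_runs)

for n in (80, 141, 1000):
    print(f"{n} clean runs -> per-run failure rate < {max_plausible_failure_rate(n):.4f}")
# 80 clean runs bounds the rate at roughly 3.7%; raw run counts alone don't
# reach 1-in-a-million, which is why the documentation, diagnostics, and
# controlled conditions above all enter the model as multiplicative mitigations.
```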

On June 11th, 2019, at Starsky Robotics we completed our 80th consecutive zero-disengagement run.  On June 15th we completed our 141st.  And on the 16th, we took the person out and completed the first ever unmanned public highway test.

A truck with no driver followed by two SUVs heads onto the highway

The Starsky unmanned truck, and follow vehicles for rescue, head out to complete the world’s first ever unmanned public highway test. STARSKY ROBOTICS

Which is to say, we walked through the Covid-19 ward and didn’t get infected, let alone die.  For us to healthily live full-time in the Covid-19 ward there would have been a fair amount more work.  Throughout the second half of last year we were in the process of ruggedizing our system to support full-time unmanned operations; we would have needed to drive on the pre-selected routes thousands more times, and probably would have found a whole lot more diagnostics to write.

Just as it may be possible to re-open the economy at wide scale without mass Covid-19 deaths, it was possible for our approach to lead to the deployment of unmanned vehicles.  And someday, someone will do it.


This post was originally published in Forbes