Verifiably Correct Asynchronous Control Logic

Control logic in a multi-threaded or multi-process system – i.e. in an asynchronously communicating system – is extraordinarily difficult to get completely right. The more modules and threads and processes it has, the more states there are. State/trace space size grows exponentially, explosively, making it impossible to test all traces/paths for all potential problems: unhandled events; nondeterministic execution; race conditions; deadlocks; live-locks from infinite loops or unbounded recursion; and so forth.

A subtle, buried problem in asynchronous control logic can be disastrous, catastrophic. It can leap out even after years of documented problem-free operation. Once seen, it often cannot be reproduced, because the exact combination of states and messaging events that led to it is unknown and highly improbable – it’s a “Heisenbug.” A non-reproducible problem cannot be reliably fixed. Testing/verification can uncover all such problems only through 100% trace/path coverage. But 100% trace coverage in even a medium-sized code base is typically impossible due to state/trace space size.

The only practical and proven solution is to model control logic in relatively small hierarchical chunks; carry out model-checking via a logic engine to exhaustively examine all possible traces/paths within each hierarchical chunk, and thereby find and remove all problems in that chunk; and automatically generate control-logic source code, which comprises the “skeleton” of the asynchronously communicating system.

A few products and many free tools provide such solutions. Amazon Web Services uses TLA+ from Microsoft Research. NASA uses SPIN for Mars rovers and other critical systems; see also here. Several major industrial companies in Western Europe have used Verum’s Dezyne product (Netherlands) to design, build and update critical industrial, medical and automotive control systems for the past ~15 years. These companies say two key things:

— The resulting software control systems just work, and just keep on working through updates and revisions, with zero Heisenbugs.

— Using model-checking reduces development costs and schedules by 20-50%.

Disclosure: I have no financial interest or other vested interest in Verum or any of its products. I do think that Dezyne presents the right blend of rigor, scalability and usability for critical control-system software engineering.