This is the keynote address I gave at the INCOSE International Symposium on Systems Engineering, in July 2005.
Software has an important role to play in making modern systems more flexible, adaptable and autonomous. But we don't yet have a mature engineering discipline for software development. For the systems engineer, important questions are still unanswered: how risky is software in comparison with other parts of a system? Can software be treated as 'just another component'? Or does software demand special attention in systems engineering?
The emerging field of software forensics can shed some light on these questions. By investigating the circumstances surrounding software failures, we get a sense of the risks involved. In this talk, I will use a series of case studies from the space program to draw out some crucial lessons. The examples include the European Space Agency's original Ariane-5 launch vehicle, and several of NASA's Mars probes. Each of these case studies makes a fascinating story in its own right. In each case, the failure appears to be a normal accident: a relatively simple technical problem led to a systems failure because a whole series of systems engineering mistakes allowed it to. However, the failure profiles in these cases reveal some of the key distinguishing characteristics of software. These characteristics have important implications for systems engineering.
A Handbook of Software and Systems Engineering: Empirical Observations, Laws, and Theories, by Endres and Rombach.
Mechanizing Proof: Computing, Risk, and Trust, by Donald MacKenzie.
Visual Explanations: Images and Quantities, Evidence and Narrative, by Edward Tufte.
Normal Accidents, by Charles Perrow.
To Engineer is Human: The Role of Failure in Successful Design, by Henry Petroski.
Safeware: System Safety and Computers, by Nancy Leveson.
What Do You Care What Other People Think?, by Richard Feynman.