Normal Accidents: Living with High-Risk Technologies



If it is right, it has profound consequences for how those of us in complex, tightly coupled systems should think about risk. It seems inarguably right.
===[[Complex interaction]]s and [[tight coupling]]===
First, some definitions.
*'''Complexity''': Perrow anticipates the later use of the concept of “[[complexity]]” — a topic which is beginning to infuse the advocacy part of this site — without the benefit of [[systems analysis]], since it hadn’t really been invented when he was writing, using it to describe interactions between non-adjacent subcomponents of a system that were neither intended nor anticipated by the designers of the system. Complex interactions are not only unexpected but, for a period of time (which may be critical, if the interacting components are [[tightly coupled]]), ''incomprehensible''. This may be because the interactions cannot be seen, buried as they are under second-order control and safety systems, or even because they are not ''believed''. If your — ''wrong'' — theory of the game is that the risk in question is a [[ten sigma event]], expected only once in one hundred million years, you may have a hard time believing it could be happening in your fourth year of operation, as the partners of [[Long Term Capital Management]] may tell you (a back-of-the-envelope illustration of quite how “impossible” such an event looks on paper follows these definitions). Here even [[epistemology]] is in play. Interactions that were not in our basic conceptualisation of the world are not ones we can reasonably anticipate. These interactions were, QED, not ''designed'' into the system; no one ''intended'' them. “They baffle us because we acted in terms of our own designs of a world that we expected to exist—but the world was different.”<ref>{{br|Normal Accidents}}, p. 75. Princeton University Press. Kindle Edition.</ref>
*'''[[Linear interaction]]s''': Contrast [[complex interaction]]s with the much more common “[[linear interaction]]s”, where parts of the system interact with other components that precede or follow them in the system in ways that are expected and planned: “if ''this'', then ''that''”. In a well-designed system, these will (of course) predominate: any decent system should mainly do what it is designed to do and not act erratically in normal operation. Some systems are more complex than others, but even the most linear systems are susceptible to some complexity — where they interact with the environment.<ref>Perrow characterises a “complex system” as one where ten percent of interactions are complex, and a “linear system” as one where less than one percent of interactions are complex. The greater the percentage of complex interactions in a system, the greater the potential for system accidents.</ref> Switching back into the language of [[systems analysis]] for a moment, consider that [[linear interaction]]s are a ''feature'' of [[simple]] and [[complicated system]]s, and can be “pre-solved” and brute-force computed, at least in theory. They can be managed by [[algorithm]], or [[playbook]]. But [[complex interactions]], by definition, ''cannot'' — they are the interactions the [[algorithm]] ''didn’t expect''.
*'''[[Tight coupling]]''': However, complex interactions are only a source of catastrophe if another condition is satisfied: that they are “tightly coupled” — processes happen fast, can’t be turned off, and failing components can’t be isolated. Perrow’s observation is that systems tend to be more tightly coupled than we realise.
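
As promised above, here is a minimal back-of-the-envelope sketch of how long a plain Gaussian model says you should wait to see an n-sigma move. It is ours, not Perrow’s, and the assumptions (252 independent, normally distributed daily observations per year) are purely illustrative; the point is the arithmetic that makes a complex interaction so hard to believe when it arrives.

<syntaxhighlight lang="python">
# Hypothetical illustration: expected waiting time for an n-sigma daily event
# under a plain Gaussian model. Assumptions (ours, not Perrow's): i.i.d. standard
# normal observations, 252 of them per year.
import math

def expected_wait_years(n_sigma: float, observations_per_year: int = 252) -> float:
    """Expected waiting time, in years, for a one-sided n-sigma event."""
    p = 0.5 * math.erfc(n_sigma / math.sqrt(2))  # P(X > n_sigma) for a standard normal
    return 1.0 / (p * observations_per_year)

for n in (3, 5, 10):
    print(f"{n}-sigma: expect one roughly every {expected_wait_years(n):.3g} years")

# Prints approximately:
#  3-sigma: expect one roughly every 2.94 years
#  5-sigma: expect one roughly every 1.38e+04 years
# 10-sigma: expect one roughly every 5.21e+20 years
</syntaxhighlight>

On that model a [[ten sigma event]] should not turn up within the lifetime of the universe, let alone in year four of operation: so when it does turn up, the operator’s first instinct is that the instruments, not the model, must be wrong.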


Here is where the folly of [[complicated]] safety mechanisms comes in: adding linear safety systems to a system ''increases'' its complexity, and makes dealing with system failures, when they occur, even harder. Not only do linear safety mechanisms exacerbate or even create their own accidents, but they also afford a degree of false comfort that encourages managers (who typically have financial targets to meet, not safety ones) to run the system harder, thus increasing the tightness of the coupling between unrelated components. That same Triple A rating that lets your risk officer catch some zeds at the switch encourages your trader to double down. ''I’m covered. What could go wrong?''
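
Very loosely, and as our own illustration rather than any calculation Perrow performs, the arithmetic looks like this: each added component, safety interlocks included, grows the pool of potential pairwise interactions quadratically, so even a small per-pair chance of an unforeseen interaction buys you more surprises, not fewer. The one-percent figure below is an assumption borrowed loosely from Perrow’s “linear system” threshold.

<syntaxhighlight lang="python">
# Hypothetical illustration: how the pool of potential component-to-component
# interactions grows as safety subsystems are bolted on. The 1% per-pair chance
# of an unforeseen ("complex") interaction is an illustrative assumption.

def potential_interactions(n_components: int) -> int:
    """Number of distinct component pairs that could, in principle, interact."""
    return n_components * (n_components - 1) // 2

def expected_surprises(n_components: int, p_complex: float = 0.01) -> float:
    """Expected number of unplanned interactions, assuming each pair has an
    independent probability p_complex of interacting in a way nobody designed."""
    return potential_interactions(n_components) * p_complex

for n in (20, 30, 40):  # say, a base plant of 20 parts, then with safety layers added
    print(f"{n} components: {potential_interactions(n)} pairs, "
          f"~{expected_surprises(n):.1f} expected unplanned interactions")

# 20 components -> 190 pairs, ~1.9 expected unplanned interactions
# 40 components -> 780 pairs, ~7.8: the safety kit is itself new material for accidents
</syntaxhighlight>

The point is not the particular numbers but the shape of the curve: redundancy added to catch component failures is, at the same time, fresh raw material for the unplanned interactions that cause system failures.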


Part of the voyeuristic pleasure of Perrow’s book is the salacious detail with which he documents the sequential failures at Three Mile Island, the Space Shuttle ''Challenger'', Air New Zealand’s Erebus crash, among many other disasters and near-misses. The chapter on maritime collisions would be positively hilarious were it not so distressing.


===“Operator error” is almost always the wrong answer===
:''Besides, about this time—just four or five minutes into the accident—another more pressing problem arose. The reactor coolant pumps that had turned on started thumping and shaking. They could be heard and felt from far away in the control room. Would they withstand the violence they were exposed to? Or should they be shut off? A hasty conference was called, and they were shut off. (It could have been, perhaps should have been, a sign that there were further dangers ahead, since they were “cavitating”—not getting enough emergency coolant going through them to function properly.) In the control room there were three audible alarms sounding, and many of the 1,600 lights (on-off lights and rectangular displays with some code numbers and letters on them) were on or blinking. The operators did not turn off the main audible alarm because it would cancel some of the annunciator lights. The computer was beginning to run far behind schedule; in fact it took some hours before its message that something might be wrong with the PORV finally got its chance to be printed. Radiation alarms were coming on. The control room was filling with experts; later in the day there were about forty people there. The phones were ringing constantly, demanding information the operators did not have. Two hours and twenty minutes after the start of the accident, a new shift came on.''<ref>{{br|Normal Accidents}} p. 28.</ref>


This is, as Perrow sees it, the central dilemma of the complex system. The nature of normal accidents is such that they need experienced, wise operators on the ground, ready to think quickly and laterally to solve unfolding problems, but the enormity of the risks involved means that tight coupling pulls the other way: it demands centralised, rapid, scripted responses, not on-the-spot improvisation. Complexity calls for decentralised judgment; tight coupling calls for centralised control, and it is hard to design an organisation that delivers both at once.