Normal Accidents: Living with High-Risk Technologies
This is one of those “books that will change your life”. Well — one that should change lives. That it was written in 1984 — Charles Perrow passed away in 2019 — suggests that maybe it hasn’t: that the irrationalities that motivate so much of what we do are more pervasive than plainly written common sense.
Charles Perrow was a sociologist who fell into the discipline of systems analysis: analysing how social structures like businesses, governments and public utilities, being loose networks of autonomous individuals, work. Perrow’s focus fell upon organisations that present specific risks to operators, passengers and innocent bystanders — nuclear and other power stations, airways, shipping lines, but the read-across to financial systems is obvious — where a combination of complexity and tight coupling means that periodic catastrophic accidents are not just likely, but inevitable. It is an intrinsic property of a complex, tightly coupled system — not merely a function of operator error that can be blamed on a negligent employee — that it will fail catastrophically.
If it is right, it has profound consequences for how those of us who work in complex, tightly coupled systems should think about risk. It seems inarguably right.
Complex interactions and tight coupling
First, some definitions.
- Complexity: Perrow anticipates the later use of the concept of “complexity” — a topic which is beginning to infuse the advocacy part of this site — without the benefit of systems analysis, since it hadn’t really been invented when he was writing, to describe interactions between non-adjacent subcomponents of a system that were neither intended nor anticipated by the designers of the system. Complex interactions are not only unexpected, but for a period of time (which may be critical, if the interacting components are tightly coupled) will be incomprehensible. This may be because the interactions cannot be seen, buried under second-order control and safety systems, or even because they are not believed. If your — wrong — theory of the game is that the risk in question is a ten sigma event, expected only once in one hundred million years, you may have a hard time believing it could be happening in your fourth year of operation, as the partners of Long Term Capital Management may tell you. Here even epistemology is in play. Interactions that were not in our basic conceptualisation of the world are not ones we can reasonably anticipate. These interactions were, QED, not designed into the system; no one intended them. “They baffle us because we acted in terms of our own designs of a world that we expected to exist—but the world was different.”[1]
- Linear interactions: Contrast complex interactions with the much more common “linear interactions”, where parts of the system interact with other components that precede or follow them in the system in ways that are expected and planned: “if this, then that”. In a well-designed system, these will (of course) predominate: any decent system should mainly do what it is designed to do and not act erratically in normal operation. Some systems are more complex than others, but even the most linear systems are susceptible to some complexity — where they interact with the environment.[2] Cutting back into the language of systems analysis for a moment, consider that linear interactions are a feature of simple and complicated systems, and can be “pre-solved” and brute-force computed, at least in theory. They can be managed by algorithm, or playbook (see the toy sketch after this list). But complex interactions, by definition, cannot — they are the interactions the algorithm didn’t expect.
- Tight coupling: However, complex interactions are only a source of catastrophe if another condition is satisfied: that the system is “tightly coupled” — processes happen fast, can’t be turned off, failing components can’t be isolated. Perrow’s observation is that systems tend to be more tightly coupled than we realise.
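To make the contrast concrete, here is a minimal, purely illustrative Python sketch (nothing of the sort appears in Perrow; the playbook and the component names are invented for the example) of why a pre-solved playbook copes with anticipated, linear failures but has nothing to offer when components interact in a way nobody designed for, least of all when they are tightly coupled:

```python
# Purely illustrative: a pre-configured "playbook" resolves anticipated,
# linear failures, but is silent on interactions nobody designed for.
# All component names are invented for the example.
PLAYBOOK = {
    ("feedwater_pump", "stuck_valve"): "switch to the backup pump",
    ("level_sensor", "bad_reading"): "cross-check the redundant sensor",
}

def respond(component: str, failure_mode: str) -> str:
    """Return the pre-solved response to a known, linear failure."""
    try:
        return PLAYBOOK[(component, failure_mode)]
    except KeyError:
        # A complex interaction: not in anyone's design, so not in the playbook.
        # Loosely coupled, there is time to diagnose and improvise;
        # tightly coupled, there is not.
        return "no pre-configured response: improvise, under time pressure"

print(respond("feedwater_pump", "stuck_valve"))                 # anticipated: the playbook works
print(respond("level_sensor", "reads_high_while_valve_stuck"))  # unanticipated: the playbook is silent
```

The point is not the code: it is that the second call fails for a reason the lookup table was never going to contain.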
Normal accidents
Where you have a complex system, you should expect accidents — and opportunities, quirks and serendipities, but here we are talking about risk — to arise from unexpected, non-linear interactions. Such accidents, says Perrow, are “normal”, not in the sense of being regular or expected,[3] but in the sense that it is an inherent property of the system to have this kind of accident.
Are financial systems complex? About as complex as any distributed system known to humankind. Are they tightly coupled? Well, you could ask the principals of LTCM, Enron, Bear Stearns, Amaranth Advisors, Lehman Brothers or Northern Rock, if any of those venerable institutions were still around to tell you about it.
So, financial services risk controllers, take note: if your system is a complex, tightly-coupled system — and it is — you cannot solve for systemic failures. You can’t prevent them. You have to have arrangements in place to deal with them. These arrangements need to be able to deal with the unexpected outputs of a complex system, not the predictable effects of a merely complicated one.
Why make the distinction between complex and complicated like this? Because pre-configured devices — risk taxonomies, playbooks, checklists, neural networks, even ~ cough ~ contractual rights — may help resolve isolated failures in complicated components, but they have no chance of resolving systems failures. They are of the system. They are part of what has failed. Not only that: these safety mechanisms, by their existence, contribute to complexity in the system, and when a system failure happens they can make it harder to detect what has gone wrong.
Inadvertent complexity
So far, so hoopy; but here’s the rub: we can make systems and processes more or less complex and, to an extent, reduce tight coupling by careful system design and iterative improvement: air transport has become progressively less complex as it has developed. It has learned from each accident. But it is axiomatic that we can’t eliminate complexity.
Here is where the folly of complicated safety mechanisms comes in: adding linear safety systems to a system increases its complexity, and makes dealing with complex interactions even harder. Not only do they create potential accidents of their own, but they also afford a degree of false comfort that encourages managers (who typically have financial targets to meet, not safety ones) to run the system harder, thus increasing the coupling of unrelated components. Perrow catalogues the chain of events leading up to the meltdown at Three Mile Island.
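A rough, back-of-the-envelope sketch (my arithmetic, not Perrow's) of why each bolted-on device tends to add interaction paths faster than it removes them: if any pair of components can in principle interact, the number of potential pairings grows roughly with the square of the component count.

```python
from math import comb

# Back-of-the-envelope only: count potential pairwise interactions as a system
# accretes components (interlocks, alarms and overrides included).
for n_components in (10, 20, 40, 80):
    pairs = comb(n_components, 2)  # n * (n - 1) / 2 possible pairings
    print(f"{n_components:>3} components -> {pairs:>5} potential pairwise interactions")
```

Even if only a small fraction of those pairings are complex in Perrow's sense, their absolute number climbs quickly, which is the sense in which each extra safety system buys its reassurance at the price of more places for an incomprehensible interaction to hide.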
“Operator error” is almost always the wrong answer
Human beings being system components, it is rash to blame a failure on a component constitutionally disposed to fail, let alone one put in a position, through system design or economic incentive — a ship’s captain being expected to work a 48-hour watch — where failure is more or less inevitable (Perrow calls these “forced operator errors”).
- But again, “operator error” is an easy classification to make. What really is at stake is an inherently dangerous working situation where production must keep moving and risk-taking is the price of continued employment.[4]
If an operator’s role is simply to carry out a tricky but routine part of the system, then the inevitable march of technology makes this ever more a fault of design, not of personnel: humans, we know, are not good computers. They are good at figuring out what to do when something unexpected happens; making decisions; exercising judgment. But they — we — are lousy at doing repetitive tasks and following instructions. As The Six Million Dollar Man had it, we have the technology. We should damn well use it. If, on the other hand, the operator’s role is to manage complexity —
Yet if you are facing
See also
References
- ↑ Normal Accidents, p. 75. Princeton University Press. Kindle Edition.
- ↑ Perrow characterises a “complex system” as one where ten percent of interactions are complex, and a “linear system” as one where less than one percent of interactions are complex. The greater the percentage of complex interactions in a system, the greater the potential for system accidents.
- ↑ In the forty-year operating history of nuclear power stations, there had (at the time of writing!) been no catastrophic meltdowns, “... but this constitutes only an “industrial infancy” for complicated, poorly understood transformation systems.” Perrow had a chilling prediction: “... the ingredients for such accidents are there, and unless we are very lucky, one or more will appear in the next decade and breach containment.” Ouch.
- ↑ Normal Accidents, p. 249.