Template:M intro design System redundancy



Then, your time horizon for redundancy is not one year, or twenty years, but ''two hundred and fifty years''. A quarter of a millennium: that is how long it would take to earn back $5 billion in twenty-million-dollar clips.
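The arithmetic behind that horizon can be checked with a minimal sketch. It assumes each “clip” is one year's gain, which the text implies but does not state outright; the function name <code>payback_years</code> is hypothetical and used only for illustration.

```python
def payback_years(loss: float, annual_gain: float) -> float:
    """Years needed to earn back a one-off loss at a steady annual gain.

    Assumption (not stated in the text): each "clip" is one year's gain.
    """
    if annual_gain <= 0:
        raise ValueError("annual_gain must be positive")
    return loss / annual_gain


# $5 billion lost, earned back at $20 million a year.
years = payback_years(5_000_000_000, 20_000_000)
print(years)  # 250.0
```
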
===On the virtue of slack===
Redundancy is another word for “slack”, in the sense of “looseness in the tether between interconnected parts of a wider whole”.
To optimise normal operation, we hear, we should ''minimise'' slack, thereby generating maximum responsiveness, handling and cornering: what musicians would call “attack”. Tightness gives the greatest torque, the most direct transmission of power to road, and the minimum ''latency''.
The tighter we couple inputs to outputs, the faster the response, but the less margin there is for variation.
And, as {{author|Charles Perrow}} notes,<ref>In one of the JC’s favourite books, {{br|Normal Accidents: Living with High-Risk Technologies}}.</ref> this in-the-moment flow state, when the machine is humming, is only stable in tightly constrained environments, where every outcome can be predicted and monitored, and sub-optimal ones avoided by rote.
But, generally, these are not very interesting environments. They are production lines. Factory shop floors are [[nomological machine|nomological machines]], where every element of the process is under control. It is where production is ''not'' tightly controlled, where there are intervening agents, third parties, shifting priorities and changing market conditions, that things get “interesting”.
That very lack of “give” that makes a sports car so responsive on a dry track makes it skid off a wet one. The less slack there is, the less time an operator has to diagnose and fix a problem — or shut the system down — to avoid catastrophic damage.
A system with built-in back-ups and redundancies can go on working while we repair failed components. A certain amount of “stockpiling” in the system allows production to continue should there be any outages or supply chain problems throughout the process.
But even a production line environment is not perfectly stable. It should be in a constant state of improvement whereby engineers monitor and adjust to optimise, to cater for evolving demand, to react to market developments, and capitalise on new technology and knowhow.
This is “meta-production”: a valuable “background processing” function, important but not day-to-day “urgent”, on which “redundant” personnel can be occupied and from which they can redeploy immediately should a crisis arise.
This has two benefits. Firstly, the process of “peacetime” self-analysis should in part be aimed at identifying emerging risks and design flaws in the system, thus heading off incipient crises. Secondly, to do that the personnel need ''expertise'': an intimate, detailed, holistic understanding of the process and the system. With that understanding, these second-line workers should be better able to react to a crisis should one arise.
This behaviour rewards long-term “skin in the game”. The best employees here are long-serving, local, full-time employees, full of institutional knowledge and practical, hands-on systems know-how. Inexperienced outsourced labour, of the sort by whom these traditional experts are being systematically replaced, will be far less useful in either role.
To be sure, the value employees add is not constant. We all have flat days when we don’t achieve very much. In an operationalised workplace they pick up a penny a day on 99 days out of 100; if they save the firm £ on that 100th day, it is worth paying them two pennies a day, every day, even if on 99 days out of 100 you are making a loss.
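As a rough check on that trade-off, the sketch below works out how large the 100th-day saving must be for a two-penny wage to pay for itself. The text does not give the saving, so the sketch only computes the break-even threshold. The function name <code>break_even_saving</code> and the assumption that the exceptional day yields only the saving are illustrative, not taken from the text.

```python
def break_even_saving(wage_per_day: int, ordinary_value_per_day: int, days: int = 100) -> int:
    """Smallest saving on the one exceptional day that covers the whole period's wage.

    Assumptions (illustrative only): exactly one day in `days` is exceptional,
    and on that day the employee produces the saving and nothing else.
    All amounts are in pennies.
    """
    ordinary_days = days - 1
    total_wage = days * wage_per_day
    ordinary_value = ordinary_days * ordinary_value_per_day
    return total_wage - ordinary_value


# Paid two pennies a day, earning the firm one penny a day on 99 days out of 100.
needed = break_even_saving(wage_per_day=2, ordinary_value_per_day=1)
print(needed)  # 101
```

On these figures the firm loses a penny on each of the 99 ordinary days, so any saving above 101 pennies on the hundredth day leaves it ahead over the whole period.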