{{Quote|“I think the people in this country have had enough of experts from organisations with acronyms saying that they know what is best and getting it consistently wrong.”
:—Michael Gove}}
===On pet management theories===
[[System redundancy|The]] JC likes his pet management theories as you know, readers, and none are dearer to his heart than the idea that the [[High modernism|high-modernist]]s have, for forty years, held western management orthodoxy hostage.




In any case, just-in-time rationalisers take a single cycle and code for that. What is the process, start to finish? What are the dependencies? What are the plausible unknowns? And how do we optimise for efficiency of movement, components and materials?
===It’s the long run, stupid===
The usual approach for system optimisation is to take a snapshot of the process as it is over its life cycle, and map that against a hypothetical critical path. Kinks and duplications in the process are usually obvious, and we can iron them out to reconfigure the system to be as efficient and responsive as possible. Mapping best-case and worst-case scenarios for each phase in that life cycle can give good insights into which parts of the process are in need of re-engineering: it is often [[What will it look like?|not the ones we expect]].
But how long should that life cycle be? We should judge it by the frequency of the worst possible negative event that could happen. Given that we are contemplating the [[infinite]] future, this is hard to say, but it is longer than we think: not just a single manufacturing cycle or reporting period. The efficiency of a process must take in ''all'' parts of the cycle — the whole gamut of the four seasons — not just that nice day in July when all seems fabulous with the world. There will be other days; difficult ones, on which multiple unrelated components fail at the same moment, or the market drops, clients blow up, or tastes gradually change. There will be almost imperceptible, secular changes in the market which will demand products be refreshed, replaced, updated and reconfigured; opportunities and challenges will arise which must be met: your window for measuring who and what is ''truly'' redundant in your organisation must be long enough to capture all of those slow-burning, infrequent things.
Take our old, now dearly departed, friends at [[Credit Suisse]]. Like all banks, over the last decade they were heavily focused on the ''cost'' of their prime brokerage operation. Prime brokerage is a simple enough business, but it’s also easy to lose your shirt doing it.
In peace-time, things looked easy for [[Credit Suisse]], so they juniorised their risk teams. This, no doubt, marginally improved their net peacetime return on their relationship with [[Archegos]]. But those wage savings, even if $10m annually, were out of all proportion to the incremental risk they assumed as a result.
(We are, of course, assuming that better human risk management might have averted that loss. If it would not have, then the firm should not have been in business at all.)
The skills and operations you need for these difficult phases are different and more expensive, but likely far more determinative of your organisation’s success over the long run.
Beware the [[Simpson’s paradox]] effect: over a short period the efficiency curve may seem to run one way; over a longer period it may run in quite another direction.
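A minimal numeric sketch of that effect, with all figures invented for illustration: within each short measurement window “efficiency” trends upward, but pooled over the longer horizon the trend reverses.
<syntaxhighlight lang="python">
# Simpson's-paradox sketch: the within-period trend and the pooled
# trend point in opposite directions. All figures are invented.

def slope(points):
    """Least-squares slope through (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den

period_1 = [(1, 10.0), (2, 10.5), (3, 11.0)]   # efficiency rising here...
period_2 = [(7, 4.0), (8, 4.5), (9, 5.0)]      # ...and rising here too

print(slope(period_1))             # +0.5: looks like improvement
print(slope(period_2))             # +0.5: still looks like improvement
print(slope(period_1 + period_2))  # about -0.9: the long-run trend is down
</syntaxhighlight>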
The perils, therefore, of data: it is necessarily a snapshot, and in our impatient times we imagine time horizons that are far too short. A sensible time horizon should be determined not by reference to your expected regular income, but to your worst possible day. Take our old friend [[Archegos]]: it hardly matters that you can earn $20m from a client in a year, consistently, every year for twenty years ''if you stand to lose five billion dollars in the twenty-first''.
Then your time horizon for redundancy is not one year, or twenty years, but ''two hundred and fifty years''. A quarter of a millennium: that is how long it would take to earn back $5 billion in twenty-million-dollar clips.
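The arithmetic, as a back-of-the-envelope sketch:
<syntaxhighlight lang="python">
# How long does a steady $20m-a-year client take to earn back a
# single $5bn blow-up?

annual_revenue = 20_000_000    # $20m per year, every year
one_bad_day = 5_000_000_000    # the loss in year twenty-one

print(one_bad_day / annual_revenue)   # 250.0 years: a quarter of a millennium
</syntaxhighlight>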
===On the virtue of slack===
Redundancy is another word for “slack”, in the sense of “looseness in the tether between interconnected parts of a wider whole”.
To optimise normal operation, we hear, we should ''minimise'' slack, thereby generating maximum responsiveness, handling, cornering: what musicians would call “attack”. Tightness gives the greatest torque, the most direct transmission of power to the road; the minimum ''latency''.
The tighter we couple inputs to outputs, the faster the response. But the less margin there is for variation.
And, as {{author|Charles Perrow}} notes,<ref>In one of the JC’s favourite books, {{br|Normal Accidents: Living with High-Risk Technologies}}.</ref> this in-the-moment flow state, when the machine is humming, is only stable in tightly constrained environments, where every outcome can be predicted and monitored, and sub-optimal ones avoided by rote.
But, generally, these are not very interesting environments. They are production lines. Factory shop floors — [[nomological machine|nomological machines]] — where every element of the process is under control. It is where production is ''not'' tightly controlled — intervening agents, third parties, shifting priorities and market conditions — that things get “interesting”.
That very lack of “give” that makes a sports car so responsive on a dry track makes it skid off a wet one. The less slack there is, the less time an operator has to diagnose and fix a problem — or shut the system down — to avoid catastrophic damage.
A system with built-in back-ups and redundancies can go on working while we repair failed components. A certain amount of “stockpiling” in the system allows production to continue should there be outages or supply-chain problems anywhere along the line.
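A minimal simulation of that stockpiling point, with an invented outage rate: a downstream stage fed from a small buffer rides out upstream outages that halt a tightly coupled, zero-buffer line.
<syntaxhighlight lang="python">
import random

def units_produced(buffer_capacity, days=1000, outage_rate=0.1, seed=42):
    """Count downstream output when upstream randomly fails to deliver."""
    random.seed(seed)
    stock = buffer_capacity          # start with a full buffer
    produced = 0
    for _ in range(days):
        if random.random() > outage_rate:
            # on good days upstream over-delivers to refill the buffer
            stock = min(stock + 2, buffer_capacity + 1)
        if stock > 0:                # downstream needs one unit a day
            stock -= 1
            produced += 1
    return produced

print(units_produced(buffer_capacity=0))   # ~900: halts on every outage day
print(units_produced(buffer_capacity=5))   # ~1000: slack absorbs the outages
</syntaxhighlight>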
But even a production line environment is not perfectly stable. It should be in a constant state of improvement, whereby engineers monitor and adjust to optimise, to cater for evolving demand, to react to market developments, and to capitalise on new technology and knowhow.
This is “meta-production”: a “background processing” function — important and valuable, but not day-to-day “urgent” — with which “redundant” personnel can be occupied, and from which they can redeploy immediately should a crisis arise.
This has two benefits. First, the process of “peacetime” self-analysis should in part be aimed at identifying emerging risks and design flaws in the system, thus heading off incipient crises. Second, to do that, personnel need ''expertise'': an intimate, detailed, holistic understanding of the process and the system. Armed with that understanding, these second-line workers will be better able to react to a crisis should one arise.
This behaviour rewards long-term “skin in the game”. The best employees here are long-serving, local, full-time employees, full of institutional knowledge and practical, hands-on systems knowhow. Inexperienced outsourced labour, of the sort by whom these traditional experts are being systematically replaced, will be far less use in either role.
To be sure, the importance of employees, and the value they add, is not constant. We all have flat days where we don’t achieve very much. In an operationalised workplace they pick up a penny a day on 99 days out of 100; if they save the firm whole pounds on that 100th day, it is worth paying them two pennies a day, every day, even if, 99 days out of 100, you are making a loss on them.
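The expected-value arithmetic, as a sketch (the £5 saving on the hundredth day is an assumed figure; the text leaves the amount open):
<syntaxhighlight lang="python">
daily_wage = 0.02        # £0.02: what we pay, every day
ordinary_take = 0.01     # £0.01: what they earn us on 99 days out of 100
rare_save = 5.00         # £5.00 (assumed): the crisis averted on day 100

cost = 100 * daily_wage                    # £2.00 per 100-day cycle
benefit = 99 * ordinary_take + rare_save   # £5.99 per 100-day cycle
print(round(benefit - cost, 2))            # 3.99: a daily loss, a cyclical profit
</syntaxhighlight>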
===Fragility and tight coupling===
The “leaner” a distributed system is, the more ''fragile'' it will be and the more “single points of failure” it will contain, whose malfunction, in the best case, will halt the whole system, and in [[tightly-coupled]] [[complex system]]s may trigger further component failures, chain reactions and unpredictable, nonlinear consequences.
A financial market is a [[complex system]] composed of an indeterminate number of autonomous actors, many of whom (notably corporations) are themselves complex systems, interacting in unpredictable ways.
The robustness of any system depends on the tightness of the coupling between components. How much slack is there? In financial markets, increasingly, none at all.
When the JC started practice a millennium ago, to convey an urgent written communication one gave it to a chap on a bicycle. Well — one gave it to a secretary, who sent it to the mail room, and ''they'' gave it to a chap on a bicycle. [[Facsimile]] was the innovation: while quicker than a bike courier, it was still manual, and bounded at either end by analogue processes, such that the communication began and ended embedded in a physical [[substrate]] which you couldn’t easily reply to or forward,<ref>Not at least without time, manual intervention and loss of fidelity.</ref> let alone cut and paste from.
The analogue universe thus imposed its own immutable sobriety upon the conduct of business: there was a genteel maximum pace at which matters progressed, and that was that. Time is its own natural fire-break. You could charge down to the mail room and call back an ill-advised letter in a way you can’t with an intemperate email. Just having the letter typed meant it passed through multiple hands: that effluxion of time was itself a self-enforcing circumspection, a natural brake upon precipitate behaviour.
In any case, ''slow'', loosely-coupled chain reactions have a better chance of being stopped, or contained. The liquidity crunch that ruined Silicon Valley Bank unfolded in minutes, not hours, and did for [[Credit Suisse]], an unrelated bank on a different continent, before bank executives could work out what was going on.
So, yes, financial services are tightly coupled. The increasing speed and complexity of the system’s interconnectedness aggravates crash risk. The more interconnections, the faster information flows around the system, and the more quickly it can swamp whatever defences we erect to contain it. Credit Suisse had its own fundamental problems, to be sure, but the speed at which it was brought down by entirely unconnected system events should surely give pause for thought.
Redundancy — slack — in this environment, is a virtue.
====Regulatory ''human'' capital?====
Instinctively, we all know this.
We build certain kinds of redundancy into our systems precisely as a failsafe against catastrophic failure. Financial services regulators require banks to hold [[regulatory capital]] — cash, held against no specific risk, as a bulwark against divers credit and liquidity crises.
[[Tier 1 capital]] is a buffer — slack; a ''[[system redundancy]]'' — designed to protect not just the individual institutions who must hold it ''but the wider system''. As executives at [[Lehman]] and [[Credit Suisse]] would tell us, after the fact, capital only takes you so far. (''Before'' the fact they might have grumbled, too, that capital is an expensive dead weight on corporate returns.)
For [[Tier 1 capital|regulatory capital]] is an [[Airbag - steering-wheel continuum|airbag]]: it protects you in a prang, but doesn’t help you avoid one in the first place. To be sure, there are accounting techniques that do — [[risk weighting]], [[leverage ratio]]s, [[regulatory margin]] — and when they work, they are better than airbags, but they suffer from being determinate responses to unpredictable problems.
There have already been three [[Basel Accords]]; they are working on a fourth, because the first three haven’t had the desired effect. We still have periodic market meltdowns, not because the Basel rules aren’t detailed enough, but because, fundamentally, fixed rules cannot manage indeterminate risk situations. We have seen, over and over, well-meant rules behaving counterintuitively at times of stress.<ref>Quoth the [[Basel Committee on Banking Supervision|Basel Committee]], explaining its most recent rules: <br>“''An underlying cause of the global financial crisis was the build-up of excessive on- and off-balance sheet leverage in the banking system. In many cases, banks built up excessive leverage while apparently maintaining strong risk-based capital ratios. At the height of the crisis, financial markets forced the banking sector to reduce its leverage in a manner that amplified downward pressures on asset prices. This deleveraging process exacerbated the feedback loop between losses, falling bank capital and shrinking credit availability.''”</ref>
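The Basel Committee’s footnoted point lends itself to a toy illustration: a bank can look well capitalised on a risk-weighted basis while carrying enormous leverage. The balance sheet and risk weights below are invented for illustration; real Basel risk-weighting is far more involved.
<syntaxhighlight lang="python">
tier1_capital = 10.0    # £10bn of Tier 1 capital

# (exposure in £bn, regulatory risk weight) for a toy balance sheet
book = [
    (50.0, 1.00),       # corporate loans at a 100% risk weight
    (400.0, 0.10),      # "low-risk" assets at a 10% risk weight
]

total_exposure = sum(exposure for exposure, _ in book)     # £450bn
rwa = sum(exposure * weight for exposure, weight in book)  # £90bn risk-weighted

print(tier1_capital / rwa)             # 0.111: an apparently strong 11.1% risk-based ratio
print(tier1_capital / total_exposure)  # 0.022: 45x leverage, below Basel III's 3% floor
</syntaxhighlight>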
We should not be surprised: accounting rules aren’t sentient. They cannot read the market or understand a given institution’s cultural dynamics, let alone its particular risk profile in times of unforeseeable stress.
But different kinds of buffer might be more effective at avoiding the kinds of pickle that leveraged financial institutions can get themselves into. Buffers of resource, material and, significantly, expert people: overabundances of skill, experience and expertise that ''can'' diagnose, react to, prevent and manage liquidity crises.
Why not, as well as regulatory ''share'' capital, encourage our institutions to hold excess [[human capital|''human'' capital]]? Or at least be less cavalier about systematically ''removing'' it, in the name of short-term cost savings.
Just as one must hold share capital in fair weather as well as foul, one should not expect to run expertise in fair weather on a shoestring. You can’t buy in institutional knowledge in a time of crisis. You can’t buy institutional knowledge ''at all''. Even un-contextualised expertise, at a time of panic, will command outrageous premiums.
===''Jidoka''===
But what, a finance director might ask, would these expensive experts do if they are technically “redundant”?
Unlike [[tier 1 capital]], ''human'' capital need not just sit there costing money. These are people you can use as systems design and process experts: to analyse systems, root out anachronisms, and build parallel, state-of-the-art IT systems to which legacy infrastructure can be migrated. This is ''jidoka'': automation with a human touch. This is creative, rewarding work that builds the very expertise an organisation will need on its worst day.
We run a gamut: from ''superfragility'', where component failure triggers system ''meltdown'' — these are {{author|Charles Perrow}}’s “[[system accident]]s” — through normal ''fragility'', where component failure causes system disruption; to normal ''robustness'', where there is enough redundancy in the system to withstand outages and component failures, though components will continue to fail in predictable ways; and on to ''antifragility'', where the redundancy itself is able to respond to component failures and secular challenges, redesigning the system in light of experience to ''reduce'' the risk of known failures.
The difference between robustness and antifragility here is the quality of the redundant components. If your redundancy strategy is to have lots of excess stock, lots of spare components and an inexhaustible supply of itinerant, enthusiastic but inexpert school-leavers from Bucharest, then your machine will be robust and functional, and will keep operating as long as macro conditions persist, but it will not learn, it will not develop, and it will not adapt to changing circumstances.
An antifragile system requires both kinds of redundancy: plant and stock, to keep the machine going; and tools and knowhow, to tweak it. Experience, expertise and insight. The same things that can head off catastrophic events — though they are expensive — can apprehend and capitalise upon outsized business opportunities. ChatGPT will not help with that.
===Redundancy as a key to successful change management===
{{Quote|
Gravity always wins.
: Radiohead, ''Fake Plastic Trees'' (1995)}}
Consider Damon Centola’s research on change: to make it permanent, its constituents must be concentrated and bunched together.
[[Complex system]]s seek out their own equilibria. Over time, the autonomous components in the system — people, mostly — settle into habits and ways of working, creating their own networks and dependencies and generally acquiring their own meta-theories of what they are there to do and how best to do it. (Some do this more consciously than others, but all, at some level, do it.) These priorities will be personal to the agent, and while they may partly coincide with the organisation’s, they won’t entirely: it is no part of a corporation’s plan, above all else, to make sure ''I'' stay here, and thrive, and get well paid, while minimising personal risk and responsibility, and these, we submit, motivate most corporate employees more deeply than ensuring immaculate shareholder return. But we digress.
The systems and subsystems evolve ways of working that create their own efficiencies — efficiencies that yield to those personal motivations, remember, not corporate ones. They wear in grooves, smooth down sharp edges and naturally, through the adaptive process of time, seek out local maxima. We should not be surprised that systems which have found an equilibrium are hard to shift from it. Call that equilibrium an “operating paradigm”.
In a fight between logic and gravity, gravity always wins.
It stands to reason that a single “change agent” who arrives from outside and says, “hey, fellas, wouldn’t it be great if we fixed this?” won’t get far with the veteran crew who run the process now. The thing about an operating paradigm is that it is ''operating''. On its own terms, it works. It ''isn’t in crisis''. Now, in {{author|Thomas Kuhn}}’s conception of them,<ref>{{br|The Structure of Scientific Revolutions}}. Wonderful book.</ref> paradigms generally only break down if they don’t work. As far as its constituents are concerned, this one is working ''fine''. They may regard it as a thing of beauty, a many-splendoured contraption that