Scale

From The Jolly Contrarian

Latest revision as of 13:24, 20 August 2024




The point where the scale opportunities are large enough to require active management.

“Passive” economies of scale flow from the simple fact of size (e.g., adding another user to an existing software licence automatically reduces the per-user cost of the licence, without anyone having to do anything). But these passive economies run out at the point where one needs to divert the firm’s resources and personnel towards managing these efficiencies. One must spend to save, manufacturing scale efficiencies that won’t arise by themselves: negotiating law firm panel arrangements, outsourcing and offshoring, for example.
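The licence example above is simple division, but it is worth seeing just how passive it is. A minimal sketch — the fee figure is purely illustrative:

```python
# A "passive" economy of scale: a fixed licence fee spread across a growing
# user base. Nobody has to manage anything; each new user automatically
# lowers everyone's share. The fee figure is a hypothetical illustration.
def per_user_cost(licence_fee: float, users: int) -> float:
    """Fixed cost divided across users."""
    return licence_fee / users

fee = 100_000.0  # hypothetical annual licence fee
print(per_user_cost(fee, 100))  # 1000.0 per user
print(per_user_cost(fee, 101))  # ~990.1 — one new user cut everyone's cost
```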

A firm may engage management consultants, middle managers and eventually a chief operating officer whose only job is to extract efficiencies. As long as the efficiencies wrought are greater than the marginal cost of that person, team, or fiefdom, then the fiefdom can be justified on hard economic data.

But O, Paradox: the COO unit itself can become so complex that it presents its own scale opportunities. Beyond a point it becomes so sprawling, so inefficient, that one should appoint a chief operating officer for the chief operating officer’s office, tasked with consolidating the diaspora of COO functions, groups, initiatives and change managers into a single function.

As you know, the JC is principally concerned with the management of in-house legal. Once upon a time, the legal department was itself a kind of operating office, there not to dispense its own legal advice so much as manage the outsourcing of legal advice from law firms... and, of course, check the firm’s name was spelt and punctuated correctly on the football team. That’s “Wickliffe Hampton S.A., acting through its London Branch”, everybody!

So there is some irony that its scale has become such — bigger operations may have the thick end of a thousand lawyers in-house — that firms are forming operations teams to manage the legal teams.

Scale and rent

It could be argued, and the JC does argue, that the yen for scale in modern commerce is driven not by an aspiration for economy, much less to save a customer money, but by the opportunities it affords to extract rent. However many thousands of organisations make up the financial services sector, and however many hundreds of thousands are employed, directly or indirectly, servicing those organisations, there are orders of magnitude more people putting their hard-earned dollars into that system in the hope of some kind of return.

The point where scale becomes really exciting is where the cost of rent extraction, per dollar, is so minimal that the hosts — beg your pardon, customers — don't even notice it. Scale then becomes free money; you don't need to reduce your rate because the customers will pay it anyway.

This is why hedge fund managers with $5 billion in AUM are happier than those with $500 million.
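The arithmetic behind that happiness is worth making explicit. A minimal sketch, assuming the conventional 2% management fee — the rate and fund sizes are illustrative, not claims about any firm:

```python
# Rent extraction at scale: the same flat fee rate costs each customer the
# same per dollar, but the manager's take grows in lockstep with AUM while
# the cost of collecting it barely moves. The 2% rate is the conventional
# hedge-fund management fee, used here purely for illustration.
def management_fee(aum: float, rate: float = 0.02) -> float:
    """Annual management fee on assets under management."""
    return aum * rate

print(management_fee(500_000_000))    # 10,000,000 a year on $500m
print(management_fee(5_000_000_000))  # 100,000,000 a year on $5bn — same rate, 10x the rent
```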

LLMs and scale

The main driver of the information revolution to date has been scale: the more connections, the bigger the network, the better it performs and the less stress and dependency there is on any single component. Networks in this way are antifragile and they scale in a superlinear way.

Cognitive load per node goes down — we each get more efficient and can allocate resources more effectively — and overall benefit for the whole system goes up. The cognitive load reduction is probably linear; and the benefit per person is probably linear, too — but because the number of interconnections grows with the square of the number of nodes, the overall benefit is non-linear. It is a great trade.
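The trade can be sketched in a few lines: cost per node grows linearly, while the number of potential pairwise connections grows quadratically, so the benefit-to-cost ratio keeps improving as the network grows. The unit-cost and per-link-benefit constants are arbitrary illustrations:

```python
# The network "great trade": total cost grows linearly with nodes, but
# potential pairwise links grow as n*(n-1)/2, so total benefit outpaces
# total cost as the network scales. Constants are illustrative only.
def total_cost(nodes: int, cost_per_node: float = 1.0) -> float:
    return nodes * cost_per_node

def total_benefit(nodes: int, benefit_per_link: float = 0.01) -> float:
    links = nodes * (nodes - 1) // 2  # every pair of nodes can connect
    return links * benefit_per_link

for n in (10, 100, 1000):
    print(n, total_benefit(n) / total_cost(n))  # ratio rises with scale
```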

This is not how LLMs work — nor, I think, crypto. They are progressively resource-hungry. The more you ask them to remember, the more processing load they require, because they have to reprocess the whole thread. This is why they are configured to be like goldfish: outside a conversation session, they will not “remember” anything you told them.

This is why LLMs limit the number of exchanges allowed in a given thread. If you start again, the processing load gets much lighter. But the LLM forgets everything. It is like Guy Pearce in Memento.
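A back-of-the-envelope sketch of why the thread gets heavy: if each turn re-reads everything said so far, total tokens processed over a conversation grows roughly with the square of the number of turns. The per-turn token figure is an illustrative assumption:

```python
# Why long threads get expensive: each turn reprocesses the whole history,
# so cumulative work grows quadratically with turn count. Resetting the
# thread resets the cost — and the memory. Figures are illustrative.
def tokens_processed(turns: int, tokens_per_turn: int = 100) -> int:
    """Cumulative tokens processed if turn t must re-read all t turns so far."""
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

print(tokens_processed(10))   # 5,500 tokens
print(tokens_processed(100))  # 505,000 — 10x the turns, ~100x the work
```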

It is worth viewing that against the metaphor that an LLM is like a really keen, fast, but inexperienced graduate, straight out of school. It will go a zillion miles an hour, but check its work, you know?

Now the thing about graduates is that they get better. They learn; they remember. They gain institutional knowledge and battle-scars. They pick up tricks, learn heuristics, and form good (and bad!) habits.

They acquire “metis”, that is to say: wisdom. That is the investment we make: we are happy to trade “quality product” for “enthusiasm” up front, because the time we spend getting grads up the curve will pay off. They will get better.

Would we be as patient if, every morning, our grads reset to day 1, like goldfish? Or would the novelty wear off?

LLMs can be configured to learn, but at a cost: incremental performance loss, and ever greater use of processing cycles. So no one wants to offer that.

And it doesn’t scale:

LLMs don’t scale like that. They anti-scale: the more people use them, and the more memory people demand of them, the higher the processing cost climbs, while the incremental benefit tends to zero.

This — well, it’s not good.

See also