Template:M intro technology Better call ChatGPT
Latest revision as of 19:57, 9 February 2024
“The era of LLM dominance in legal contract review is upon us”.
Oh, just listen to the tiny, AI violins. Some legal technologists[1] have presented a “groundbreaking comparison between large language models and ‘traditional legal contract reviewers’” (being junior lawyers and legal process outsourcers), benchmarking them against a “ground truth” set by senior lawyers.
It did not go well for the meatware.
The researchers collated and anonymised ten “real-world” procurement contracts — NDAs were deemed a bit easy — and fed them to a selection of junior bugs, LPOs and large language models.[2]
The buried lead: variance increases with experience
An interesting finding, noted but not explored by the paper, was a variance measurement across the categories of human reviewers: the least qualified, the LPOs, had an “alpha” variance of 1.0, implying complete agreement among their operatives about the issues (a function, we suppose, of the mechanical obedience that LPO businesses drum into their paralegals). This dropped to 0.77 for junior lawyers and dropped further, to 0.71, for senior lawyers.[3]
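For readers who want to see what an “alpha” of 1.0 means mechanically, here is a minimal sketch of the textbook Cronbach’s alpha calculation, treating each reviewer as an “item” and each clause as a subject. The paper’s exact method is not described here, so the function name `cronbach_alpha` and the yes/no issue flags below are illustrative assumptions, not the study’s data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Textbook Cronbach's alpha.

    ratings[i][j] is the score reviewer j gave clause i
    (here 1 = "flagged an issue", 0 = "did not").
    """
    k = len(ratings[0])  # number of reviewers (the "items")
    if k < 2:
        raise ValueError("need at least two reviewers")
    # Variance of each reviewer's scores across all clauses.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the per-clause totals (sum over reviewers).
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three reviewers, four clauses.
agreed = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]  # everyone flags the same clauses
mixed = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]   # reviewers take different views

print(cronbach_alpha(agreed))
print(cronbach_alpha(mixed))
```

Reviewers in lockstep push alpha to its ceiling of 1; the more they take different views, the lower it falls.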
You read that right: experienced lawyers were least likely to agree on what was important in a basic contract.
This says one of two things: either lawyers get worse at analysing contracts across their careers (by no means out of the question, but at the very least in need of explanation) or there is something not measured in these key performance indicators that sets the veterans apart. Maybe linear contract analytics is the proverbial machine for judging poetry, and isn’t all there is to it.
Hold that thought.
Results: all hail the paralegals?
In any case, for accuracy the LPO paralegals did best, both in spotting issues and in locating them in the contract. (How you can spot an issue but not know where it is, we are not told.) Junior lawyers ranked about the same as the chatbots. Perhaps to spare their blushes, the report does not say how the vets got on.
But it shouldn’t surprise anyone that all the machines were quicker than the humans, of whom LPOs were by far the slowest. There is a cost to obliging humans to behave like robots.
The clear implication: as we can expect LLMs to get better over time,[4] the meatware’s days are numbered.
Now, if you ask an experienced lawyer to craft a set of abstract guidelines that she must hand off to low-cost, rule-following units, but for whose operation she remains accountable, expect her to draw her boundaries conservatively.
There being no “bright lines” wherever there is scope for nuance or call for subtlety, she will stay well within the smudgy thresholds, not trusting a dolt — whether naturally or generatively intelligent — to get it right.
This is common sense and little more than prudent triage: well before any practical danger, her amanuenses must report back to Matron for further instruction. She can then send them back into the fray with contextualised orders, or just handle the tricky stuff herself. This is all good best-practice outsourcing, straight from the McKinsey playbook.
Now, a standard-form contract without at least one howling error is unknown to legal science, so she should expect an assiduous machine reader, so instructed, to be tireless, and quickly tiresome, in rooting out formal discrepancies and bringing them to her attention.
Variance, redux: when “solution” is the problem
Q: What’s the difference between an LLM and a trainee?
A: You only have to punch information into an LLM once.[5]
Contrary to modernist wisdom — viz., thou shalt not rest until all problems are solved — descending the fractal tunnel of error is, sometimes, a bad idea. Usually, in fact. Down it are snafus and boo-boos that an experienced lawyer will roll her eyes at, take a moment to sanctimoniously tut about, but then let go.
Life, you see, is too short. She may even filter these out with her subconscious fast brain before she registers them at all. This is pure noise: instinctive, formalistic fluff, well beyond a seasoned professional’s ditch tolerance.
This, perhaps, explains that mysterious “alpha variance” among experienced lawyers. Contract review, at the end of the day, is an art, not a science. Much of a contract is filler: we are better satisficing than optimising it, much less perfecting it. But where you draw the line depends on the kind of day you’re having: sometimes you take a point, sometimes you don’t.
Some like the comfort of redundant boilerplate, others cannot abide it. Harbouring personal traumas and scars, different individuals — and institutions — are fearful about different things.
Does it matter that your contract has a counterparts clause? Does it matter that it doesn’t?
A busy-body LLM that catches every blemish and cannot take a view as often creates a problem as a solution. This kind of literalness rubs off, or is beaten out of, junior lawyers as they develop. But mechanical ducks like LLMs have an insatiable thirst for it. This satisfies nothing beyond management information and statistics.
For what we are fighting here is not bad lawyering, nor bad machines nor bad intentions but bad process design. Reinforcing it with machinery won’t help.
This is the lesson of the sorcerer’s apprentice.
But, as Radiant’s Alex Hamilton remarked to me, the converse is also true. An LLM may pick up and flag for routine non-compliance a clause which, in the context, should sound a deeper alarm. Take a hedge fund customer who cuts back or wordsmiths a standard market abuse representation [i.e., that it will never place an order when in possession of non-public price-sensitive information]: that is a bright line, and a willingness even to approach it should be setting off klaxons across the organisation.
Lost in the dreary output of a poetry-judging machine, the warning might go unheeded.
The oblique purposes of formal contracts
There is one peculiarity that a literal approach to contract review cannot address, but we should mention: sometimes a contract’s true significance runs tangentially to its content. The forensic detail is not always the point.[6]
A basic example: European financial services regulations require institutions to have written contracts with all customers, as a regulatory end in itself. The rules are less prescriptive about what the contracts should say.
A firm must, therefore, have a written contract, just to do business.
To meet that end, any delay in finalising that contract, in the name of “getting it right”, ought to be a source of regret. (You might be surprised how often financial firms are obliged to negotiate their terms of business, in that it is not “never”.)
Other contracts act as a sort of mating ritual: a performative ululation of customary cultural verities signalling that yes, we care about the same things you do, are of the right stuff, the same mind and our “ad idems” are capable of consensus. Again, it matters less what the contract says than that it is there. It is a commitment signal.
If it is that — most NDAs are that — then descending into an LLM-level of subterranean pedantry and exactitude, in the service of “picking up things that even a gun paralegal might not”, is a rum plan. The point is to carry out the ritual, afford it the minimum required pleasantries, but not to labour them.
Volume contracts
Those exceptions aside, where high-volume, low-risk legal processes do not function as courting rituals, the name of the game is not perfect negotiation, but no negotiation.
This is a crucial distinction: negotiation does not fix the problem: it is the problem.
If your customers regularly negotiate your terms of business, or you get regular snarl-ups on procurement, you have bad forms. Fix them.
This might mean persuading Legal to come to Jesus on the width of its idealised liability exclusion, or just rewriting forms in a nicer font and plainer language. Either way, the answer is not to leave the problem where it is and just mechanise it.
Doing that will leave you two enduring problems: first, your portfolio of standard contracts will not be standard; secondly, your bad form is now beset with administrative machinery it will be hard, later, to take away. By appointing unskilled technocrats to manage a broken process — and, likely, other unskilled technocrats to oversee and monitor them — you have institutionalised a bad process.
John Gall’s wonderful Systemantics captures this well: temporary fixes have a habit of becoming permanent. Bureaucrats are butterfly collectors: they do not give up their responsibilities without a fight. Their managers rarely have the stomach for one: it does a job: leave it be.
Before long, this process will have itself sedimented into the administrative sludge that weighs your organisation down. Other processes will depend on it. Surgical removal will be hard.
LLMs and waste
LLMs can’t function by, or think for, themselves (yet). They need looking after. Their deployment implies not saved legal cost, but “waste” transferred: what once was spent fruitlessly on legal eagles will instead be diffused, fruitlessly, among a phalanx of software-as-a-service providers, procurement personnel, internal audit boffins, operations folk and, yes, the dear old legal eagles who will still have to handle exceptions, manage and troubleshoot the system, vouch for it, be blamed for it, periodically certify that it is legally adequate to the COO and then, when it turns out not to be, explain why it wasn’t to the operational risk steerco.
All of this costs money, takes time and distracts the firm’s resources from better things they could be doing. Just because it is harder to evaluate, doesn’t mean it isn’t there.[7]
The finite game
By design, LLMs learn and reason exclusively from what has gone before. While, yes, AlphaGo might have engineered a novel strategy in a zero-sum game, the non-linear infinitude of life that a contract review process must contend with is a different kettle of fish. And this is not what we want in a sorcerer’s apprentice anyway.
That being the case, the perfect LLM would be one that served up an archetypal sample of what you already have.
Evolution, not revolution
Gravity always wins.
- —Radiohead, Fake Plastic Trees
JC does not doubt there is a role for LLMs in legal practice. His own little chatbot, NiGEL, is already treasured in the Contrarian family. He has set about all kinds of janitorial tasks around the office. He might yet be useful in redesigning forms to prevent negotiation.
But it is too early, just yet, to call time on the age of the well-tempered paralegal. Progress rambles erratically around design space, buffeted as it departs from the imperfect now by Brownian forces of fashion, self-preservation and agency — but subject always to the immutable laws of entropy. Where there’s a will, there’s a way to make it pay.
[1] At the Onit Inc. “AI Center of Excellence” in Auckland, New Zealand.
[2] It looks to have been those of OpenAI, Google, Anthropic, Amazon and Meta. Poor old Bing didn’t get a look in.
[3] “Cronbach’s alpha” is a statistic that measures the internal consistency and reliability of different items such as, in this case, the legal agreement reviews. A high “alpha” indicates consistency and general agreement between individual reviewers; a low alpha indicates variance or disagreement.
[4] Maybe not, actually, but okay.
[5] This is a nerd’s version of the drummer joke: What’s the difference between a drummer and a drum machine? You only have to punch information into a drum machine once.
[6] This is, broadly, true of all contracts, from execution until formal enforcement, and the overwhelming majority of contracts are never formally enforced.
[7] This is wishful thinking, of course: in a world where accounting projections are the first and last word, that is all that matters.