Better call ChatGPT

“The era of LLM dominance in legal contract review is upon us”.

Oh, just listen to the tiny AI violins. Some legal technologists[1] have presented a “groundbreaking” comparison between large language models and “traditional legal contract reviewers” — being junior lawyers and legal process outsourcers — benchmarking them against a “ground truth” set by senior lawyers.

It did not go well for the meatware.

The researchers collated and anonymised ten “real-world” procurement contracts — NDAs were deemed a bit easy — and fed them to a selection of junior bugs, LPOs and large language models.[2]

The buried lead: variance increases with experience

An interesting finding, noted but not explored by the paper, was a measurement of reviewer agreement[3] across the categories of human reviewers: the least qualified, the LPOs, had an “alpha” of 1.0, implying complete agreement among them about the issues (a function, we suppose, of the slavish and obedient adherence that is beaten into LPO businesses). This dropped to 0.77 for junior lawyers and further still to 0.71 for senior lawyers.
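
For the statistically curious, Cronbach’s alpha is easy to compute from a score matrix. Here is a minimal, hypothetical sketch in Python — nothing below comes from the paper; the function, the data and the orientation of the matrix are our own assumptions — using the standard formula: alpha = k/(k−1) × (1 − sum of per-rater variances ÷ variance of item totals).

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (items x raters) score matrix.

    scores[i, j] is rater j's score on item i (say, how serious
    reviewer j thinks issue i is). An alpha of 1.0 means the raters
    move in lockstep; lower values mean more disagreement.
    """
    k = scores.shape[1]                              # number of raters
    rater_variances = scores.var(axis=0, ddof=1)     # variance per rater
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of item totals
    return (k / (k - 1)) * (1.0 - rater_variances.sum() / total_variance)

# Hypothetical data: three LPO reviewers scoring five issues identically
identical = np.array([[3, 3, 3],
                      [1, 1, 1],
                      [4, 4, 4],
                      [2, 2, 2],
                      [5, 5, 5]])
print(cronbach_alpha(identical))  # 1.0 -- complete agreement
```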

You read that right: experienced lawyers were least likely to agree what was important in a basic contract.

This says one of two things: either lawyers get worse at reading contracts as they get more experienced — by no means out of the question, and it would explain a few things — or there is something not measured in these key performance indicators that sets the veterans apart. That, maybe, linear contract analytics is the proverbial machine for judging poetry, and isn’t all there is to it.

Hold that thought.

Results: all hail the paralegals?

In any case, for accuracy the LPO paralegals did best, both in spotting issues and in locating them in the contract. (How you can spot an issue but not know where it is, we are not told.) Junior lawyers ranked about the same as the chatbots. Perhaps to spare their blushes, the report does not say how the vets got on.

But it shouldn’t surprise anyone that all the machines were quicker than any of the humans, of whom the LPOs were by far the slowest. There is a cost to obliging humans to behave like robots.

Clear implication: as we can expect LLMs to get better over time,[4] the meatware’s days are numbered.

Now, if you ask an experienced lawyer to craft a set of abstract guidelines that she must hand off to low-cost, rule-following units, but for whose operation she remains accountable, expect her to draw her boundaries conservatively.

There being no “bright lines” wherever there is scope for nuance or call for subtlety, she will stay well inside them, not trusting a dolt — whether naturally or generatively intelligent — to get it right.

This is common sense and little more than prudent triage: well before any practical danger, her amanuenses must report to matron for further instruction. She can then send the machine back into the fray with contextualised instructions, or just handle anything properly tricky herself. This is all good best-practice outsourcing, straight from the McKinsey playbook.
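
To make the escalation logic concrete, here is a toy sketch — entirely hypothetical; the names, fields and threshold are ours, not anything from the paper — of the kind of conservatively drawn playbook rule described above: anything nuanced or off-playbook goes back to the supervising lawyer.

```python
from dataclasses import dataclass

@dataclass
class Issue:
    clause: str
    severity: int         # 1 (cosmetic) .. 5 (deal-breaker)
    within_playbook: bool

def triage(issue: Issue, escalation_threshold: int = 2) -> str:
    """Conservative triage: escalate on any nuance, well before real danger."""
    if not issue.within_playbook or issue.severity >= escalation_threshold:
        return "report to matron for further instruction"
    return "proceed per playbook"

print(triage(Issue("counterparts", severity=1, within_playbook=True)))  # proceed
print(triage(Issue("indemnity", severity=3, within_playbook=False)))    # escalate
```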

Now, a standard-form contract without at least one howling error is unknown to legal science, so she should expect an assiduous machine reader, so instructed, to be tireless, and quickly tiresome, in rooting out formal discrepancies and bringing them to her attention.

Contrary to modernist wisdom, this, for the most part, is a bad thing. These will be things an experienced lawyer would roll her eyes at, or sanctimoniously tut about, but then let go, in most cases without even recording that the “issue” was there. This is formalistic fluff, a long way past a seasoned professional’s ditch tolerance.

This, perhaps, accounts for that mysterious variance among experienced lawyers. Contract review, at the end of the day, is an art, not a science. Sometimes you take a point, sometimes you don’t. Some lawyers like the comfort of redundant boilerplate; others cannot abide it. Harbouring different scars, different institutions are fearful of different things. Does it matter that your contract has a counterparts clause? Does it matter that it doesn’t?

A busy-body LLM that sees everything and cannot take a view gives its master a problem: she has an officious pedant on her hands. This kind of pedantry usually wears off as junior lawyers acquire experience. LLMs have an insatiable thirst for it.

For what we are fighting here is not bad lawyering, nor bad machines nor bad intentions but bad process design. Supporting it with machinery will make things worse. This is the lesson of the sorcerer’s apprentice.

The oblique purposes of formal contracts

There is one peculiarity that this kind of formalistic approach cannot address, but we should mention: sometimes a contract’s true significance is tangential to its contents. Sometimes the finely thrashed-out detail is not the point.[5]

Sometimes the very act of finely thrashing out unimportant details frustrates the true purpose of the contract, which is to fulfil a sociological function. As a commitment signal or competence signal.

As a mating ritual, of sorts: a performative ululation of customary cultural verities meant to signal that yes, we care about the same things you do, are of the right stuff and of the same mind, and our ad idems are capable of consensus.

If it is that — and most NDAs are — then descending into the subterranean world of pedantry and exactitude that an LLM offers, in the service of “picking up things that even a trained paralegal might not”, can even be counterproductive. The point is to carry out the ritual: to accord these pleasantries the required respect, but not to labour them.

Now. That aside: with high-volume, low-risk legal processes — especially where they do not play a part in the courting rites — the name of the game is not fast, efficient and precise negotiation, but no negotiation. Negotiation is the problem. If you find customers regularly negotiate your standard terms of business, or you get regular snarl-ups on procurement processes and end-user sale contracts, you have bad contracts. Fix them.

This might be a matter of formal redesign, or of persuading legal to come to Jesus on the preposterous width of the exclusion of liability and indemnity — but the answer is not to excellently negotiate individual contracts. That leaves you with two enduring problems: first, your portfolio of homogenous customer sale contracts is no longer homogenous; secondly, you have now overlaid administrative machinery — machinery that generates non-standard contracts — upon a bad process, and that machinery will be hard to remove. By appointing unskilled bureaucrats and technocrats to oversee and manage that process, and likely other unskilled bureaucrats to oversee and monitor them, you have institutionalised bad process.

John Gall’s Systemantics captures this well. Temporary fixes have a habit of becoming permanent. Bureaucrats are butterfly collectors: they do not give up responsibilities without a fight. Before long, this process will itself have sedimented into the administrative sludge that weighs your organisation down.

LLMs can’t function by themselves (yet: we are not quite at the point of Skynet, however much techno-utopians might hanker for it). They imply not saved legal cost, but “waste” transferred: it will be diffused among software-as-a-service providers, the firm’s procurement complex, internal audit, operations and, yes, legal, who will still have to handle exceptions, manage and troubleshoot the system, vouch for it, periodically certify its legal adequacy and present it to the opco.

LLMs are finite. They necessarily mimic what has gone before. While, yes, AlphaGo might engineer a novel strategy in a zero-sum game, it is not so easy in the non-linear infinitude of real life. An LLM that purports to improve on its training material will be distrusted: it doesn’t understand that material, so what good would its reimagining be?

The perfect LLM serves up an archetypal sample of what you already have.

References

  1. At the Onit Inc. “AI Center of Excellence” in Auckland, New Zealand.
  2. It looks to have been those of OpenAI, Google, Anthropic, Amazon and Meta. Poor old Bing didn’t get a look in.
  3. “Cronbach’s alpha” is a statistic that measures the internal consistency and reliability of a set of different items — in this case, the legal agreement reviews. A high “alpha” indicates consistency and general agreement; a low alpha indicates variance or disagreement.
  4. Maybe not, actually, but okay.
  5. This is, broadly, true of all contracts from execution until formal enforcement. The overwhelming majority of contracts are never formally enforced.