{{quote|“The era of LLM dominance in legal contract review is upon us”.}}
{{drop|O|h, just listen}} to the tiny, AI violins. Some legal technologists<ref>At the Onit Inc. “AI Center of Excellence” in Auckland, New Zealand.</ref> have presented a “{{plainlink|https://arxiv.org/html/2401.16212v1|groundbreaking comparison between large language models and “traditional legal contract reviewers”}}” — being junior lawyers and legal process outsourcers — benchmarking them against a “ground truth” set by senior lawyers.


It did not go well for the meatware.


The researchers collated and anonymised ten “real-world” procurement contracts — [[NDA]]<nowiki/>s were [[deemed]] a bit easy — and fed them to a selection of junior bugs, [[Legal process outsourcer|LPO]]s and [[Large language model|large language models]].<ref>It looks to have been those of OpenAI, Google, Anthropic, Amazon and Meta. Poor old Bing didn’t get a look in.</ref>


===== Variance ''increases'' with experience =====
{{drop|A|n interesting finding,}} noted but not explored by the paper, was a variance measurement<ref>“Cronbach’s alpha” is a statistic that measures the internal consistency, and reliability, of a set of different items: in this case, the legal agreement reviews. A high “alpha” indicates consistency and general agreement; a low alpha indicates variance or disagreement.</ref> across the categories of human reviewers: the ''least'' qualified, the LPOs, had an “alpha” of 1.0, implying complete agreement among them about the issues (a function, we suppose, of the slavish and obedient adherence to the rules that is beaten into LPO businesses). This ''dropped'' to 0.77 for junior lawyers and further still to 0.71 for senior lawyers.
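
For the statistically curious, the textbook formula (we assume, without having seen the researchers’ workings, that they used something like it) is:

:<math>\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right)</math>

where <math>k</math> is the number of items (here, presumably, the individual reviewers’ scorings), <math>\sigma^2_{Y_i}</math> is the variance of each item and <math>\sigma^2_X</math> the variance of the total: the more closely the items move together, the nearer alpha gets to 1.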


You read that right: ''senior'' lawyers were ''less'' consistent than anyone else about what was important in a basic contract.


This says one of two things: either you get worse at reading contracts as you get more experienced, or there is something else, not measured in these [[key performance indicator]]<nowiki/>s, that sets the veterans apart. That, maybe, linear contract analytics isn’t all there is to it.


Hold that thought.


====Results: all hail the paralegals?====
{{drop|I|n any case}}, perhaps to spare their blushes, the report does not tell us how the vets did compared with the chatbots, but the LPO [[paralegal]]s came out best, both in spotting issues and locating them in the contract (how you can ''spot'' an issue but not know where it is, we are not told). Junior lawyers ranked about the ''same'' as the machines.


But it shouldn’t surprise anyone that all the machines were a lot quicker than any of the humans. LPOs were the slowest.


Clear implication: we can expect LLMs to get better over time,<ref>Maybe {{Plainlink|https://www.theregister.com/2023/07/20/gpt4_chatgpt_performance/|not, actually}}, but okay.</ref> so the [[meatware]]’s days are numbered.


There is something to admire in the method: deferring to the already-credentialised acknowledges the power structure in which we dance the legal tarantella. This is the paradigm: it sets out not only what counts as a good answer, but what counts as a good question.


Senior lawyers make the rules.


Now, if you ask a senior lawyer to craft a set of abstract guidelines that she must hand off to low-cost, rule-following units, she will draw her boundaries ''conservatively''. Wherever there is scope for nuance or subtle judgment, she will veer inside them, not trusting a dolt — whether naturally or generatively intelligent — to get it right.


This is basic triage: well before any practical danger, her amanuenses must report to matron for further instruction, whereupon our responsible lawyer can send the machine back into the fray with new instructions, or handle anything properly tricky herself. This is all good best-practice outsourcing, right from the playbook.


Now, a standard-form contract without at least one howling error is unknown to legal science, so she should expect an assiduous machine reader, so instructed, to be tireless, and quickly tiresome, in rooting out minor discrepancies.  


This, for the most part, is a bad thing. These are all things an experienced lawyer would roll her eyes at, or sanctimoniously tut about, ''but then let go'', in most cases without even recording that the “issue” was there. This is formalistic fluff, a long way past a seasoned professional’s [[ditch tolerance]].


These judgment calls, we submit, account for the increasing variance among experienced lawyers. Contract review, end of the day, is an art, not a science. Sometimes you take a point, sometimes you don’t.


A busy-body LLM that gets everything right but cannot take a view gives her masters a problem: they have an officious pedant on their hands. This kind of pedantry wears out of junior lawyers. LLMs have an insatiable thirst for it.