Over the past few days, senior judges at opposite ends of the world have issued guidance to their judicial colleagues on the use of artificial intelligence. But if the sad case of Felicity Harber is anything to go by, it’s the litigants who need to be told that AI can simply make things up.
All I can find out about Harber is that in 2018 she sold a house in "Sunderstead Road" (possibly a misspelling), which she had previously let to tenants. She failed to declare her liability to capital gains tax and was issued with a penalty of £3,265.
Harber appealed against the penalty on the basis that she had a reasonable excuse — two, in fact. First, she had suffered from anxiety and panic attacks. The first-tier tax tribunal decided this month that her mental health condition did not amount to a reasonable excuse. Secondly, she claimed ignorance of the law. Given the evidence, the tribunal dismissed this argument too.
If that had been all, the ruling would never have been picked up by the Legal Futures website last week. But Harber had chosen to buttress her arguments with summaries of nine tribunal decisions in which she said a defence of reasonable excuse had been successful — five relating to mental health problems and four involving ignorance of the law.
The problem was that none of the cases turned out to be real. As the tribunal says in its judgment, once it became clear that nobody could find the decisions she had cited, she was asked
if the cases had been generated by an AI system, such as ChatGPT. Mrs Harber said this was “possible”, but moved quickly on to say that she couldn’t see that it made any difference as there must have been other first-tier tribunal cases in which the tribunal had decided that a person’s ignorance of the law and/or mental health condition provided a reasonable excuse.
She then asked the tribunal members how anyone could be confident that the cases produced by HM Revenue and Customs were genuine. Because, the tribunal explained patiently, they were reported in full on public websites. Harber didn’t know that such things existed.
Fortunately, the tribunal had spotted some tell-tale signs in the summaries she had provided, such as US spellings and the frequent repetition of identical phrases. And although the tribunal had found that Harber was not ignorant of the law, she had no idea that the cases she had found had been made up.
But Tribunal Judge Anne Redston and her colleague Helen Myerscough were not amused:
We acknowledge that providing fictitious cases in reasonable excuse tax appeals is likely to have less impact on the outcome than in many other types of litigation, both because the law on reasonable excuse is well settled and because the task of a tribunal is to consider how that law applies to the particular facts of each appellant’s case.
But that does not mean that citing invented judgments is harmless. It causes the tribunal and HM Revenue and Customs to waste time and public money; and this reduces the resources available to progress the cases of other court users who are waiting for their appeals to be determined.
New Zealand
Last Thursday, the courts of New Zealand issued guidelines to their judiciary on the use of generative artificial intelligence in courts and tribunals. I’ll begin by quoting advice that might come in useful if Harber turns out to have any distant cousins:
Lay litigants may use GenAI [generative artificial intelligence] chatbots to identify and explain relevant laws and legal principles or to prepare basic legal documents. However, they will often not have the skills to independently verify the legal information provided and may not be aware that it is prone to error. It may be appropriate to inquire whether a lay litigant has used a GenAI chatbot, and to ask what checks for accuracy they have undertaken (if any).
“Red flags” that may indicate a GenAI chatbot has been used include:
submissions that use American spelling or refer to overseas cases;
content that (superficially at least) appears to be highly persuasive and well-written, but on closer inspection contains obvious substantive errors;
references to cases that do not sound familiar;
parties citing different bodies of case law in relation to the same legal issues; and
submissions that do not accord with your general understanding of the law in the area.
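Some of those red flags are mechanical enough to check automatically. By way of illustration only, here is a toy sketch in Python (my own invention, not anything the New Zealand courts use; the spelling list and citation pattern are assumptions made up for the example) that flags American spellings and citations missing from a set of known cases:

```python
import re

# A toy red-flag checker, purely illustrative: the spelling list and the
# citation pattern are my own assumptions, not any court's actual toolkit.
US_SPELLINGS = {"defense", "offense", "favor", "honor", "center", "analyze"}

# Crude pattern for a neutral citation such as "[2023] UKFTT 1007".
CITATION = re.compile(r"\[\d{4}\] [A-Z]+ \d+")

def red_flags(submission: str, known_citations: set[str]) -> list[str]:
    """Return warnings for a block of submitted text."""
    flags = []
    found_words = set(re.findall(r"[a-z]+", submission.lower()))
    for spelling in sorted(US_SPELLINGS & found_words):
        flags.append(f"American spelling: {spelling!r}")
    for cite in CITATION.findall(submission):
        if cite not in known_citations:
            flags.append(f"Citation not found: {cite!r}")
    return flags

print(red_flags(
    "The defense relies on Smith v HMRC [2021] UKFTT 999 (TC).",
    known_citations={"[2023] UKFTT 1007"},
))
# ["American spelling: 'defense'", "Citation not found: '[2021] UKFTT 999'"]
```

The only reliable check, of course, remains the one the tribunal patiently explained to Harber: reading the decision in full on a public website.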
What if judges in New Zealand want to use AI themselves? Understand its limitations, they are told:
GenAI chatbots are not search engines. They do not provide answers from authoritative databases but, rather, generate new text using a complex algorithm based on the prompts they receive and the data they have been “trained” on. This means the output generated by a GenAI chatbot is what it predicts to be the most likely combination of words (based on the documents and data that it holds as source information). However, even if the output looks convincing, it may not be factually correct.
The currently available GenAI chatbots appear to have had limited access to training data on New Zealand law or on the procedural requirements that apply in New Zealand courts and tribunals.
The quality of any answers you receive will depend on how the GenAI chatbot has been trained, the reliability of the training data, and how you engage with the relevant GenAI chatbot, including the “quality” of the prompts you enter.
Even with the best prompts, the output may be inaccurate, incomplete, misleading or biased.
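That first paragraph is the crucial one. To make "predicting the most likely combination of words" concrete, here is a deliberately crude sketch in Python: a toy model of my own devising (real chatbots use neural networks trained on billions of documents, but the predict-rather-than-retrieve principle the guidance describes is the same) that simply emits whichever word most often followed the previous one in its training text:

```python
from collections import Counter, defaultdict

# A toy "training text": these word statistics are all the model will know.
training_text = (
    "the tribunal dismissed the appeal the tribunal considered the penalty "
    "the appellant cited the cases the tribunal found the cases were invented"
)

# Count, for each word, the words that follow it.
follows = defaultdict(Counter)
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

def generate(start: str, length: int = 8) -> str:
    """Repeatedly append the statistically most likely next word."""
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))
# "the tribunal dismissed the tribunal dismissed the tribunal dismissed":
# fluent-looking text assembled purely from word statistics.
```

Nothing in that loop consults a database of real judgments. Fluency is no guarantee of truth, which is precisely the guidance's point.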
And there is more advice under the following headings:
Uphold confidentiality, suppression and privacy
Ensure accountability and accuracy
Be aware of ethical issues
Maintain security
Disclosing GenAI use
England and Wales
By a spooky coincidence, the courts of England and Wales issued their own guidance on artificial intelligence yesterday. “Judges do not need to shun the careful use of AI,” they were told by Sir Geoffrey Vos, head of civil justice. “But they must ensure that they protect confidence and take full personal responsibility for everything they produce.”
Again, holders of judicial office are told they must understand the limitations of artificial intelligence:
Public AI chatbots do not provide answers from authoritative databases. They generate new text using an algorithm based on the prompts they receive and the data they have been trained upon. This means the output which AI chatbots generate is what the model predicts to be the most likely combination of words (based on the documents and data that it holds as source information). It is not necessarily the most accurate answer.
As with any other information available on the internet in general, AI tools may be useful to find material you would recognise as correct but have not got to hand, but are a poor way of conducting research to find new information you cannot verify. They may be best seen as a way of obtaining non-definitive confirmation of something rather than providing immediately correct facts.
The quality of any answers you receive will depend on how you engage with the relevant AI tool, including the nature of the prompts you enter. Even with the best prompts, the information provided may be inaccurate, incomplete, misleading or biased.
The currently available “large language models” appear to have been trained on material published on the internet. Their “view” of the law is often based heavily on US law although some do purport to be able to distinguish between that and English law.
As in New Zealand, judges are warned about confidentiality:
Do not enter any information into a public AI chatbot that is not already in the public domain. Do not enter information which is private or confidential. Any information that you input into a public AI chatbot should be seen as being published to all the world.
The current publicly available AI chatbots remember every question that you ask them, as well as any other information you put into them. That information is then available to be used to respond to queries from other users. As a result, anything you type into it could become publicly known.
Then there’s a passage about accountability and accuracy that is similar to the advice offered in New Zealand:
The accuracy of any information you have been provided by an AI tool must be checked before it is used or relied upon.
Information provided by AI tools may be inaccurate, incomplete, misleading or out of date. Even if it purports to represent English law, it may not do so.
The advice offered on bias — information generated by AI will inevitably reflect errors and biases in its training data — is almost identical to what New Zealand judges are told. So is the guidance on maintaining security.
On disclosing their use of AI, judges in England and Wales are reminded they are not generally obliged to describe the research or preparatory work that goes into a judgment. Provided they follow the new guidelines, “there is no reason why generative AI could not be a potentially useful secondary tool”.
That’s similar to the guidance in New Zealand. And, like their counterparts, judges in England and Wales are advised about unrepresented litigants who may not realise that chatbots are prone to error.
Hallucinating
“Generative AI chatbots are known to produce false information that may appear true,” New Zealand judges are told. “This is called ‘hallucinating’.” I began to wonder whether I was hallucinating when I saw how similar the guidance for New Zealand judges was to the guidance for judges in England and Wales.
I strongly suspect that the officials responsible for these documents swapped drafts as they were being produced. But I’d like to imagine that drafters in one jurisdiction, having realised that the other was much further advanced, quietly opened up a generative chatbot and typed in the words “draft me some AI guidelines for judges operating in a common law system”.
Would that have been in England and Wales? Or New Zealand? No doubt AI will tell me that it never happened.