Law is big. The legal trade may be one of the slower-moving corners of our society to digitalize, but the sheer volume of information the business operates with makes its potential application surface for intelligence technologies huge. Organizations such as American data analytics company LexisNexis have been championing online portals for computer-assisted legal research for some time now, yet the wider rollout of “legal apps” still lags behind the usual-suspect data analytics verticals such as retail and manufacturing.
While the deployment of big data analytics in the legal business has matured, the encroaching spectre of artificial intelligence hasn’t gone unnoticed. As we start to examine and apply automation intelligence to this arena, questions of trustworthy AI, bias and hallucination start to take on what could be life-or-death significance, depending on the case at hand.
Beneath the surface here may be hidden risks that could undermine or damage legal cases. For instance, using AI for summarization or drafting could result in incorporating false or partial facts about a case, leading to misguided legal strategies.
Brainstorming Barrister Bots
Jay Madheswaran, CEO of Eve, has strong opinions on this subject. Eve is a personalized legal AI tool capable of performing legal research (such as filtering through a database or other repository of court rulings) via natural language search prompts; it can also help brainstorm and evaluate legal claims. Madheswaran points out that a Stanford study suggested AI chatbots hallucinate between 58% and 82% of the time on legal queries.
“This isn’t merely an academic concern; it has real implications for legal cases, potentially derailing them with false or partial information,” said Madheswaran. “To be clear, I am not a fearmonger, but I bring up AI trust because there are measures both providers and users can take to minimize these risks. Addressing AI hallucinations in this space is not just about identifying errors, but also about implementing robust systems to ensure the accuracy and reliability of AI-generated outputs.”
But before we progress, m’lud, let’s make sure we understand what’s happening inside an AI hallucination.
Why Do Models Hallucinate?
AI models hallucinate for a number of reasons, and understanding these causes is critical to addressing the risks. Models like GPT-4 are trained on vast datasets containing a mix of accurate and inaccurate (or at least less accurate) information. This training data can include errors and can lack the context needed to understand complex subjects, leading to incorrect outputs. AI models generate responses based on patterns recognized in this data, but they don’t truly “understand” the content.
“This is all true: AI models predict the next word or phrase based on statistical probabilities, which can result in plausible-sounding but entirely fabricated information,” clarified Madheswaran. “Additionally, these models can struggle with maintaining context, especially over long conversations or intricate topics. This loss of context can lead to hallucinated responses that seem accurate but are fundamentally flawed. Another critical factor is the lack of real-time knowledge. AI models do not have access to databases or verification systems to confirm the accuracy of their responses, relying solely on potentially outdated or incomplete training data.”
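To make that idea concrete, here is a deliberately toy sketch of next-word prediction. It is not Eve’s technology; the prompt, the candidate continuations and their probabilities are all invented for illustration. The point is that the model samples from a probability distribution over fluent continuations, so a fabricated citation can be just as “likely” as an honest admission of uncertainty.

```python
import random

# Toy illustration only: a hypothetical probability distribution a language
# model might assign to continuations of the prompt below. None of these
# "cases" is checked against any authority; the model only knows likelihoods.
prompt = "The leading precedent on wrongful termination is "
continuations = {
    "Smith v. Acme Corp. (1998)": 0.41,   # plausible-sounding, may not exist
    "Jones v. Widget Inc. (2004)": 0.33,  # equally plausible, equally unchecked
    "not something I can verify": 0.26,   # the "honest" answer is just one option
}

# The model samples a continuation in proportion to these probabilities;
# fluency, not factual accuracy, drives the choice.
choice = random.choices(list(continuations), weights=continuations.values())[0]
print(prompt + choice)
```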
To increase the quality and trustworthiness of AI for legal users, improving the underlying technology is clearly important. One method used across industry applications, and one that has already been widely discussed, is retrieval-augmented generation.
It’s RAG-time
Widely known as RAG, this process is generally agreed to reduce AI hallucinations by introducing a degree of known and validated data that belongs directly to (for example) a company, a process or a job. We could say that RAG directs the AI model to search through a “controlled repository” before looking at more general training data, much like a lawyer reviewing relevant case law before giving an opinion.
“As a working example of this technology in the legal business, let’s imagine we’re looking to find a set of relevant legal cases based on a particular fact pattern. Without RAG, the AI would answer based on everything it knows to be true, similar to a human lawyer providing an answer based on their knowledge and memory. However, like a human with imperfect memory, the AI might make up a case that isn’t real, combine cases into one, or, occasionally, provide a correct answer. With RAG, the AI instead searches a room full of case law, finds the most promising cases, reads them in full, then picks the top relevant ones. This controlled search process helps prevent the AI from using irrelevant or incorrect sources,” said Madheswaran.
As we’ve learned then, the effectiveness of RAG hinges on an AI model’s ability to accurately retrieve the right information. Imagine two models searching the same room of case law books: one might be very good at choosing the right books to read fully, while the other is not. Even if both models are equally good at reading and summarizing a book, the one that chose the right books based on their relevance will provide the better answer. Refining the AI’s retrieval process is therefore a significant opportunity for improvement in legal research AI applications.
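For readers who want to see the shape of the thing, here is a minimal retrieve-then-read sketch. It is not Eve’s implementation; the tiny case-law repository, the keyword-overlap scorer and the prompt format are all placeholder assumptions. What it illustrates is simply that the model is only shown passages pulled from a controlled repository, and that the quality of the retrieval step largely determines the quality of the answer.

```python
# Minimal retrieval-augmented generation sketch. The case-law "repository",
# the keyword-overlap scorer and the prompt format are stand-ins, not a
# real legal research product.
CASE_LAW_REPOSITORY = [
    {"citation": "Doe v. Example Co.", "text": "Employee dismissed after reporting safety violations ..."},
    {"citation": "Roe v. Sample LLC", "text": "Contract dispute over delivery terms ..."},
]

def retrieve(query: str, repository: list[dict], top_k: int = 3) -> list[dict]:
    """Score each document by naive keyword overlap and keep the best matches.
    A production system would use embeddings plus re-ranking; the principle
    (search the controlled repository first) is the same."""
    query_terms = set(query.lower().split())
    scored = [
        (len(query_terms & set(doc["text"].lower().split())), doc)
        for doc in repository
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def answer_with_rag(query: str) -> str:
    """Build a prompt that confines the model to the retrieved passages."""
    passages = retrieve(query, CASE_LAW_REPOSITORY)
    context = "\n\n".join(f"{d['citation']}: {d['text']}" for d in passages)
    return (
        "Answer using ONLY the passages below and cite them.\n\n"
        f"{context}\n\nQuestion: {query}"
    )  # in practice this prompt is then sent to the language model

print(answer_with_rag("wrongful termination after whistleblowing"))
```

The naive keyword scorer here is exactly the kind of retrieval weakness the two-models comparison above describes; swapping it for a stronger retriever improves the answer without touching the generation step at all.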
Just The Facts, Ma’am
“Sometimes though, RAG isn’t enough. In key situations where we need to make sure that nothing was missed and nothing was skimmed, AI should incorporate new methodologies. Although they are not sustainable for all tasks, switching the AI to read key documents line-by-line will create a higher level of confidence in the answers. We call this a ‘fact search’ and it helps users find key facts throughout case documents. Instead of using RAG or a general large language model approach, when performing fact search, the AI reads each page of the case documents, word by word, searching systematically for quotes and facts that support a particular claim or search query. For example, the AI will search through all case files, including transcripts, email chains, and employment documents, to find facts that support a Wrongful Termination cause of action,” explained Madheswaran.
Each source is then linked to its origin and presented to the end-user in a table that can be investigated and downloaded. This way of searching for information allows a higher level of confidence in any answer provided, backed by the ability to verify. Together, RAG and fact search can ensure that AI-generated responses are not only accurate but also easily verifiable by users, providing an additional layer of confidence in the AI’s outputs.
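A simplified sketch of what such a fact search pass could look like follows; the document set, the keyword test and the CSV output are illustrative assumptions rather than a description of Eve’s actual pipeline. The essential difference from RAG is that every page of every document is read, and every hit carries a link back to its origin.

```python
import csv
import re

# Illustrative only: an exhaustive "fact search" over every page of every
# case document, rather than a sample of retrieved passages.
CASE_DOCUMENTS = {
    "deposition_transcript.txt": [
        "The manager said the complaint would cost her the job.",
        "HR confirmed no performance issues were on file.",
    ],
    "email_chain.txt": [
        "Terminate her before the quarterly audit, wrote the supervisor.",
    ],
}

def tokens(text: str) -> set[str]:
    """Lower-cased word tokens, with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def fact_search(claim_keywords: set[str]) -> list[dict]:
    """Read each document page by page and record every passage that touches
    the claim keywords, keeping a link back to its origin."""
    hits = []
    for filename, pages in CASE_DOCUMENTS.items():
        for page_number, text in enumerate(pages, start=1):
            if claim_keywords & tokens(text):
                hits.append({"source": filename, "page": page_number, "quote": text})
    return hits

# Present the hits as a table the user can inspect, verify and download.
results = fact_search({"terminate", "complaint", "job"})
with open("wrongful_termination_facts.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["source", "page", "quote"])
    writer.writeheader()
    writer.writerows(results)
```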
But even minimizing poor answers through RAG improvements or shifting to line-by-line reading isn’t always enough. In these scenarios, a “trust but verify” model can be implemented, where users can easily and confidently verify the answers that AI provides them. One way to implement this is through an AI verification framework that runs every AI-generated response through custom rules to determine validity, which the user can then check at a glance. For example, when the AI references quotes from a source document, the validation framework confirms that those quotes actually appear in that document. We might call this a sanity check.
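As a hedged illustration of one such rule, the sketch below (with an assumed response format and a naive normalization step, not any particular vendor’s framework) checks that every quote the AI attributes to a source document actually appears in that document verbatim; anything that fails the rule is flagged for the user rather than silently trusted.

```python
import re

def normalize(text: str) -> str:
    """Collapse whitespace and case so trivial formatting differences
    don't cause false alarms."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def verify_quotes(response_quotes: list[str], source_text: str) -> dict[str, bool]:
    """Custom validation rule: a quote passes only if it appears verbatim
    (after normalization) in the cited source document."""
    source = normalize(source_text)
    return {quote: normalize(quote) in source for quote in response_quotes}

# Hypothetical example: two quotes the AI attributed to an employment contract.
source_document = "Either party may terminate this agreement with 30 days' written notice."
ai_quotes = [
    "terminate this agreement with 30 days' written notice",  # genuine
    "termination requires cause and a severance payment",     # fabricated
]

for quote, is_valid in verify_quotes(ai_quotes, source_document).items():
    print(("VERIFIED " if is_valid else "FLAGGED  ") + quote)
```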
Mechanical Magistrates
This is just the beginning for legal generative AI. As this technology evolves, Eve’s Madheswaran suggests that we can expect continuous improvements in quality and trustworthiness. Enhancements will come through additional verification rules, better retrieval processes and so on.
In the meantime, would you be happy to go up against a judge of the land if it were a mechanical magistrate that based its decision-making upon a wholly digitized information analytics process? Perhaps not. So then: don’t drink and drive, don’t go over the speed limit and remember to pay for all your items at the supermarket self-checkout, right?