New technique makes AI hallucinations wake up and face reality

Chatbots have an alarming propensity to generate false information, but present it as accurate.

This phenomenon, known as AI hallucinations, has various adverse effects.

At best, it restricts the benefits of artificial intelligence.

At worst, it causes real-world harm to people.

As generative AI enters the mainstream, the alarm bells are ringing louder.

One proposed fix, its developers say, can reduce AI hallucinations to single-figure percentages.

The system is the brainchild of Iris.ai, an Oslo-based startup.

Founded in 2015, the company has built an AI engine for understanding scientific text.

The software scours vast quantities of research data, which it then analyses, categorises, and summarises.

Iris.ai founders (left to right) Maria Ritola, Jacobo Elosua, Anita Schjøll Abildgaard, and Victor Botev

What doesn't save researchers' time is AI hallucinating.

The key is returning responses that match what a human expert would say.

Sometimes the inaccuracies cause reputational damage.

At the launch demo of Microsoft's Bing AI, for instance, the system produced an error-strewn analysis of Gap's earnings report.

At other times, the erroneous outputs can be more harmful.

ChatGPT can spout dangerous medical recommendations. Security analysts fear the chatbot's hallucinations could even drive malicious code packages towards software developers.

Nonetheless, 84% of researchers still use ChatGPT as their primary AI tool to support research.

These problematic practices spurred Iris.ai's work on AI hallucinations.

"We map out the key knowledge concepts we expect to see in a correct answer," Botev says.

"Then we check if the AI's answer contains those facts, and whether they come from reliable sources."
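
Iris.ai hasn't published this implementation, but the general idea, checking whether an answer covers a set of expected concepts, can be sketched in a few lines of Python. The concept list, aliases, and sample answer below are invented for illustration:

```python
# Sketch of a concept-coverage check: does a generated answer mention
# the key concepts we would expect in a correct response?
# Illustrative only; Iris.ai's actual pipeline is proprietary.

EXPECTED_CONCEPTS = {
    "mRNA": ["mrna", "messenger rna"],
    "ribosome": ["ribosome"],
    "protein synthesis": ["protein synthesis", "translation"],
}

def concept_coverage(answer: str, expected: dict) -> float:
    """Return the fraction of expected concepts mentioned in the answer."""
    text = answer.lower()
    hits = sum(
        any(alias in text for alias in aliases)
        for aliases in expected.values()
    )
    return hits / len(expected)

answer = "mRNA carries genetic instructions to the ribosome for translation."
print(f"coverage: {concept_coverage(answer, EXPECTED_CONCEPTS):.0%}")  # 100%
```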

A secondary technique compares the AI-generated response to a verified ground truth.

Using a proprietary metric dubbed WISDM, the software scores the AI output's semantic similarity to the ground truth.

This covers checks on the topics, structure, and key information.
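
WISDM itself is proprietary, so its scoring details aren't public. A common stand-in for this kind of semantic-similarity check is cosine similarity over sentence embeddings, sketched here with the sentence-transformers library (the model choice and example texts are assumptions, not Iris.ai's setup):

```python
# Sketch of the ground-truth comparison step. WISDM is proprietary, so
# this substitutes a standard cosine-similarity score over sentence
# embeddings from the sentence-transformers library.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

ground_truth = "The study found the drug reduced symptoms in 60% of patients."
ai_answer = "Roughly 60 percent of patients saw fewer symptoms on the drug."

embeddings = model.encode([ground_truth, ai_answer], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()

print(f"semantic similarity: {score:.2f}")  # close to 1.0 means agreement
```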

Another method examines the coherence of the answer.

The combination of techniques creates a benchmark for factual accuracy.
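
The article doesn't say how the individual checks are weighted. As a toy illustration only, the three signals (concept coverage, ground-truth similarity, and coherence) could be blended into a single benchmark score like this, with invented weights:

```python
# Toy aggregation of the three checks into one factual-accuracy score.
# The weights below are invented for illustration; the real benchmark's
# weighting scheme is not public.

def factual_accuracy(coverage: float, similarity: float, coherence: float) -> float:
    """Blend three scores in [0, 1] into one benchmark score."""
    return 0.4 * coverage + 0.4 * similarity + 0.2 * coherence

# Example: full concept coverage, strong similarity, decent coherence.
print(f"{factual_accuracy(1.0, 0.87, 0.90):.2f}")  # 0.93
```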

Under the covers, the Iris.ai system harnesses knowledge graphs, which show relationships between data.

The knowledge graphs assess and demonstrate the steps a language model takes to reach its outputs.

Essentially, they generate a chain of thought that the model should follow.

The structure could even prompt a model to identify and correct its own mistakes.

As a result, a coherent and factually correct answer could be automatically produced.
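
To make the structure concrete: a knowledge graph stores facts as subject-relation-object triples, and the chain of thought described above corresponds to a path through those triples. A minimal sketch, with invented facts that have no connection to Iris.ai's actual graph:

```python
# Minimal knowledge-graph sketch: facts stored as (subject, relation,
# object) triples, plus a breadth-first search that recovers the chain
# of hops linking a question entity to an answer entity.
# The facts below are invented; this is not Iris.ai's graph.
from collections import deque

TRIPLES = [
    ("aspirin", "inhibits", "COX enzymes"),
    ("COX enzymes", "produce", "prostaglandins"),
    ("prostaglandins", "cause", "inflammation"),
]

def reasoning_chain(start, goal):
    """Return the list of triples linking start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, r, o in TRIPLES:
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, r, o)]))
    return None

# How does aspirin relate to inflammation? Print each reasoning hop.
for subject, relation, obj in reasoning_chain("aspirin", "inflammation"):
    print(f"{subject} --{relation}--> {obj}")
```

Each hop in the returned path is an individually checkable fact, which is what would make a model's reasoning auditable, and correctable, in the way described above.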

We need to break down AI's decision-making.

In preliminary tests, the feature reduced AI hallucinations to single-figure percentages.

The problem, however, has not been entirely solved.

According to Botev, the challenges don't stem from the tech, but from the users.

Lacking expert knowledge, they can misinterpret the results they receive.

"People self-misdiagnose illnesses all the time by searching their symptoms online," Botev says.

"We need to be able to break down AI's decision-making process in a clear, explainable way."

One approach involves improving the quality of training data. Microsoft's new Phi-1.5 model, for instance, is pre-trained on "textbook quality" data, which is both synthetically generated and filtered from web sources.

Another method involves removing bias from the data. To do this, Botev suggests training a model on coding language.

Inevitably, natural-language sources contain human biases.

In coding language, there is a far greater emphasis on reason.

This leaves less room for interpretation, which can guide LLMs to factually accurate answers.

On the other hand, it could give coders a potentially terrifying power.

It's a matter of trust.

Despite its limitations, the Iris.ai method is a step in the right direction.

By using the knowledge graph structure, transparency and explainability can be added to AI.

In the future, this should yield further reductions in AI hallucinations.

For Botev, the work serves a crucial purpose.

"It is to a large extent a matter of trust," he says.

Story by Thomas Macaulay

Thomas is the managing editor of TNW.

He leads our coverage of European tech and oversees our talented team of writers.

Away from work, he enjoys playing chess (badly) and the guitar (even worse).
