This article is part of our coverage of the latest in AI research.

Large language models can’t plan, even if they write fancy essays

“We found right away that GPT-3 is pretty spectacularly bad on anecdotal tests,” Kambhampati said.

Other recent studies include one that shows LLMs can do zero-shot reasoning if provided with a special trigger phrase.
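One widely cited trigger phrase from the zero-shot chain-of-thought literature is “Let’s think step by step.” As a minimal sketch of the idea (the function name and template are illustrative, not from the study itself), the phrase is simply appended to the answer slot of the prompt:

```python
def zero_shot_cot_prompt(question: str) -> str:
    """Wrap a question in a zero-shot chain-of-thought template by
    appending the reasoning trigger phrase to the answer slot."""
    return f"Q: {question}\nA: Let's think step by step."

# The resulting prompt ends with the trigger phrase, nudging the model
# to emit intermediate reasoning steps before its final answer.
prompt = zero_shot_cot_prompt("If I have 3 apples and eat one, how many remain?")
```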

However, the term “reasoning” is often used broadly in these benchmarks and studies, Kambhampati believes.

What LLMs are doing, in fact, is creating a semblance of planning and reasoning through pattern recognition.

In Daniel Kahneman’s dual-process framing, System 1 is fast and intuitive while System 2 is slow and deliberate. Even a large part of speech is performed by System 1.

But the line between System 1 and System 2 is not clear-cut.

Take driving, for example. When you first learn to drive, you have to think consciously about every maneuver.

This is clearly System 2 at work.

It consumes a lot of energy, requires your full attention, and is slow.

But as you repeat the procedures, you gradually learn to perform them without thinking.

The task of driving shifts to your System 1, enabling you to perform it without taxing your mind.

For example, professional chess players rely a lot on pattern recognition to speed up their decision-making process.

A similar phenomenon might be happening in deep learning systems that have been exposed to very large datasets.

They might have learned to do the simple pattern-recognition phase of complex reasoning tasks.

“This is what we wound up doing,” Kambhampati said.

The team developed their benchmark based on the domains used in the International Planning Competition (IPC).

The framework consists of multiple tasks that evaluate different aspects of reasoning.

Each problem has an initial condition, an end goal, and a set of allowed actions.

“We used the Blocks world examples for illustrating the different tasks,” Kambhampati said. “Each of those tasks (e.g., plan generation, goal shuffling, etc.) can also be posed in other IPC domains.”
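To make the structure of such a problem concrete, here is a minimal sketch of a Blocks world planning task in Python, under assumptions not taken from the benchmark itself: a state is a tuple of stacks (each stack a bottom-to-top tuple of distinct block names), the allowed actions move the top block of a stack, and a breadth-first search stands in for a real planner.

```python
from collections import deque

# A state is a tuple of stacks; each stack is a tuple of block names,
# listed bottom to top. Block names are assumed distinct, and states
# contain no empty stacks.

def successors(state):
    """Yield (action, next_state) pairs: move the top block of one stack
    onto another stack, or onto the table (starting a new stack)."""
    for i, src in enumerate(state):
        block = src[-1]
        for j, dst in enumerate(state):
            if i == j:
                continue
            new = [list(s) for s in state]
            new[i].pop()
            new[j].append(block)
            yield (f"move {block} onto {dst[-1]}",
                   tuple(tuple(s) for s in new if s))
        if len(src) > 1:  # moving a lone block to the table is a no-op
            new = [list(s) for s in state]
            new[i].pop()
            new.append([block])
            yield (f"move {block} to the table",
                   tuple(tuple(s) for s in new if s))

def plan(initial, goal):
    """Breadth-first search for a shortest action sequence from the
    initial configuration to the goal configuration."""
    norm = frozenset  # the order of stacks on the table is irrelevant
    frontier = deque([(initial, [])])
    seen = {norm(initial)}
    while frontier:
        state, actions = frontier.popleft()
        if norm(state) == norm(goal):
            return actions
        for action, nxt in successors(state):
            if norm(nxt) not in seen:
                seen.add(norm(nxt))
                frontier.append((nxt, actions + [action]))
    return None  # no plan exists

# A, B, C start separately on the table; the goal is A on B on C.
print(plan((("A",), ("B",), ("C",)), (("C", "B", "A"),)))
# ['move B onto C', 'move A onto B']
```

The point of the exercise in the benchmark is that a model must chain such moves toward the goal, rather than pattern-match a familiar answer.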

Unlike other benchmarks, the problem descriptions of this new benchmark are very long and detailed.

Solving them requires concentration and methodical planning, and can’t be cheated through pattern recognition.

Reasoning, in general, is a System 2 task.

The LLMs do better on the planning-related tasks that don’t require chains of reasoning, such as goal shuffling.

The researchers hope that their work opens new windows for developing planning and reasoning capability for current AI systems.

Other scientists, including deep learning pioneer Yann LeCun, have made similar suggestions.

You can read the original article here.
