Get ready to be somewhat amazed and intrigued.

I am going to walk you through some advanced AI magic and bring you up-to-speed on state-of-the-art boundary-pushing when it comes to generative AI. It is a bit of a mystery story too.

In this column, I will showcase something that sits on the outskirts of everyday generative AI and remains primarily experimental in advanced AI labs. It is an approach that leverages multiple chain-of-thought instances and incorporates AI-based meta-reasoning. Some believe that this might be an essential ingredient or secret sauce of the new o1 generative AI model and, by the way, the future of leading-edge generative AI.

Maybe so, maybe not.

I will lay out the whole kit and caboodle in plain language so that you’ll know what this all consists of. We can put our heads together at the end of the discussion and decide how things look.

Boundary-Pushing Advances In Generative AI

Let’s start at the genesis of why this topic has recently garnered widespread attention.

You likely are aware that OpenAI released its latest generative AI model which is named o1. There has been quite the bubbly hubbub about o1. For a straight-ahead scoop, take a look at my overall assessment in my Forbes column (see the link here). I quickly followed up by posting a series of pinpoint analyses covering highly noteworthy features, such as a new capability encompassing double-checking to reduce AI hallucinations and produce more reliable results (see the link here).

Since OpenAI considers its AI models to be proprietary, guessing what is happening under the hood is enormously challenging. Some fervently speculate that the approach I’ll be describing is indeed at play. Others proclaim that OpenAI might be toying with the approach in the AI cookery behind the scenes but hasn’t opted to make this an active component at this time.

Either way, the odds are that most generative AI models will eventually use this approach or something akin to it.

The core element consists of a technique and technological underpinning known as chain-of-thought reasoning. It goes like this. AI research has shown that if you tell generative AI to process requests or solve problems on a stepwise basis, the odds are that the generated results will be improved. The step-at-a-time approach appears to make the AI more deliberate and more thorough, and less prone to rushing out a response.

It is referred to as chain-of-thought reasoning due to the aim of having AI do each logical step, one after another, as though the steps are chained together in a reasoning fashion to solve a problem or answer a question.

This is roughly patterned on how humans seem to think. When a person makes a rushed judgment, they are bound to miss something or make a mistake. Telling someone to be mindful and proceed on a systematic stepwise basis can markedly enhance their decision-making. Though generative AI is not sentient and is merely software and hardware, we can put into practice a similar concept by getting the AI to act on a stepwise or chain-of-thought basis.
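To make this a bit more tangible, here is a minimal Python sketch of what a chain-of-thought instruction looks like when wrapped around a question. The call_model function is a hypothetical stand-in for whatever generative AI API might be used; the essence is simply the added stepwise instruction in the prompt.

```python
# A minimal sketch of chain-of-thought prompting.
# call_model is a hypothetical stand-in for a real generative AI API call.

def call_model(prompt: str) -> str:
    # Placeholder: imagine this sends the prompt to a generative AI model.
    return "Step 1: ...\nStep 2: ...\nAnswer: Fly there."

def ask_with_chain_of_thought(question: str) -> str:
    # The stepwise instruction is what turns an ordinary prompt
    # into a chain-of-thought prompt.
    prompt = (
        "Solve the following question step by step, "
        "showing each step of your reasoning before the final answer.\n\n"
        f"Question: {question}"
    )
    return call_model(prompt)

print(ask_with_chain_of_thought(
    "What is the best way to get from San Francisco to New York City?"
))
```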

An Example Of Chain-Of-Thought In Generative AI

Suppose I log into generative AI and ask a question about how I can best get from San Francisco to New York City. I will keep this simple so that I can bring attention to the matters at hand.

Let’s see what happens without invoking chain-of-thought.

  • My entered prompt: “What is the best way to get from San Francisco to New York City?”
  • AI-generated response: “Fly there.”

Okay, that answer makes sense. The outcome seems reasonable, though we only see the generated result and do not know what steps the AI took to arrive at the answer.

I will enter the same prompt, and this time enable a chain-of-thought feature so we can see what steps are being undertaken.

  • My entered prompt: “What is the best way to get from San Francisco to New York City?”
  • Generative AI chain-of-thought enabled.
  • Step 1: Assume that speed of travel is key.
  • Step 2: Airline travel would be the fastest method.
  • Step 3: There are lots of flights to choose from.
  • Step 4: Discount tickets are often available so the trip can be affordable.
  • Step 5: Conclusion that flying is the best option.
  • Step 6: Display the answer.
  • AI-generated response: “Fly there.”

You can see that six steps took place.

The steps seem to be relatively logical. Consider this set of steps as a single chain of reasoning, that is, one chain-of-thought that was internally produced.

I will hit the refresh button and start anew, once again asking the same question. Let’s see what answer we get and what the chain-of-thought consists of.

  • My entered prompt: “What is the best way to get from San Francisco to New York City?”
  • Generative AI chain-of-thought enabled.
  • Step 1: Make an assumption that seeing the country is key.
  • Step 2: Driving a car would allow for readily viewing the country during the trip.
  • Step 3: Thus, driving is the best option.
  • Step 4: Display the answer.
  • AI-generated response: “Drive there.”

Notice that this time we got a different answer, namely that driving would be the best way to make the trip.

Once again, this is a single chain of reasoning.
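If we want software to work with a chain rather than merely display it, the steps need to be held in a structure. Here is a small sketch that parses numbered steps into a simple chain object, assuming the AI emits “Step N:” lines like the ones shown above; that line format is my own illustrative convention, not a fixed standard.

```python
import re
from dataclasses import dataclass, field

@dataclass
class ChainOfThought:
    # One internally produced chain: ordered steps plus the final answer.
    steps: list[str] = field(default_factory=list)
    answer: str = ""

def parse_chain(raw: str) -> ChainOfThought:
    # Assumes the model emits "Step N: ..." lines followed by the answer;
    # this format is an illustrative convention, not a standard.
    chain = ChainOfThought()
    for line in raw.splitlines():
        match = re.match(r"Step \d+:\s*(.+)", line)
        if match:
            chain.steps.append(match.group(1))
        elif line.strip():
            chain.answer = line.strip()
    return chain

raw_output = (
    "Step 1: Assume that speed of travel is key.\n"
    "Step 2: Airline travel would be the fastest method.\n"
    "Fly there."
)
print(parse_chain(raw_output))
```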

Making Use Of Multiple Chain-Of-Thought Instances

Why did we get one answer that said to fly the distance, while the other answer said to drive the distance?

If we didn’t have the chain-of-thought that each instance used to solve the problem, we would have no idea what logic was being used. We would have to wildly guess why one response said to fly and the other said to drive.

Aha, let’s leverage the two chains.

Suppose we could get generative AI to internally produce multiple chain-of-thought instances during the processing of the question, and then use those to figure out an even better answer for us. The AI could inspect the chain that led to flying, and inspect the chain that led to driving, and combine them to arrive at a better answer than might be derived individually.

Here’s the kind of answer that we might get.

  • My entered prompt: “What is the best way to get from San Francisco to New York City?”
  • AI-generated response: “There are two options: (1) Fly, which is fastest, or (2) Drive, which allows you to see and visit the country. The best way is then based on your preference of speed of travel versus being able to view the country during the trip. Let me know which you prefer, and I can provide additional details.”

You now can see that the answer is vastly improved.

Rather than simply stating to fly or to drive, the AI has made use of both chain-of-thought instances and computationally opted to compare and combine them. The generated result is that we are informed that flying has its merits, driving has its merits, and we can now choose which best fits our needs.

This overall approach is known as multiple-chain reasoning.

When solving a problem or answering a question, the AI will internally craft two or more chain-of-thought instances. The sky is the limit on how many chains we might encourage the AI to internally produce. After devising multiple chains, the AI would attempt to leverage the respective chains and derive a better answer than might otherwise occur when relying on one chain alone.

Knowing how to develop generative AI to produce multiple chain-of-thought instances when solving a stated problem or answering a question is relatively straightforward and can be done without too much rocking of the boat. It’s not a piece of cake, but it is a known thing. That’s the good news.
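As a rough illustration of that known thing, the usual trick is to sample the model several times with some randomness enabled (a nonzero temperature), so that each run can wander down a different line of reasoning. In this sketch, call_model is again a hypothetical stand-in for a real generative AI API.

```python
import random

def call_model(prompt: str, temperature: float) -> str:
    # Hypothetical stand-in: with randomness enabled, repeated calls
    # can produce different chains for the same question.
    return random.choice([
        "Step 1: Speed is key.\nStep 2: Flying is fastest.\nAnswer: Fly there.",
        "Step 1: Seeing the country is key.\nAnswer: Drive there.",
    ])

def sample_chains(question: str, num_chains: int = 5) -> list[str]:
    # Each sample is one internally produced chain-of-thought.
    prompt = f"Solve step by step: {question}"
    return [call_model(prompt, temperature=0.8) for _ in range(num_chains)]

chains = sample_chains(
    "What is the best way to get from San Francisco to New York City?"
)
print(chains)
```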

The rougher news is that figuring out how generative AI ought to optimally leverage the multiple chain-of-thought instances is an open issue and a tough nut to crack.

Meta-Reasoning About Multiple Chain-Of-Thought Instances

The act of somehow bringing together the multiple chains into a coherent and sensible result is said to be meta-reasoning.

This is known as meta-reasoning because, in a broad sense, the overarching combining function is “reasoning” about the various chain-of-thought reasonings (i.e., reasoning about reasoning). The word “meta” conventionally is used to stipulate that something transcends other items. In this case, we are trying to transcend the multiple chain-of-thought instances and “reason” about them across the board to arrive at a suitable answer.

I noted a moment ago that this is hard to do.

Here’s the rub.

There are lots of lousy ways to perform meta-reasoning. For example, with the two chains indicated above, suppose we simply discarded the one about flying. We only opted to use the one about driving. The answer would then be to drive the distance. That isn’t as good an answer as when utilizing both chains actively.

Not very astute.

Okay, maybe we have a rule that indicates the AI must never discard a chain. The problem with that rule is that we might have a chain-of-thought that contains an error and we are going to force the AI into presumably using it. In the San Francisco to New York City example, suppose the AI had produced a chain-of-thought that said the best way to proceed would be to swim the distance. I dare say that we can toss out that option. But, if we have set up the AI to never discard a chain, the swimming chain is likely to get intertwined when combining the chains on flying and driving. Sad face as to what zany results we would get.

The gist is that there are lots of ways to devise meta-reasoning. Some are good, some are bad, and some are in-between. Active research is underway on finding the best ways to perform meta-reasoning in conjunction with a multiple chain-of-thought approach.
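To make the spectrum of strategies concrete, here is a deliberately simple sketch: each chain is given a plausibility score, clearly bad chains (such as the swimming one) are discarded, and the answers of the survivors are combined. The scoring rule is a toy placeholder; genuine meta-reasoning would be far more sophisticated.

```python
def plausibility(chain: dict) -> float:
    # Toy scoring rule for illustration only: penalize chains whose
    # answer contains an obviously infeasible mode of travel.
    return 0.0 if "swim" in chain["answer"].lower() else 1.0

def meta_reason(chains: list[dict]) -> str:
    # Keep only plausible chains, then combine the surviving answers.
    # Discarding all but one chain, or refusing to discard any,
    # are the two lousy extremes described above.
    kept = [c for c in chains if plausibility(c) > 0.5]
    options = sorted({c["answer"] for c in kept})
    if len(options) == 1:
        return options[0]
    return "There are several options: " + "; ".join(options)

chains = [
    {"answer": "Fly there.", "steps": ["Speed is key.", "Flying is fastest."]},
    {"answer": "Drive there.", "steps": ["Seeing the country is key."]},
    {"answer": "Swim there.", "steps": ["Water routes exist."]},
]
print(meta_reason(chains))
```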

Advanced AI Research Underway On Meta-Reasoning

To give you a taste of AI research on this weighty topic, consider the research study entitled “Answering Questions By Meta-Reasoning Over Multiple Chains Of Thought” by Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, and Jonathan Berant, published in the Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, December 2023, which makes these salient points (excerpts):

  • “We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought, rather than aggregate their answers.”
  • “Chain-of-thought prompting has been shown to dramatically improve performance on reasoning-heavy tasks.”
  • “Furthermore, research has shown that sampling multiple chains of thought and returning their majority output further improves accuracy, a method which they term self-consistency (SC).”
  • “First, when the space of possible outputs is large each reasoning chain may lead to a different output, in which case no significant majority will be formed. Second, focusing exclusively on the final output discards relevant information that is present in the intermediate reasoning steps.”
  • “Instead of sampling multiple chains for their predicted answers, we utilize them for context generation. This context is fed to a prompted LLM to read the generated chains and reason over them to return the answer.”

Contemplate those points.

The researchers noted that a rudimentary way to perform meta-reasoning in this sphere would be to simply aggregate the answers, rather than try to analyze the respective chains. That is the barebones approach. It is also known as outcome-based meta-reasoning.
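In code, the barebones outcome-based approach looks roughly like this: the reasoning steps are ignored entirely and the most common final answer wins. This is a simplified sketch in the spirit of the self-consistency method the researchers mention, not their exact formulation.

```python
from collections import Counter

def self_consistency(answers: list[str]) -> str:
    # Outcome-based meta-reasoning: ignore the reasoning steps entirely
    # and return the majority final answer across all sampled chains.
    return Counter(answers).most_common(1)[0][0]

answers = ["Fly there.", "Drive there.", "Fly there.", "Fly there."]
print(self_consistency(answers))  # -> "Fly there."
```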

The more promising approach is process-based meta-reasoning, which entails examining each of the steps of each of the chains. This is much more complicated. The belief and hope are that the results will be markedly better and worth the added computational and algorithmic effort.

As a mind-bending twist, we could potentially opt to feed the multiple chains back into the generative AI, before displaying an answer, and have overall processing done by the AI on the very chains that the AI had devised. This has shown promise.
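Here is a simplified sketch of that twist, loosely in the spirit of the MCR approach excerpted above: the sampled chains become context for a second pass, and the model is prompted to reason over them. Once more, call_model is a hypothetical placeholder rather than any particular vendor’s API.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a generative AI API call.
    return ("There are two options: fly (fastest) or drive "
            "(see the country). Which do you prefer?")

def meta_reason_over_chains(question: str, chains: list[str]) -> str:
    # Process-based meta-reasoning: feed the full chains, not just their
    # final answers, back into the model and ask it to reason over them.
    context = "\n\n".join(
        f"Reasoning chain {i + 1}:\n{chain}" for i, chain in enumerate(chains)
    )
    prompt = (
        f"Question: {question}\n\n{context}\n\n"
        "Read the reasoning chains above and reason over them "
        "to produce the best overall answer."
    )
    return call_model(prompt)

chains = [
    "Step 1: Speed is key.\nStep 2: Flying is fastest.\nAnswer: Fly there.",
    "Step 1: Seeing the country is key.\nAnswer: Drive there.",
]
print(meta_reason_over_chains(
    "What is the best way to get from San Francisco to New York City?", chains
))
```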

Leaning Into Meta-Reasoning And Multiple Chains

AI insiders have eagerly been putting o1 through its paces. One aspect entails identifying the boundaries of what o1 can and cannot achieve. Another key pursuit involves trying to figure out the underlying hidden secret sauce of o1.

In one sense, o1 is kind of an offshoot of its famous close relatives, ChatGPT and GPT-4o. Interestingly, o1 is better in some ways than its well-known cousins, primarily in solving problems involving science, mathematics, and programming or coding, but seemingly less capable across the board.

A clue that some are hanging their hat on is that meta-reasoning across multiple chain-of-thought instances is theorized to be an especially beneficial method for solving logic-based problems such as those found in science, mathematics, and programming or coding. You see, maybe that’s a sign that the butler did it. Since o1 is especially outstanding in those realms, and since meta-reasoning combined with multiple chain-of-thought is believed to be notably valuable in those same realms, maybe we can speculate that the secret sauce lands somewhere in that vicinity.

Mum is the word so far.

In an OpenAI blog about o1, entitled “Learning To Reason With LLMs” (posted September 12, 2024), these rather unassuming points were made (excerpts):

  • “o1 significantly advances the state-of-the-art in AI reasoning.”
  • “o1 uses a chain of thought when attempting to solve a problem.”
  • “It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working.”
  • “We plan to release improved versions of this model as we continue iterating.”

Make what you will of those hints or inklings.

Is meta-reasoning across multiple chain-of-thought instances afoot?

Whether it is or isn’t in the mix, you can bet that AI makers and generative AI overall are going to be in hot pursuit of this kind of approach. A bevy of questions and mysteries lie within. For example, how many chains should be generated for a given problem that is being solved? There is a computational cost/benefit tradeoff associated with producing chains. How much processing time should be devoted to meta-reasoning? You could nearly endlessly let meta-reasoning try zillions of permutations and combinations of seeking to combine or align multiple chains. Etc.
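To see why the chain-count question has teeth, a quick back-of-the-envelope sketch helps; the token counts and per-token price below are made-up illustrative numbers, not actual figures from any AI provider.

```python
# Back-of-the-envelope cost of sampling multiple chains.
# All numbers below are made-up illustrative assumptions.
tokens_per_chain = 500        # assumed average length of one chain
price_per_1k_tokens = 0.01    # assumed price in dollars

for num_chains in (1, 5, 20):
    cost = num_chains * tokens_per_chain / 1000 * price_per_1k_tokens
    print(f"{num_chains:>2} chains -> ${cost:.3f} per question")
```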

Let’s give Albert Einstein the final remark on this for now: “The most beautiful thing we can experience is the mysterious. It is the source of all true art and science.”

There’s no mystery about that.
