Only Connect is widely regarded as Britain’s toughest TV quiz – but it’s no match for ChatGPT’s new problem-solving AI model.
OpenAI yesterday unveiled a preview of its new o1 AI model, which the company claims is designed to “reason through complex tasks and solve harder problems than previous models in science, coding, and math”.
Whereas the company’s previous AI models have often stumbled over quite basic questions, such as how many times the letter R appears in the word Strawberry, the new model is designed to respond more like a person would. “Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes,” OpenAI claims in a blog post announcing the o1-preview.
To put this to the test, I decided to see how it would cope with a series of questions from Only Connect. For those not familiar with the show, it’s no ordinary quiz. Contestants have to find complex connections between words or phrases, complete sequences, or identify answers from which all the vowels have been removed.
The quiz is unapologetically pitched at intellectuals and is something of a cult hit in Britain, where it’s broadcast on the BBC.
Here’s how ChatGPT’s problem-solving AI got on with the four different rounds of questions from Only Connect. All questions were taken from the show’s official quiz book.
Connections
The first round of the quiz is Connections. Contestants are given four clues in turn and they must find the connection between them. In the quiz, the contestants are given the clues one by one, scoring more points for getting the answer right with fewer clues. For this test, I gave ChatGPT all four clues at once.
For example, I asked ChatGPT to find the connection between the following four words:
- Incognito
- Unbeknownst
- Nonchalant
- Misnomer
After nine seconds of thinking, it correctly worked out that all of these words have no positive opposite in the English language.
It was even faster at working out that the connection between manufacturing gunpowder, Roman mouthwash, thickening wool and marking territory was that they were all processes that historically involved the use of urine.
It took a mere five seconds to establish the connection between these four:
- Absolutely anybody
- Male leads in 3rd Rock from the Sun
- Tunnels in The Great Escape
- Presidents Jefferson, Nixon and Truman
Got it? They all involve Tom, Dick and Harry. I asked ChatGPT o1-preview five Connections questions and it scored a perfect five out of five in this round.
Sequences
Sequences is similar to Connections, in that all the words are linked by a common theme. However, in this round, contestants must work out what the fourth item in the sequence will be, without knowing what the theme is.
I started with what I thought was a tough one, asking it to find the fourth term in this sequence:
- 1485
- Elizabethan
- Regency
ChatGPT took 12 seconds to think before correctly identifying that they were the historical periods used for the first three series of the British TV comedy Blackadder, and then providing the fourth answer: World War I.
Even more impressively, the AI coped when I accidentally mistyped the third clue in the following sequence:
- The arrival of Captain James Cook
- Centaur throwing a spear of light
- 2,088 Fou drummers
It took 11 seconds to think before correctly working out that these were iconic moments in successive Olympic opening ceremonies, and that it should have been 2,008 Fou drummers for the 2008 Olympics in Beijing, not 2,088. It added the fourth answer as Queen Elizabeth II parachuting with James Bond for London 2012. Pretty amazing, given the question was partly incorrect!
Only one question tripped up ChatGPT in this round, when it was asked for the next item in this sequence:
- Wheat
- Sett
- Cease
The AI worked out that the sequence was number-related. The theme is English homophones of numbers spoken in French, counting down (huit, sept, six for eight, seven and six). ChatGPT seemed to figure this out, but then suggested the next word in the sequence should be “hive”, which merely rhymes with the English “five”, rather than, say, “sank”, which would have been a correct homophone for the French “cinq”. In this round, then, it scored four out of five.
Connecting Walls
The next round is arguably the hardest for the AI to solve. The contestants are given 16 words on a 4×4 grid (or wall) and asked to separate the words into four groups of four. Each group has a common theme.
To make life even harder, red herrings are thrown into the mix: some words could be part of two groups. Here, for example, are the 16 words from one of the Connecting Walls I asked ChatGPT to solve:
Priest, Lawford, Knight, Pope
Kremlin, Sinatra, Martin, Bishop
Child, Deacon, Grand Slam, Hopman
Sister, Davis, Canon, Neighbour
These puzzles really caused the AI to ponder, and while it was processing, it partly revealed its “thought process”. On the above, for example, it said “mapping job titles” as it began to work through potential links.
After 88 seconds it delivered the correct answer. Group 1 is Rat Pack members (Sinatra, Martin, Lawford, Bishop). That group could also have included Davis, for Sammy Davis Jr., but ChatGPT realized Davis belonged in group 2, tennis competitions (Davis, Hopman, Kremlin and Grand Slam). Group 3 was clergy members (Pope, Priest, Deacon, Canon) and group 4 could all be completed with “hood” (Knight, Child, Sister, Neighbour). Bishop and Sister could also have fitted the clergy theme, of course.
That’s a pretty stiff challenge, but the AI pulled it off, as it did with a second Connecting Wall I threw at it. It came close to making it three out of three, but on the final wall it got two groups wrong, falling for a red herring and failing to notice a series of words that could be followed by “stop”. Still, it would have scored some points in the quiz for getting two groups and three of the connections correct. Overall, it would have scored 25 out of a possible 30 points across the three walls (10 points each for the two perfect walls, plus five on the third).
Missing Vowels
The final round of the quiz is Missing Vowels, where contestants are given a theme and must identify clues that have had all their vowels stripped out. To make it harder, the remaining letters are regrouped with inconsistent spacing. So, for example, if the answer was:
THE CHICAGO BEARS
It may appear in the Missing Vowels Round as:
THC HCGB RS
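To make the transformation concrete, here’s a minimal Python sketch of the idea, assuming a clue is built simply by dropping the vowels and regrouping the leftover letters at random (the show’s setters choose the spacing themselves, so the exact groupings below are illustrative only):

```python
import random

VOWELS = set("AEIOU")

def missing_vowels(answer: str, seed: int = 1) -> str:
    """Turn an answer into a Missing Vowels-style clue: drop the vowels,
    then regroup the remaining letters with arbitrary spacing."""
    # Keep the non-vowel letters in order, ignoring spaces and punctuation.
    letters = [c for c in answer.upper() if c.isalpha() and c not in VOWELS]

    # Re-space at arbitrary points so the gaps no longer match word boundaries.
    rng = random.Random(seed)
    chunks, i = [], 0
    while i < len(letters):
        size = rng.randint(2, 4)  # arbitrary chunk length, purely illustrative
        chunks.append("".join(letters[i:i + size]))
        i += size
    return " ".join(chunks)

print(missing_vowels("THE CHICAGO BEARS"))  # the nine remaining letters, regrouped at random
```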
I thought this would be the easiest round for the AI to crack, but on its first attempt it got all four answers wrong. And then I realized… I had accidentally reverted to the older GPT-4o model and not the o1-preview. When given the same set of words on the new model, it scored a perfect four out of four.
For reference, the four clues were:
- FRT NTM STHR
- TMC NMB RFM LYB DNM
- SP TSNT WDC
- LFT HNVR SNDV RYT HNG
And the theme is “they equal 42”.
ChatGPT took precisely 11 seconds and then provided the following answers:
- Fourteen Times Three (14 × 3 = 42)
- Atomic Number of Molybdenum (molybdenum’s atomic number is 42)
- Spots on Two Dice (each die has 21 spots; 21 × 2 = 42)
- Life, the Universe, and Everything (according to Douglas Adams’ The Hitchhiker’s Guide to the Galaxy, the answer to life, the universe, and everything is 42)
I’d love to have tested ChatGPT on more Missing Vowels rounds, but by this time I’d burned through all my preview credits, meaning I’ll have to wait until next week to use the model again. So, in a brief test, it scored four out of four.
ChatGPT o1-Preview Verdict
The ability of the AI to solve even fairly complex word problems is genuinely staggering. Equally impressive is the way the AI shows its thinking as it’s working through problems, ruling out some theories and going back to others, until it finds the correct answer.
It’s not flawless, but it’s a huge step forward in sophistication from the previous model. And plenty smart enough to be a winning contestant on Britain’s toughest TV quiz.