Sometimes, in examining the capabilities of large language models and neural networks, people focus so much on the technicalities that they neglect a more philosophical analysis: one that addresses the substance of how these technologies work in an eloquently human way, and that appeals to our other brain, the creative and intuitive side.
Or to put it another way: if there were a ‘religion’ behind technology, what would it sound like?
In the Western world, people often look to ideas of classically Eastern origin, like the tenets of Buddhism, to balance the authority of logic, structure, and rigid thinking. To that point, let’s take a look at what a team of researchers is doing with the Qwen model.
What’s In a Name?
First of all, the Qwen model was developed at Alibaba and, presumably, created by a predominantly Chinese team. The name is short for ‘Qianwen,’ from the model’s Chinese name ‘Tongyi Qianwen’ (通义千问), roughly ‘a thousand questions,’ though an unofficial backronym, ‘quantum wisdom enhanced network,’ also circulates. Some point out that it sounds a lot like ‘Gwen,’ a Welsh word meaning ‘white’ or ‘fair’.
That’s interesting, and you might ask why Qwen’s creators would nod to a Welsh word for their system. When I asked ChatGPT about this, here’s what it said:
“’Qwen’ is not a traditional Welsh word. While the name ‘Qwen’ might sound similar to ‘Gwen,’ a Welsh name meaning ‘white,’ ‘fair,’ or ‘blessed,’ Qwen itself does not originate from the Welsh language. In the context of Alibaba’s AI models, Qwen is an acronym for Quantum Wisdom Enhanced Network, and its naming appears to be a branding choice rather than a connection to Welsh origins. It reflects the model’s emphasis on advanced intelligence and computational capabilities, rather than a linguistic or cultural connection to Wales.”
Anyway, I’ve been talking about Qwen’s abilities on mathematical benchmarks. You can see some of the news around this model’s prominence here. But a new essay by the Qwen team provides a radically different look at what’s going on with this model.
The Intro
“What does it mean to think, to question, to understand?” the writers begin, in a translation of their work to English. “These are the deep waters that QwQ (Qwen with Questions) wades into. Like an eternal student of wisdom, it approaches every problem – be it mathematics, code, or knowledge of our world – with genuine wonder and doubt. QwQ embodies that ancient philosophical spirit: it knows that it knows nothing, and that’s precisely what drives its curiosity. Before settling on any answer, it turns inward, questioning its own assumptions, exploring different paths of thought, always seeking deeper truth.”
Some would say you can almost hear echoes of the Dao De Jing and other classical writings in this passage, which lays out a lot of that higher-level thinking about where people are, and where they want to go.
“This version is but an early step on a longer journey – a student still learning to walk the path of reasoning,” the authors write. “Its thoughts sometimes wander, its answers aren’t always complete, and its wisdom is still growing. But isn’t that the beauty of true learning? To be both capable and humble, knowledgeable yet always questioning?”
In many of our communities, at least the ones I move in, we talk about neural network models “hallucinating.” The quote above offers a very different take on how and why these systems act the way they do, one that is far more poetic and evocative.
Pondering Performance
Take a look at how the team explains the impact of inference-time reasoning on model behavior and performance:
“When given time to ponder, to question, and to reflect, the model’s understanding of mathematics and programming blossoms like a flower opening to the sun.”
That in itself shows you how human writers are applying poetry to the practice of describing a neural network’s output. I’ve read a number of examples of people explaining inference, since it’s cutting-edge technology right now, and none of them has this level of emotional heft.
The team then goes into raw scores on benchmarks like MATH-500, on which the model scored a staggering 90.6%, and GPQA.
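If you want to see that “pondering” firsthand, the mechanics are ordinary inference with a generous token budget. Here’s a minimal sketch, assuming the Hugging Face transformers library and the publicly released Qwen/QwQ-32B-Preview checkpoint; the prompt and generation settings are illustrative, not the team’s own:

```python
# Minimal sketch: eliciting QwQ's long-form reasoning locally.
# Assumes the Hugging Face `transformers` library and the public
# Qwen/QwQ-32B-Preview checkpoint; settings here are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant. Think step by step."},
    {"role": "user", "content": "How many positive integers below 100 are divisible by 3 or 5?"},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# The generous token budget is the point: the model "ponders" at inference
# time, spending many tokens questioning itself before settling on an answer.
outputs = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

The design choice worth noticing is the large max_new_tokens value: the long, self-questioning transcript is the product, so you budget for it.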
Showing the Work
The team introduces some of its demo cases this way:
“In the following examples, we invite you to witness the profound depths of QwQ-32B-Preview’s contemplative abilities. Like a seeker of wisdom on an endless journey of discovery, the model demonstrates its capacity for deep introspection – questioning its own assumptions, engaging in thoughtful self-dialogue, and carefully examining each step of its reasoning process. Through these cases, you’ll observe how QwQ embodies the timeless understanding that true knowledge emerges not from hasty conclusions, but from patient inquiry and the courage to challenge one’s own certainties in the eternal pursuit of truth.”
They then publish a worked example: an arithmetic equation to which the model has to add a single pair of parentheses so that the order of operations makes the equation true.
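For a sense of the puzzle class (the team’s actual equation isn’t reproduced here, so the expression and target below are hypothetical stand-ins), a brute-force solver takes only a few lines of Python; the contrast with QwQ’s pages of prose reasoning is the point:

```python
# Toy stand-in for the demo's puzzle class: insert one pair of parentheses
# into an arithmetic expression so it evaluates to a target value.
# The expression and target are hypothetical, not the Qwen team's example.
import itertools

def add_parens(tokens: list[str], target: int) -> str | None:
    """Try every placement of one '(' ... ')' pair around the operands."""
    operand_slots = range(0, len(tokens), 2)  # operands sit at even indices
    for i, j in itertools.combinations(operand_slots, 2):
        candidate = tokens[:]
        candidate[i] = "(" + candidate[i]
        candidate[j] = candidate[j] + ")"
        expr = " ".join(candidate)
        if eval(expr) == target:  # toy code: eval only sees our own strings
            return expr
    return None

print(add_parens("1 + 2 * 3 + 4 * 5".split(), 55))  # -> (1 + 2 * 3 + 4) * 5
```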
Here’s what I got from this example: it’s an order of magnitude more complex than anything I’ve seen in terms of chain-of-thought reasoning from a model. Count the number of pages it would take to print this reasoning path: it runs at least five or six pages, and maybe a lot more.
There are literally a few hundred lines of explanation.
Many of the other examples that have come out in recent months might list five or six phases of inference, but nothing close to that level of detail.
And the reasoning steps are written in human-sounding language.
Once again, as with earlier models, you get a kind of vertigo reading it, trying to figure out how a machine without a human brain comes up with all of this nuance. But that’s not an uncommon experience as we gaze into the mechanical hearts of neural networks.
The real beauty of the Qwen essay probably lies not in what it demonstrates, but in its poetry. We simply have to take these technologies whole, as they are, and view them through a lens that produces thoughtful, ethical, empathetic behavior. Otherwise, what are we doing?