Small, regular, medium or large – sir/madam?

When it comes to coffee, pitchers of beer, cheeseburgers and items of clothing, going large usually means you’re getting more value for money, a better bundled deal, or at least a more commodious and capacious garment that might give the consumer arguably more threads-per-penny.

Even in the world of data, our notion of big data (typically the volume of information that does not fit easily into a standard relational database management system in the pre-cloud age) generally suggests that we're accommodating additional channels of information, which might include a previously untapped data warehouse, data marts or a connection to wider data lakes filled with unstructured but potentially still valuable goodness.

In AI language models, as we know from the fervent discussion already spreading across the technology industry, small language models, with their specialized alignment and focus, have a key role to play in augmenting the wider, more generalized knowledge that large language models supply.

Size Doesn’t Matter

In reality, we shouldn’t be putting so much focus on the language model size and trying to define the next stage of artificial intelligence development by any significant measure of this kind. This is the opinion of Steve Mayzak, global managing director for Search AI platform at Elastic, the company known for its enterprise search, business analytics and wider machine learning technologies. But why is all this so?

There has, of course, been a lot of discussion about the relative benefits of small and large language models. In terms of operation, SLMs offer a defined focus and a market, subject, domain or topic specialism. Conversely, LLMs offer a broader collective of language data about our world, with large image models, large audio models and large haptic (human touch) models also providing the more expansive base layer to which SLMs can be attached.

But perhaps the size conversation often misses the point. It’s not about size or complexity. It’s about how SLMs are designed to make decisions and balance their specialization with their capacity for broader, general knowledge.

“Like with most technologies that gain traction, SLMs do come with their advantages. But they’re like specialists: they excel in specific domains, like law or finance. That’s the thing about specialists, their focus is narrow… and how they’re defined changes over time,” explained Mayzak. “Think about an SLM that’s focused on civil infrastructure. It won’t have insight into transportation infrastructure, or how the two specialisms intersect. Just as human intelligence evolves, so must the specialisms… and by nature, SLMs.”

Testing Times For Turing

If we think about the Turing Test (the assessment of whether a machine can exhibit intelligent behavior such that its responses are indistinguishable from those of a human), it was created as a measure of machine “thinking” capability. But as our understanding of what defines intelligence has progressed, the test is now arguably viewed as somewhat limited. With this shifting of the goalposts in mind, we can say that to stay effective, SLMs will need to grow, digesting new data to keep pace with our understanding of intelligence.

So will SLMs grow to become LLMs? We may soon reach a point where we can no longer define, or even say, where an SLM stops and an LLM starts. As models evolve, the lines between them will blur.

“I’d avoid debates around the benefits of a model’s size. Just call them language models,” asserted Mayzak. “Consider SLMs as the experts and LLMs as the all-rounders… and here’s where things get interesting. I like to think of an LLM as the ‘decider’ i.e. it can determine what specialization is required to address a task. Once it’s made that decision, it can tap into the appropriate SLMs for the domain-specific expertise needed to reason through the problem and generate an answer. In this sense, while LLMs guide the process, SLMs provide the deep, specialized knowledge. You could even imagine an SLM that specializes in determining which SLMs to consult for a particular task.”
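To make that “decider” pattern concrete, here is a minimal Python sketch: a general model picks a specialism, then hands the task to a domain SLM. The model names, the keyword-based decide_domain stub and the generate callables are hypothetical placeholders, not any particular vendor’s API.

```python
# A minimal sketch of the "decider" pattern described above: a general model
# routes a task to a domain specialist. Model names and the generate helpers
# are hypothetical stand-ins, not a specific vendor API.

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class LanguageModel:
    name: str
    generate: Callable[[str], str]  # prompt in, completion out


# Hypothetical specialist SLMs keyed by domain.
specialists: Dict[str, LanguageModel] = {
    "law": LanguageModel("slm-law", lambda p: f"[slm-law] {p}"),
    "finance": LanguageModel("slm-finance", lambda p: f"[slm-finance] {p}"),
    "civil-infrastructure": LanguageModel("slm-civil", lambda p: f"[slm-civil] {p}"),
}


def decide_domain(task: str) -> str:
    """Stand-in for the LLM 'decider': pick which specialism the task needs.
    Stubbed here with simple keyword matching."""
    task_lower = task.lower()
    for domain in specialists:
        if domain.split("-")[0] in task_lower:
            return domain
    return "finance"  # fall back to a default specialist


def answer(task: str) -> str:
    domain = decide_domain(task)      # the LLM "decider" step
    specialist = specialists[domain]  # hand off to the domain SLM
    return specialist.generate(task)


print(answer("Assess the load rating rules for a civil bridge retrofit"))
```

In practice the routing step would itself be a model call (or, as Mayzak suggests, a dedicated routing SLM), but the shape of the hand-off is the same.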

Just as political leaders or doctors may consult specialists in their respective fields, these models will increasingly act as checks and balances… making decisions, steering the conversation and directing us to the right expert for the job.

Analysis Paralysis

One caveat, though: it’s not possible to train a model to cover everything, so there will always be trade-offs between depth and breadth. A model can either dive deep into a single domain or cast a wider net across many, but there’s always a limit to how much it can truly specialize while maintaining general knowledge.

According to Mayzak, the real challenge comes when we try to bring together more domains. The more areas we include, the trickier it becomes to figure out which expertise is needed in any given situation. It’s like trying to juggle too many balls: eventually, something is going to drop. Add to that the issue of analysis paralysis, where an overload of information can leave us stuck, and we have a real bottleneck in decision-making. It’s the paradox of choice: too much knowledge can hinder progress.

We mustn’t forget that LLMs also have their own set of limitations. The complexity of the task often means there’s a trade-off between how long they can reason through a problem and how quickly they can deliver a response: the longer they deliberate, the slower the answer. It’s like asking a generalist to be quick while also expecting them to take the hours needed to produce a perfect solution. Although the likelihood of getting the right answer increases with time, just as with humans, models reach a point of diminishing returns.

Good, Useful AI Bias, Honestly

“When it comes to SLMs, there are some clear limitations, especially regarding access to knowledge and their ability to handle complex, real-world situations. But these models can’t be expected to do everything, so just like larger models, they often inherit biases from the humans who train them,” said Mayzak. “But I’d put forward the argument that bias isn’t always the bad guy. We often view bias as something inherently negative, although in some cases, it’s a protective mechanism. Think of it like a company’s need for core values or standard operating procedures to maintain order and consistency. Without those guiding principles, chaos could ensue.”

So in AI, some biases are useful. They can help models make quick, decisive judgments within their specialized domains. To take a random but highly illustrative example: if we were building AI services for a mobile banking application used by female customers in Rwanda, we’d want to ensure, quite deliberately, that the information in our language model was skewed towards female Rwandan mobile banking users, exclusively.
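As a rough illustration of that deliberate skew, here is a minimal Python sketch of filtering a fine-tuning corpus down to the target population. The record fields and the in-memory corpus are hypothetical stand-ins for whatever data store a real pipeline would use.

```python
# A minimal sketch of deliberately "biasing" a fine-tuning corpus towards the
# target population in the example above. Records and fields are hypothetical.

from typing import Dict, List

corpus: List[Dict[str, str]] = [
    {"country": "RW", "gender": "female", "text": "How do I check my mobile wallet balance?"},
    {"country": "RW", "gender": "male", "text": "What are the agent withdrawal fees?"},
    {"country": "KE", "gender": "female", "text": "Can I pay school fees from the app?"},
]


def target_slice(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only records matching the intended user population."""
    return [r for r in records if r["country"] == "RW" and r["gender"] == "female"]


fine_tuning_set = target_slice(corpus)
print(len(fine_tuning_set), "records selected for the specialized model")
```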

Again, Mayzak reminds us, the issue arises when these biases are misapplied or become too rigid, stifling adaptability. He says that some bias in an SLM may be beneficial for a narrow, specialized task, although it’s important to ensure it doesn’t go too far, limiting the model’s ability to evolve and take on more complex challenges.

Clarity From Complexity

The challenges of trade-offs between depth and breadth, the risks of analysis paralysis, and the nuances of bias all point to a key takeaway: as these models evolve, we’ll need to focus less on the labels (and size) and more on their ability to navigate complexity and make informed, balanced decisions. The real question is about how these models can help us find clarity in an increasingly complex world that’s overwhelmed by choice.
