In today’s column, I take a close look at the growing availability and utility of so-called Small Language Models (SLMs), which are rising in popularity even as the advancement of Large Language Models (LLMs) continues with great vigor and promise. What does this all mean? The deal is this. You could readily assert that we can have our cake and eat it too. The emerging situation is a true twofer, providing the best of both worlds.

Let’s talk about it.

This analysis of an innovative proposition is part of my ongoing Forbes.com column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

The Largeness Has Led Us To The Smaller Cousin

Let’s begin at the beginning. When you use generative AI such as the widely popular ChatGPT, you are making use of an underlying capability referred to as a Large Language Model or LLM.

This consists of a computational and mathematical model that has been data-trained on lots of human writing. The Internet is first scanned for all manner of human-written content, such as essays, narratives, poems, and the like, which is then used to do extensive pattern-matching. The aim is for the AI to computationally mimic how humans compose sentences and make use of words.

It is considered a model of natural language, such as English, and it turns out to be pretty large in size since that seemed initially to be the only way to get the pattern-matching to be any good. The largeness consists of having a large internal data structure that encompasses the modeled patterns, typically using what is called an artificial neural network or ANN, see my in-depth explanation at the link here. Properly establishing this large data structure required doing large scans of written content, since scanning slimly couldn’t move the needle on having viable pattern-matching.
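To make the pattern-matching notion concrete, here is a minimal toy sketch in Python. To be clear, this is a drastic simplification that uses simple word-pair counts rather than an actual neural network, but it conveys the core idea of learning from training text which words tend to follow which other words.

```python
# A toy illustration of language pattern-matching: a bigram model that
# predicts the next word based on counts gathered from training text.
# Real LLMs use neural networks trained on billions of documents; this
# sketch merely conveys the core idea of learning word-sequence patterns.
from collections import Counter, defaultdict

training_text = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
)

# Tally, for each word, which words tend to follow it.
follows = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follows[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the most frequently observed follower of the given word."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))   # likely "cat" or "dog", based on the counts
print(predict_next("sat"))   # "on"
```

An LLM does something conceptually similar, yet at a vastly larger scale and with far richer statistical machinery, which is where the largeness comes from.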

Voila, we have amazing computational fluency that appears to write as humans do in terms of being able to carry on conversations, write nifty stories, and otherwise make use of everyday language.

There is a bit of a rub.

The largeness meant that generative AI would only reasonably run on powerful computer servers, and thus you needed to access the AI via the Internet or cloud services. That’s how most people currently use the major generative AI apps, such as OpenAI’s ChatGPT, GPT-4o, o1, and other similar AI, including Anthropic Claude, Google Gemini, Meta Llama, and others. You log into the AI with an online account.

Generally, you are out of luck if you can’t get an online connection when you wish to use an LLM. They usually require online access. That might not be too much trouble these days since you can seemingly get Wi-Fi just about anywhere. Of course, there are still dead spots that do not have online access, and at times the access you can get is sluggish and the connection tends to falter or drop.

Wouldn’t it be nice if you could use an LLM on a standalone basis whereby it runs entirely on your smartphone?

That would solve the problem of having to find a reliable online connection.

This might also reduce costs in the sense that rather than using expensive cloud-based servers, you are simply running the generative AI directly on your smartphone or laptop.

And, importantly, you might be able to have greater privacy when using AI. The deal is this. When using the cloud, any entries you make into the AI flow up to the cloud and can be potentially used by the AI maker (see my discussion of privacy intrusions allowed by AI vendors as per their licensing agreements, at the link here).

Yes, it would indeed be handy and alluring to have a generative AI or LLM that runs on your own smart devices.

Well, the world hears you, and the answer is that there are Small Language Models or SLMs that are made for that purpose. They are like mini-versions of LLMs. There is no need for an Internet connection, and an SLM is shaped to hopefully work suitably on small standalone devices.
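As a minimal sketch of what running a model locally can look like, the following Python snippet loads a compact model via the Hugging Face transformers library and generates text on your own machine. The model name shown, distilgpt2, is just one example of a small model and not a recommendation; after the initial download of the weights, the generation itself needs no cloud connection.

```python
# Minimal sketch: running a compact language model entirely on your own
# machine using the Hugging Face transformers library. Once the model
# weights have been downloaded and cached, generation runs locally with
# no cloud calls. "distilgpt2" is used here only as an example of a
# small model; swap in whatever compact model suits your device and task.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

result = generator(
    "Small language models are useful because",
    max_new_tokens=40,      # keep the output short for a quick local demo
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```

If a snippet like that runs acceptably on your hardware, you are, in effect, already using an SLM.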

Sometimes nice things come in small packages.

LLM And SLM Are Pals And Not Opponents

If you’ve not heard about Small Language Models, that’s perfectly understandable since they are still not quite up to par. There is a wide variety of experimental SLMs, and some are good while others are clunky and less appealing. To clarify, I’m not suggesting that there aren’t valid SLMs that right now can do a decent job for you. There are. It is just that, on a widespread basis, we are still in the infancy or early days of robust SLMs.

The breakout of SLMs into the mainstream marketplace is yet to come. Mark my words, the day of the SLM is coming. Their glory will be had.

Some critics get themselves into a tizzy and believe that you either must favor LLMs or you must favor SLMs. They want to divide or polarize people into one of two camps. You either love the largeness of LLMs and detest the smallness of SLMs, or you relish the compactness of SLMs and outright hate the oversized nature of LLMs. They seek to trap you into a mindset that somehow you must choose between the two.

Hogwash.

There are great uses of LLMs, as there are great uses of SLMs too. Do not let yourself be pushed into a boxed-in posture that one path is bad and the other is good. It’s just nonsense to make a broad generalization like that. Each approach has its advantages and disadvantages.

I compare this to cars. Sometimes a large powerful car is the best vehicle for your needs. Maybe you are driving across the country and have your entire family with you. In other instances, a compact car is your better choice, such as making quick trips around town by yourself and you want to squeeze in and out of traffic. You must look at a variety of crucial factors, such as cost, speed, comfort, and so on, to make a sensible and reasonable decision.

My viewpoint, which is admittedly somewhat contrarian for those bitter critics, is that we ought to actively and avidly pursue both avenues, LLMs and SLMs, with equal vigor. Do not drop one for the other. Keep making progress in both directions. We can do this simultaneously and do not have to favor one path only.

Yay, let’s get larger with LLMs. And, yay, let’s get smaller with SLMs. LLMs and SLMs are pals, not opponents.

Example Of The Tradeoffs Afoot

You might be curious about what types of tradeoffs there are between the largeness of LLMs and the smallness of SLMs.

Answering that question is somewhat of a moving target. I state this because just as LLMs are getting better via AI advances, the same is true for SLMs. Thus, any example of what an LLM or SLM does or doesn’t do right now is ripe for ridicule in a few years or even months, given the progress being made.

I compare this to the advances made in smartphones. Think back to the release of the first iPhone. At the time, it was considered quite advanced. If you compare that initial model to the latest version of the iPhone, you will laugh aloud at how puny or limited the first iPhone was. It lacked much internal memory, had a less capable camera, and suffered screen size and density drawbacks, along with other elements that seem quite absurd to us now. But, at the time, we were overjoyed at the capabilities.

Please keep that lesson in mind.

Okay, with those noted caveats, I will give you a kind of example showcasing what the difference between an SLM and an LLM might be right now. Not in the future. Just at this time. I know that trolls are going to go ballistic at this, which is why I am trying stridently to clarify things.

First, let’s try out a contemporary LLM and see what we get as an answer to a straightforward question.

  • My entered prompt into LLM: “Can you explain the theory of relativity and how it changed our understanding of space and time?”
  • LLM generated response: “The theory of relativity, developed by Albert Einstein, fundamentally changed our understanding of space, time, and gravity. There are two main parts: special relativity and general relativity. Special relativity, published in 1905, introduced the concept that space and time are interconnected and form a single continuum called spacetime. It posits that the speed of light is constant for all observers, regardless of their motion, leading to phenomena like time dilation, where time slows down for objects moving close to the speed of light, and length contraction, where objects appear shorter in the direction of motion. General relativity, published in 1915, expanded these ideas by introducing gravity as the curvature of spacetime caused by massive objects. This replaced the Newtonian view of gravity as a force acting at a distance. According to general relativity, objects like planets and stars warp spacetime, and this curvature is what we experience as gravity.”

Take a look at the answer. I would suggest that the response generated by the LLM is reasonably valid and provides an illuminating and somewhat detailed answer.

I will try the same question with a typical present-day SLM.

  • My entered prompt into SLM: “Can you explain the theory of relativity and how it changed our understanding of space and time?”
  • SLM generated response: “The theory of relativity, developed by Albert Einstein, changed how we understand space, time, and gravity. Special relativity says that the speed of light is always the same, no matter how fast someone is moving. This means that when objects move near the speed of light, time slows down for them (time dilation), and they appear shorter (length contraction). General relativity describes gravity differently from before. Instead of a force, it’s the bending of space and time around large objects, like planets.”

Go ahead and look at that answer by the SLM.

Do you see any differences between the LLM-generated response and the SLM-generated response?

By and large, a typical difference is that SLMs tend to have fewer details in their internal structures, and since they aren’t usually actively connected to the web, they don’t conventionally look up additional info (exceptions apply, as I’ll note momentarily). The response by the SLM about the theory of relativity is a bit shallower than the response produced by the LLM. It might also be less timely regarding the latest online commentary on whatever topic you are asking about.

Do not though take that to the bank.

I mean to say that there are SLMs that are specifically focused on particular domains or topics, and therefore they can potentially outdo a generic LLM that is large and has online access.

Also, some SLMs allow you to tell the AI to go ahead and access the Internet, which I realize seems odd. Isn’t the beauty of the SLM that it can be standalone? Yes, that’s true. At the same time, there isn’t anything that prevents an AI maker from letting you decide to allow online access. If you grant that access, the particular SLM can seek an Internet connection to find more data about the matter at hand.

Generalizations About SLM Versus LLM

If you are willing to keep in mind my mentioned points that advances are underway and that we must be cautious of making overgeneralized claims about SLM versus LLM capabilities, I will go ahead and share with you a broad point-in-time comparison.

Here we go.

  • (1) Performance vs. Efficiency: LLMs tend to have a wide range of language mimicry due to their vastness and can be impressive in what data they have and can discuss, while SLMs typically are compacted, having less data to rely upon. This becomes quickly evident after extensive usage of both. Meanwhile, running on large cloud servers makes LLMs pretty quick to respond, though you are competing with millions of other users, while an SLM relies solely on the memory and processing speed of your handheld smart device or laptop.
  • (2) Accuracy vs. Adaptability: LLMs are suitable for open-ended questions, varied problems, and complex tasks, while SLMs tend to be best used if the questions are narrowly within whatever data training the SLM was devised on. Usually, you would ask SLMs easier questions and reserve tougher questions for LLMs, with an exception being that if the SLM is tailored to a niche, it might be better than the LLM at answering those niche-related harder questions.
  • (3) Cost vs. Accessibility: LLMs are getting better partially by making them larger and larger, which in turn requires expensive computational processing resources. An SLM tends to be less expensive to run since it relies on the processing capabilities of the device being used. Since an SLM is usually designed to work primarily on a standalone basis, you also avoid the Internet connection costs that arise with using LLMs. The lower cost comes with likely narrower capabilities.
  • (4) Latency vs. Depth of Interaction: If an SLM is well-devised to run on particular standalone devices, the response time can be blazingly fast. The compactness and customization, along with not needing to rely on an online connection, make this possible. But if the SLM is not so devised or runs willy-nilly, the local device might not have enough horsepower, and the wait times could be exasperating. LLMs have the advantage of relying on souped-up servers, though you are competing with perhaps millions of others using that same LLM.
  • (5) User Privacy vs. Cloud Dependency: LLMs that primarily work in the cloud open you to potential privacy concerns since your entered prompts flow up into the cloud. In theory, an SLM keeps your data local to the device and won’t release it. I say in theory because some SLM makers do nonetheless store your local data and reserve the right to move it up into their cloud for reasons such as improving their SLMs. Do not assume that just because you are using an SLM it is somehow a privacy-saving approach. It might be, it might not be.

As noted, there are tradeoffs between LLMs and SLMs, and you need to decide which is right for which circumstance.

Again, don’t fall into the mental trap of one versus the other and that only one true path exists. You might use one or more SLMs on your smartphone for specific tasks, and at the same time consult with one or more LLMs on the Internet.

You can have your cake and eat it too.

Research On Small Language Models Is Expanding Rapidly

For those of you who enjoy a challenge, I ask you to ponder how to fit ten pounds of rocks into a five-pound bag. I bring up that challenge because the same can be said about trying to devise Small Language Models. You often start with an LLM and try to cut it down to a smaller size. Sometimes that works, sometimes not. Another approach is to start fresh with the realization that you are aiming solely to build an SLM. You aren’t trying to fit an LLM into an SLM.

Almost every AI developer at least considers what constitutes an LLM and leans on that knowledge when developing SLMs.

You don’t haphazardly toss aside everything already learned from having tussled with LLMs all this time. It turns out that LLMs often take a somewhat lackadaisical angle on how their internal data structures are arranged (this made sense in the formative days, which often relied on brute-force AI development techniques). You can potentially compact those structures down greatly. Take out what might be data fluff now that we are being mindful of storage space and processing cycles, maybe change numeric formats to simpler ones, and come up with all kinds of clever trickery that might be applied.
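One of those tricks, changing numeric formats to simpler ones, is commonly known as quantization. Here is a minimal sketch using PyTorch’s dynamic quantization, which converts a model’s linear-layer weights from 32-bit floats to 8-bit integers; the tiny stand-in network shown is purely illustrative, not an actual language model.

```python
# Illustrative sketch of quantization: storing weights in a simpler
# numeric format (8-bit integers instead of 32-bit floats) to shrink
# a model's footprint, at some cost in precision.
import io

import torch
import torch.nn as nn

# A tiny stand-in network; in practice this would be a trained language model.
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
)

# Dynamically quantize the linear layers' weights down to int8.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def saved_size(m: nn.Module) -> int:
    """Serialize a model's weights in memory and report the byte count."""
    buffer = io.BytesIO()
    torch.save(m.state_dict(), buffer)
    return buffer.getbuffer().nbytes

print(f"float32 weights: {saved_size(model):,} bytes")
print(f"int8 weights:    {saved_size(quantized):,} bytes")
```

The quantized version occupies roughly a quarter of the storage, which hints at why such compaction tricks matter so much when squeezing a model onto a smartphone.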

It’s a fantastic challenge.

Not only does solving the challenge help toward building SLMs, but you can potentially apply those same tactics to LLMs. If you can make LLMs more efficient, you can keep scaling them larger and larger without necessarily having to correspondingly ramp up the computing resources. You get more bang for the buck, increasingly so.

There is plenty of research going on about SLMs. For example, a recently posted study entitled “A Survey of Small Language Models” by Chien Van Nguyen and numerous additional co-authors, arXiv, October 25, 2024, made these salient points (excerpts):

  • “Although large language models (LLMs) have demonstrated impressive performance on a wide array of benchmarks and real-world situations, their success comes at a significant cost. LLMs are resource-intensive to train and run, requiring significant compute and data.”
  • “Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources, making them ideal for various settings including on-device, mobile, edge devices, among many others.”
  • “The inherent difficulty of a survey of small language models is that the definitions of ‘small’ and ‘large’ are a function of both context and time.”
  • “We identify important applications, open problems, and challenges of SLMs for future work to address.”
  • “While SLMs present a broad array of benefits, risks, and limitations must also be considered. Hallucination and reinforcement of societal biases are widely recognized risks of large language models.”

Articles such as the one cited here are useful starting points for anyone who wishes to venture into the burgeoning arena of SLMs.

Big Thoughts About Small Language Models

Those who are trying to make generative AI bigger and better are indubitably preoccupied with LLMs since the largeness is assumed to be the pathway toward someday attaining the prized Artificial General Intelligence or AGI, see my analysis at the link here. They might scoff at SLMs due to the idea that going smaller is not presumably on that alluring pathway. As I noted earlier, this might be myopic thinking, in that we might be able to get more bang for the buck by using SLM techniques on LLMs.

Moving on, SLMs are currently perceived as the way to get narrowly focused generative AI working on an even wider scale than it is today. It is a potential gold rush of putting the same or similar capabilities onto your smart device and that will be devoted to you and only you (well, kind of).

For example, an SLM devised for aiding people with their mental health would be an appealing application for smaller-sized generative AI. I’ve discussed this at length, see the link here and the link here. The idea is that you could use your smartphone wherever you are to get AI-based therapy, without requiring an Internet connection. Plus, assuming that the data is kept locally, you would have enhanced privacy over using a cloud-based system (all else being equal).

SLMs are hot and getting hotter.

I can say the same about LLMs, yes, LLMs are hot and getting hotter.

Those are truisms that aren’t contradictory. Going big can be true while going small is equally true. I fervently hope that we can bring the two camps together. They can equally gain and learn from each other. Splintering our efforts is a sorry shame.

Anyway, a famous quote by the Dalai Lama might be a nice way to conclude this discussion about the value of smallness: “If you think you are too small to make a difference, try sleeping with a mosquito.” LLMs are currently the elephant in the room but keep your eyes on the SLMs too.

You’ll be glad you did and stay tuned for further coverage on SLMs as they grow into a mighty big force.
