This is a guest post for the Computer Weekly Developer Network written by Jared Peterson, senior vice president of platform engineering, SAS.
Peterson is a seasoned technology professional and his full title for this piece is: What’s old is new again… and other surprising insights behind small language models.
Peterson writes in full as follows…
You don’t often hear the expression “what’s old is new again” when it comes to technology, a field where almost everything is new, newer, newest.
But then something like small language models (SLMs) comes around and those who’ve been in the business awhile see it for what it is: a concept that’s been in play for quite some time.
SLMs defined
To lay some groundwork, let’s establish that SLMs are the smaller, more specialised siblings of large language models. Because of the way they’re trained and created, they tend to be focused on specific problems or domains. Their narrower scope brings other advantages, such as speed of inference or scoring, which often makes them more cost-effective and easier to manage. Given these advantages, it’s no surprise that more and more businesses are embracing this alternative to LLMs.
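To give a concrete flavour of what that specialisation looks like in code, here is a minimal sketch, assuming the open source Hugging Face transformers library; the distilled sentiment checkpoint named below is purely illustrative, not a specific recommendation.

```python
# A minimal sketch of using a small, task-specific language model for
# sentiment analysis, assuming the Hugging Face transformers library.
# The checkpoint name is one illustrative public example.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The new release fixed our latency problems."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

Because the model is small and narrowly trained, this kind of call returns in milliseconds on ordinary hardware, which is where the cost and manageability advantages come from.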
What many don’t realise, however, is that this technology isn’t new.
I spent the early part of my career in the analytics R&D division at SAS working with natural language processing. Those of us who worked on it always saw it as incredible technology that helped businesses quickly analyse large volumes of unstructured data. But the tech never got the attention it deserved.
Attention is All You Need
Then, in 2017, several researchers at Google published the seminal paper, “Attention is All You Need.” This paper introduced the deep learning Transformer model architecture, kicking off the wave of text-based AI innovation that has led us to where we are today. We got things like BERT, XLNet, advancements in speech and audio processing – and, eventually, the GPT (generative pre-trained transformer) series of models.
These days, people use LLMs for natural language processing tasks such as classification, sentiment analysis and named-entity recognition. But many of the model types that came before the current wave of GPT models can perform similar tasks with high accuracy while being faster and cheaper to run. These models are now referred to as small language models.
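Named-entity recognition is a good example. The sketch below assumes the Hugging Face transformers library and a compact, publicly available BERT-style NER checkpoint chosen purely for illustration.

```python
# A sketch of named-entity recognition with a compact BERT-style model,
# assuming the Hugging Face transformers library. The checkpoint named
# here is an illustrative public example, not an endorsement.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",
)

for entity in ner("SAS published the report in Cary, North Carolina."):
    print(entity["entity_group"], entity["word"], f"{entity['score']:.3f}")
```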
That, of course, brings us full circle. Organisations that were uninterested in natural language processing years ago are now fully embracing SLMs. Thanks for that, ChatGPT.
As we (re)embrace this technology, here are a few more insights for SLMs…
One choice only?
You don’t have to choose between LLMs and SLMs. Since each type of language model has its own benefits, a hybrid approach might be the best option. For example, small language models are well suited to basic tasks, such as answering simple customer questions, while a large language model can take over when those questions become more involved. Thanks to AI-powered applications, it’s possible to dynamically switch between the two based on what you need the model to do.
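A simplified routing sketch of that hybrid idea is below. The complexity heuristic and the two call functions are hypothetical placeholders; in practice the router might use a classifier, token counts or confidence scores to decide which model handles a request.

```python
# Hypothetical SLM/LLM router: route simple questions to a small model
# and longer, multi-part questions to a larger one.

def looks_complex(question: str) -> bool:
    # Hypothetical heuristic: longer questions go to the larger model.
    return len(question.split()) > 20

def call_slm(question: str) -> str:
    # Placeholder for a call to a small, domain-specific model.
    return f"[SLM answer to: {question}]"

def call_llm(question: str) -> str:
    # Placeholder for a call to a larger general-purpose model.
    return f"[LLM answer to: {question}]"

def answer(question: str) -> str:
    return call_llm(question) if looks_complex(question) else call_slm(question)

print(answer("What are your opening hours?"))  # routed to the SLM
print(answer("Compare the refund policies across your three subscription "
             "tiers and explain which applies to an order placed last year, "
             "including what happens if I cancel mid-cycle."))  # routed to the LLM
```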
SLMs are more environmentally sustainable than their larger counterparts. Not only do the smaller, more nimble models help you innovate faster, they also have a much smaller carbon footprint. SLMs are trained on less data with fewer computational resources and can easily run on consumer hardware or edge devices without requiring more resource-intensive cloud computing.
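On the edge-deployment point, techniques such as dynamic int8 quantization can shrink a small model further so it runs comfortably on CPU-only hardware. The sketch below assumes PyTorch and the Hugging Face transformers library, with a purely illustrative checkpoint.

```python
# A sketch of shrinking a small model for CPU-only or edge hardware,
# assuming PyTorch and the Hugging Face transformers library.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Replace Linear layers with int8 equivalents to cut memory use and
# speed up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Runs fine on a laptop.", return_tensors="pt")
with torch.no_grad():
    logits = quantized(**inputs).logits
print(logits.argmax(dim=-1))
```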
You may still encounter bias with SLMs – but it’s more manageable.
Any type of language model that’s trained on an existing dataset will inherit certain biases, and it’s true that with a more limited training scope, responses may be more skewed with SLMs than with LLMs. However, training on a smaller dataset might also reduce the hallucinations often found in LLMs.
The best way to minimise bias in SLMs is to ensure your dataset contains a wide range of perspectives – and retrain models when you’ve identified a problem.
What other new technologies might be retro in disguise? Certainly more than we think, since the beauty of innovation is that it often builds on existing ideas. Today’s small language models might well be the foundation of tomorrow’s bigger-and-better advancement, and so on.
Our technology changes because our needs change. And I, for one, am thrilled to be along for the ride.