This is a guest post for the Computer Weekly Developer Network written by Ivo Patty, lead data engineer at digital transformation specialist organisation Qodea.
Patty writes in full as follows…
Ivo currently works as a Data Engineer within Cloud Technology Solutions. He specialises in Google Cloud Platform implementations of data products, helping Qodea customers modernise their infrastructure and move from data to insights.
The AI race has to date been defined by scale – the larger the parameter set, the better – with large language models (LLMs) dominating. However, as AI maturity has evolved, there is a growing acceptance that while size does matter, bigger isn’t always better.
As businesses deepen their understanding of how different AI models can be applied, Small Language Models (SLMs) – which can be used for more task-specific use cases – are gaining traction. And big tech is getting on board. From Google’s Gemma to OpenAI’s o3-mini, the next AI battlefront will be the development of SLMs.
The real breakthrough with SLMs, however, isn’t that they can replace LLMs; it’s that they complement them. The future of AI isn’t about choosing one over the other – it’s about multi-model AI strategies, where different models work together to create more efficient, scalable and cost-effective solutions.
Bigger isn’t always better
LLMs have dominated the spotlight in recent years. Take models like Gemini and ChatGPT – everyone’s heard of them and everyone wants to use them. In just three years, a third of workplace interactions are expected to be powered by conversational AI, whilst 80% of companies are expected to front customer interactions with digital and virtual agents.
LLMs excel at processing huge amounts of data, generating human-like content and providing intelligent responses to queries. From powering chatbots to assisting clinicians in reviewing medical literature, these models are transforming how we work, communicate and innovate.
However, LLMs aren’t without their challenges. They’re resource-hungry, requiring significant computational power and energy. This can have a huge environmental impact. Training a large language model like GPT-3, for example, is estimated to use just under 1,300 megawatt hours of electricity – about as much power as 130 US homes consume in a year.
They also require rigorous data management to avoid the leaking of sensitive information. Then comes AI hallucinations – instances where a model generates convincing but entirely false conclusions. While some mistakes are harmless, others pose serious risks, from outputting misinformation that influences public discourse, to producing inaccurate medical insights that could lead to misdiagnoses.
Fast, but not flawless
SLMs are the new contenders in AI. These models are fast gaining traction, with Gartner estimating that at least 50% of enterprises have actively evaluated an SLM for their use cases in the last six to twelve months.
SLMs, often 10x-100x smaller than LLMs, are designed for specialised tasks that don’t require huge datasets. Their compact size and ultra-low latency make them ideal for AI on mobile, edge devices and real-time apps. SLMs can generate responses in milliseconds rather than seconds, making them crucial for applications like live customer support, real-time transactions and interactive experiences.
Thanks to their focus on specific data, SLMs can provide more accurate, context-specific answers, reducing the risk of misleading responses. And unlike LLMs that rely on public data, SLMs are easier to tailor for proprietary use, giving organisations control over the algorithm, training and data.
For example, consider integrating a model into a point-of-sale (PoS) system. We do not need 80 billion parameters to process ‘double cheeseburger with extra bacon’. We can use a smaller model to generate simple responses and make simple decisions. This is not only faster for the user of our system but also cheaper and consumes far less power.
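To make the PoS idea concrete, here is a minimal sketch of the pattern a small model enables: the SLM extracts a structured order from the customer’s utterance and the application validates it against the menu before acting. The menu items, modifier names and the `slm_extract` function are invented for illustration – in a real deployment that stub would be a call to a small, task-tuned model running on or near the till.

```python
import json

# Illustrative menu data; item and modifier names are invented for this sketch.
MENU = {"cheeseburger", "fries", "cola"}
MODIFIERS = {"double", "extra bacon", "no onions"}

def slm_extract(utterance: str) -> str:
    # Stub standing in for a call to a small, task-tuned model that has
    # been trained to emit one JSON object per order line. A real system
    # would invoke the on-device model here.
    return json.dumps({"item": "cheeseburger",
                       "modifiers": ["double", "extra bacon"]})

def parse_order(utterance: str) -> dict:
    """Validate the model's structured output before charging the customer."""
    order = json.loads(slm_extract(utterance))
    if order["item"] not in MENU:
        raise ValueError(f"unknown item: {order['item']}")
    unknown = [m for m in order["modifiers"] if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifiers: {unknown}")
    return order

print(parse_order("double cheeseburger with extra bacon"))
```

The validation step matters as much as the model: because the task vocabulary is tiny and closed, a small model plus a strict schema check can be both fast and safe, which is exactly where an LLM would be overkill.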
However, SLMs have limitations. Their smaller size limits their ability to capture and retain complex patterns and context across large data sets. This constrained understanding means they may struggle with complicated or niche topics, leading to incoherent or inaccurate responses. Fine-tuning improves accuracy, but the process is complex and requires scarce expertise.
The multi-model future of AI
SLMs are great at what they do. But there are still use cases where an SLM can’t keep up with the processing power of an LLM. For example, in legal document analysis an LLM can handle vast amounts of client data and complex language, providing nuanced insights, while an SLM might miss broader legal context. On the other hand, an LLM can be overkill for simpler tasks like routine system interactions, where an SLM thrives.
Where does this leave us? The rise of SLMs doesn’t spell the end for LLMs. Both have their strengths. Instead, it presents an exciting opportunity for developers to make smarter choices, carefully selecting the right model for the right use case. As we move forward, multi-model AI strategies will become the norm – where smaller, specialised models work in tandem with larger, more powerful ones.
Yet to make multi-model a reality, developers will need to consider how they build and manage a multi-model AI pipeline. For this, developers will need a robust infrastructure that supports high computational power, efficient data handling and seamless model deployment. A flexible machine learning platform – such as Google Vertex AI – that supports training, deployment and orchestration of models of all sizes will be essential.
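The orchestration piece of such a pipeline can be sketched very simply: a router estimates how demanding a request is and dispatches it to the cheap small model or the powerful large one. Everything below is illustrative – the two `call_*` functions are stubs standing in for deployed model endpoints (for instance, models hosted on a platform such as Vertex AI), and the word-count heuristic is a placeholder for the trained classifier a production router would use.

```python
def call_slm(prompt: str) -> str:
    # Stub for a request to a small-model endpoint (fast, cheap).
    return f"[slm] {prompt[:20]}"

def call_llm(prompt: str) -> str:
    # Stub for a request to a large-model endpoint (slow, capable).
    return f"[llm] {prompt[:20]}"

def estimate_complexity(prompt: str) -> float:
    # Crude word-count heuristic, for illustration only; real routers
    # typically use a trained classifier, often itself a small model.
    return min(1.0, len(prompt.split()) / 50)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send simple requests to the SLM and complex ones to the LLM."""
    handler = call_llm if estimate_complexity(prompt) >= threshold else call_slm
    return handler(prompt)

print(route("double cheeseburger with extra bacon"))  # short request, routed to the SLM
print(route(" ".join(["clause"] * 60)))               # long request, routed to the LLM
```

The design choice worth noting is that the router itself is cheap to run, so every request pays a tiny triage cost in exchange for only the hard ones paying LLM prices.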
Finding the right fit
In the end, it’s not about choosing between SLMs and LLMs. It’s about leveraging the strengths of both to create smarter, more efficient AI systems. Developers who embrace this hybrid approach will not only stay ahead of the curve, but also be uniquely positioned to deliver tailored solutions that meet diverse use cases.