In this blog, I explore the notion of “fit for purpose” in the context of large language models (LLMs). When you have a big hammer, as they say, everything is a nail, but perhaps there are better ways to approach how to engineer and deploy language models for business uses. The following two stories explore this idea.

Smaller models

A Computerworld article from a few weeks ago caught my attention on this front. It proclaimed that LLMs need to shrink, particularly to satisfy business use cases. One popular approach to making these models fit corporate needs is to fine-tune the model with specific datasets. But this process can be inordinately expensive, and more so the larger the starting model. With models skirting into the trillion-parameter range (reportedly the size of GPT-4), such costs can give anyone pause. To be fair, mega models continue to improve and produce impressive results on a range of general-purpose tasks. However, the training cost and the carbon footprint involved can be massive. In this context, it is interesting to note that even OpenAI is finding this process expensive! There is also a supply chain problem with accessing GPUs.

There have been a number of approaches over the years to get smaller models to perform as well as their larger brethren on specific tasks and domains. One technique is knowledge distillation: the idea is to use a large LM (the teacher) to train a smaller LM (the student), and the approach proved surprisingly effective. But that work is four years old, almost stone age compared to what we have now! Another approach, popularized by DeepMind, is the model they dubbed Chinchilla. At 70 billion parameters it was still large, but it performed as well as or better than much larger models. The secret sauce was training a smaller model for longer, on a lot more data (over a trillion tokens). Yet another line of work distills specific capabilities, rather than everything, from a larger model into a smaller one.
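To make the distillation idea concrete, here is a minimal pure-Python sketch of its core objective: the student is trained to match the teacher's temperature-softened output distribution, not just the hard labels. The function names and toy logits are illustrative, not from any particular implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's -- the soft-target loss at the heart of distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits over three classes: the student is nudged to reproduce the
# teacher's *relative* preferences, including among the wrong answers.
teacher = [4.0, 1.0, 0.2]
student = [2.0, 1.5, 0.5]
loss = distillation_loss(teacher, student)
```

A higher temperature flattens the teacher's distribution, exposing more of this "dark knowledge" about near-miss answers; the loss falls to zero only when the student's distribution matches the teacher's exactly.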

The article, which targets developers, argues for deploying small LMs for targeted, company-specific use cases, such as customer service automation and sales and marketing automation. You can train a small model from scratch, or start with an open-source pre-trained model and fine-tune it on company resources. The intent is a smaller model that carries less security risk, keeps control over one's own data, and is easier to maintain and deploy. Of course, one needs the right talent to pull off such an endeavor.

Retrieval augmented language models (RALM)

I recently read a blog post published by Meta researchers exploring a new approach to RALM. It turns out RALM is an active area of research for a variety of reasons, some of which are covered in the story above. Namely, LLMs cannot, out of the box, handle queries about private corporate data. Also, LLM pre-training always has a cut-off date: ask GPT-4 now and it says its training data ends in January 2022. To address these drawbacks, the approach is to augment the query, or prompt, with data retrieved from an arbitrarily large datastore.

A recent, excellent tutorial conducted by the Association for Computational Linguistics (ACL) provides a detailed overview of the different approaches to three questions: what to retrieve, when to retrieve, and how to incorporate the results of the retrieval.

One popular open-source tool to help with this process is LangChain. The basic approach to supporting an external datastore is to provide query context to an LLM. The idea is to first represent the content of a large datastore in a standard format, usually by running every element through an embedder that encodes it as a vector; such a datastore is called a vector database. Given a query you want an LLM to answer, the query is used to retrieve the elements of the datastore that are most relevant (nearest in vector space), and the LLM then responds to the query using this context. But there is more to it than meets the eye: the LLM can be fine-tuned to incorporate the retriever, and the retriever can be fine-tuned to work with the LLM. A research paper released this month by Meta attempts to do both efficiently; they call the new version RA-DIT, for retrieval-augmented dual instruction tuning.
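The retrieve-then-prompt loop described above can be sketched in a few lines of pure Python. This is a toy illustration, not LangChain's actual API: the hand-written vectors stand in for an embedding model, and the function names are mine.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction in vector space."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, datastore, k=2):
    """Return the k documents whose embeddings are nearest the query."""
    ranked = sorted(datastore,
                    key=lambda d: cosine_similarity(query_vec, d["vec"]),
                    reverse=True)
    return [d["text"] for d in ranked[:k]]

def build_prompt(query, passages):
    """Assemble the augmented prompt: retrieved context plus the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Toy datastore: in practice the vectors come from an embedding model
# and live in a vector database rather than a Python list.
datastore = [
    {"text": "Q3 revenue grew 12%.",        "vec": [0.9, 0.1, 0.0]},
    {"text": "The cafeteria menu changed.", "vec": [0.0, 0.2, 0.9]},
    {"text": "Q3 costs fell 3%.",           "vec": [0.8, 0.3, 0.1]},
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the question below
prompt = build_prompt("How did Q3 finances look?",
                      retrieve(query_vec, datastore))
```

Here the two finance passages are nearest the query vector, so they, and not the cafeteria note, land in the prompt; the LLM then answers from that context rather than from its frozen training data.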

Fit for purpose

Both of the trends above, training smaller models on corporate data and augmenting LLMs with retrieval, work toward business-focused use of LLMs. These approaches democratize the development and deployment of LLM-based applications.

I am always looking for feedback and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.

“Juggy” Jagannathan, PhD, is an AI evangelist with four decades of experience in AI and computer science research.