Published February 10, 2023. Last updated February 14, 2023.
Language models have revolutionized the way we interact with knowledge and information. From chatbots to text summarization, their wide range of enterprise applications will enable us to transform the way we work.
However, with so many providers and models available, choosing the right approach can be overwhelming. A hasty choice can slow down your iteration speed or cost you deals with security-conscious enterprise customers. In this post, we’ll discuss the two main options for using language models in production, and the pros and cons of each.
Before diving into the options, it’s important to note that not all language models are created equal. Some models are better suited to specific tasks than others, and the quality of predictions can vary greatly depending on the provider. Even though a lot of machine learning research is openly published, providers have a strong business incentive to hold on tightly to their proprietary techniques, or “secret sauce”.
To compare the quality of different models, benchmarks like Stanford’s HELM (Holistic Evaluation of Language Models) can be useful. HELM evaluates large language models (LLMs) built by different providers on a common set of tasks and metrics, providing a standardized way to compare performance. That said, the best way to test models is to build your own set of evaluations and metrics that reflect your own needs and requirements.
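As a rough illustration of what a home-grown evaluation might look like, here is a minimal sketch. Everything in it is an assumption made for the example: the two-question evaluation set, the exact-match metric, and the stub functions standing in for real models. In practice you would replace the stubs with calls to the candidates you are comparing and choose a metric that suits your task.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical evaluation set: (prompt, expected answer) pairs you write yourself.
EVAL_SET: List[Tuple[str, str]] = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]


def exact_match_accuracy(model: Callable[[str], str],
                         examples: List[Tuple[str, str]]) -> float:
    """Fraction of examples where the model's answer equals the expected one,
    ignoring case and surrounding whitespace."""
    if not examples:
        return 0.0
    hits = sum(
        model(prompt).strip().lower() == expected.strip().lower()
        for prompt, expected in examples
    )
    return hits / len(examples)


def compare_models(models: Dict[str, Callable[[str], str]],
                   examples: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score every candidate model on the same examples."""
    return {name: exact_match_accuracy(fn, examples) for name, fn in models.items()}


# Stand-in "models" so the sketch runs without any provider (assumptions, not real models).
def stub_model_a(prompt: str) -> str:
    answers = {"What is the capital of France?": "Paris", "What is 2 + 2?": "4"}
    return answers.get(prompt, "")


def stub_model_b(prompt: str) -> str:
    return "Paris"  # always gives the same answer


if __name__ == "__main__":
    scores = compare_models({"model_a": stub_model_a, "model_b": stub_model_b}, EVAL_SET)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.0%}")
```

Exact match is only one possible metric. For open-ended tasks such as summarization, you would likely need a softer comparison or human review.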
Once you’ve identified the right model for your use case, you have two options for using it in production: either using a closed-source provider’s API, or hosting an open-source model.
Closed-source providers like OpenAI, Cohere, and Anthropic offer access to their language models through subscriptions to their APIs. The process is simple – once you sign up with a provider, they’ll give you access to their API. You’ll then be able to send text to the API and receive a response. Users are typically charged based on the length of the input and output, usually measured in tokens.
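Because billing scales with input and output length, it can help to estimate costs before committing to a provider. The sketch below is illustrative only: the `Pricing` figures are made up, and the rule of roughly four characters per token is a crude assumption for English text. Real providers publish their own prices and tokenizers, so use those for anything beyond a back-of-the-envelope figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per 1,000 tokens; check your provider's price list."""
    input_per_1k: float
    output_per_1k: float


def estimate_tokens(text: str) -> int:
    """Crude token estimate, assuming about 4 characters per token."""
    if not text:
        return 0
    return max(1, round(len(text) / 4))


def estimate_cost(prompt: str, completion: str, pricing: Pricing) -> float:
    """Estimated cost of one request: input and output are priced separately."""
    input_cost = estimate_tokens(prompt) / 1000 * pricing.input_per_1k
    output_cost = estimate_tokens(completion) / 1000 * pricing.output_per_1k
    return input_cost + output_cost


def estimate_monthly_cost(cost_per_request: float, requests_per_day: int,
                          days: int = 30) -> float:
    """Scale a per-request estimate up to a monthly figure."""
    return cost_per_request * requests_per_day * days


if __name__ == "__main__":
    pricing = Pricing(input_per_1k=1.0, output_per_1k=2.0)  # made-up numbers
    per_request = estimate_cost("a" * 4000, "b" * 2000, pricing)
    print(f"per request: ${per_request:.2f}")
    print(f"per month at 100 requests/day: ${estimate_monthly_cost(per_request, 100):.2f}")
```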
Pros:
Cons:
Open-source models like HuggingFace BLOOM, Meta LLaMA, and Google Flan-T5 are openly available for download, though license terms vary from model to model. However, services that host the model for you and provide API-based access (e.g. HuggingFace and Replicate) are still nascent, so you’ll often end up having to host the model yourself. As you might expect, the pros and cons of closed-source models are almost exactly reversed if you choose to go with open-source models.
Pros:
Cons:
Models differ in their number of parameters, and in the tradeoffs that come with it. Smaller models are cheaper and easier to manage, but might deliver predictions of poorer quality. This is why companies often start with closed-source models for testing and iterating on ideas, then transition to open-source or in-house models once those ideas find product-market fit.
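One way to keep that transition cheap is to have application code depend on a small interface rather than on a specific provider. The following is a minimal sketch under stated assumptions: `TextModel`, `HostedApiModel`, `SelfHostedModel`, and `build_model` are hypothetical names invented for this example, and the lambdas are stubs standing in for a real API call or real local inference.

```python
from typing import Callable, Dict, Protocol


class TextModel(Protocol):
    """Minimal interface the application depends on (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class HostedApiModel:
    """Wraps a closed-source provider; `send` stands in for the provider's client call."""

    def __init__(self, send: Callable[[str], str]) -> None:
        self._send = send

    def complete(self, prompt: str) -> str:
        return self._send(prompt)


class SelfHostedModel:
    """Wraps a model you run yourself; `generate` stands in for local inference."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate

    def complete(self, prompt: str) -> str:
        return self._generate(prompt)


def summarize(model: TextModel, document: str) -> str:
    """Application code: it does not know which backend sits behind `model`."""
    return model.complete(f"Summarize in one sentence:\n{document}")


def build_model(backend: str) -> TextModel:
    """Pick a backend by name, for example from configuration. Stubs replace real calls."""
    factories: Dict[str, Callable[[], TextModel]] = {
        "hosted": lambda: HostedApiModel(send=lambda p: f"[hosted] got {len(p)} chars"),
        "self_hosted": lambda: SelfHostedModel(generate=lambda p: f"[local] got {len(p)} chars"),
    }
    if backend not in factories:
        raise ValueError(f"unknown backend: {backend!r}")
    return factories[backend]()


if __name__ == "__main__":
    for backend in ("hosted", "self_hosted"):
        print(summarize(build_model(backend), "Language models in production."))
```

With this shape, moving from a provider's API to a self-hosted model is a configuration change plus a new wrapper, and the same evaluation set can be run against both backends before switching.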
Regardless, there’s a model out there to fit every use case and need. The field is advancing rapidly, both in terms of technology and business models – so expect even more options to choose from going forward!
Here at Glean, we use an optimal combination of these approaches to ensure that our users have a great product experience without having to sweat over implementation. To learn more and see Glean in action, sign up for a demo today!