Implementing Large Language Models for Business Improvement: A Step-by-Step Guide
TL;DR: This article talks about how Large Language Models can improve your company process. Its target audience is people with technical backgrounds like software architects or CTO. The article shows the options to use LLM efficiently, you will learn how to use modern techniques like Retrieval Augmented Generation (RAG), function calling and fine-tuning with examples on a use case.
Table of content
· Identifying a Business Need
· Explore an idea by yourself
· Creating an Evaluation Dataset
· Considering Internal Industrialization
· Customizing Responses with Company Data
· Function Calling to use APIs
· Breaking Down Tasks into Multiple Prompts
· Fine-tuning to improve performance
· Combining Model
In the rapidly evolving landscape of technology, Artificial Intelligence (AI) and Machine Learning (ML) have emerged as pivotal forces driving innovation, efficiency, and competitive advantage across industries. For Chief Technology Officers, IT Directors, Tech Project Managers, and Tech Product Managers, understanding and integrating these technologies into business strategies is no longer optional; it’s imperative.
It’s not a surprise, Large language Models (LLMs) like ChatGPT could do more than chat.
We will explore step by step strategies to prevent data distortion, enhance operational efficiency, and better use your company’s resources.
Identifying a Business Need
You already know that Large Language Models (LLMs) such as ChatGPT, Gemini, Mistral, etc… have emerged as powerful tools that can automate tasks and enhance customer service. As a business decision-maker, understanding the capabilities and limitations of LLMs can help you make informed decisions about their implementation.
The first step in leveraging LLMs is to identify a task that can be automated to improve customer service or relieve employees of repetitive tasks. For instance, LLMs can be used to automate information retrieval in documents, write reports, or process customer requests.
Explore an idea by yourself
Once you have identified a business case, the next step is to manually evaluate this with ChatGPT (or Gemini) to estimate if the current reasoning capabilities of generative AI are sufficient to meet the need.
You can create a list of sample inputs and evaluate the accuracy of the responses generated by ChatGPT.
Let’s say you want to automate the dispatching of emails your company receives.
You have to get some emails and test if an online LLM is able to sort them and prepare an answer.
Here is a sample prompt:
Our company is organized into several departments, each with distinct functions and areas of responsibility. Here’s a brief overview:
Sales Department: This team is responsible for generating new business leads, negotiating deals, and maintaining relationships with existing clients.
Marketing Department: This department focuses on promoting our products and services, managing our brand reputation, and conducting market research.
Customer Service Department: This team handles inquiries, complaints, and support requests from our customers.
Human Resources Department: This department manages employee relations, recruitment, training, and benefits.
Technical Support Department: This team provides assistance with technical issues related to our products and services.
Finance Department: This department is responsible for managing the company’s financial resources, including budgeting, forecasting, and financial reporting.
We want you to categorize the following email. I would like you to analyze the content of the email to determine the most suitable department. Here are some guidelines to help you with your analysis:
Look for keywords and phrases that indicate the email is related to sales, marketing, customer service, HR, technical support, or finance.
Consider the tone and context of the email. For example, an email that expresses frustration with a product may be best suited for the customer service department.
If an email contains multiple topics, categorize it based on the most prominent or urgent issue.
Part 3: Once you have analyzed the email and determined the most suitable department, please append your recommendations. Please ensure that your recommendations are accurate and consistent, as this will help us to improve our response times and provide better service to our customers.
Part 4: Write a model of response for the department
Here is the email:
I hope this email finds you well. I am writing to request time off from work for the upcoming holiday season. Specifically, I would like to take the following days off:
Monday, December 20th
Tuesday, December 21st
Wednesday, December 22nd
Thursday, December 23rd
Friday, December 24th
I understand that this is a busy time of year for the company, and I apologize for any inconvenience my absence may cause. However, I have accrued enough vacation time to cover these days, and I believe that taking this time off will allow me to come back refreshed and ready to tackle the new year.
Please let me know if there are any issues with my request, or if you need any additional information from me. Thank you for your time and consideration.
Best regards, [Employee Name]
The model answer:
Based on the content of the email, I would recommend categorizing it under the
__Human Resources Department.__
The email contains keywords such as “time off,” “vacation time,” and “absence,”
which are related to employee relations and benefits, which fall under the
responsibilities of the HR department. The tone of the email is polite and
professional, and the context indicates that the employee is requesting
time off for personal reasons.
__Here is a model response for the HR department:__
Dear [Employee Name],
Thank you for your email and for providing ample notice regarding your time off request for the upcoming holiday season. We appreciate your consideration for the busy time of year and your accrued vacation time to cover your absence.
I am pleased to inform you that your request for time off from Monday, December 20th to Friday, December 24th has been approved. Please ensure that you have communicated your absence to your team and made any necessary arrangements to cover your responsibilities during your time off.
If you have any further questions or concerns, please do not hesitate to reach out to the Human Resources Department.
Thank you for your hard work and dedication to the company. We hope that you have a relaxing and enjoyable holiday season.
[Your Name] Human Resources Department.
⚠️ Don’t hesitate to iterate with your prompting until you get the intended form of response you want to reproduce. This could take some time to get used to it. Don’t stop at first failure, nor first success. Don’t hesitate to restart from scratch. You could even challenge the model to write a prompt for you.
⚠️ Don’t forget to test edge cases, be sure to evaluate at least for 80% of your use case.
Creating an Evaluation Dataset
It’s time to expand your test. Previously you proved that it worked with few examples, now try again with more data to measure the accuracy.
Creating a dataset is essential to evaluate a process. You can use this dataset to test the model, ensuring that it meets your business needs. The dataset should be representative of the real-world scenarios that your business will encounter. Ideally, one should use code to create a reproducible evaluation chain. For example, by calling OpenAI’s API from a list of questions and automatically comparing expected answers.
With a ChatGPT subscription if you look at Explore GPTs you can also try Data Analyst to upload an Excel file and interact with the AI on it.
Compile an Email Dataset: Start by assembling an Excel file containing 100 sample emails that your company might receive.Draft a Detailed Prompt
In this case you can structure your prompt in three segments:
Part 1: Detail the various departments within your company, outlining their specific functions and areas of responsibility.Part 2: Introduce the dataset to the model, instructing it to analyze the content of each email to determine the most suitable department.Part 3: Direct the model to append its recommendations in a new column within your Excel file, effectively categorizing each email.Execute and Evaluate: Utilize the prompt to task the model with identifying the correct recipient department for each email. Following the model’s processing, review its suggestions to assess accuracy and relevance.Screenshot of a sample dataset (AI generated by the autor with Mistral-medium)
Before considering going further you can manually rate each answer and compute the average to evaluate if the result is good enough for this use case. In our example, remember that the use case is a <human>(email) to <machine> (routing & proposed answer) to <human> (department) workflow, so an error can be tolerated : the human could modify the answer, or a department can reroute an email to another… If it happens on ten emails in a hundred it can be good enough.
Considering Internal Industrialization
You can fastrack a production ready solution by using an API provided by an external provider.
You can use OpenAI API or others for your MVP, but there are several factors that you should consider, including:
All the Data you provide to an external API or chat is recorded somewhereYou should anonymize your data even if the service provider claims that it is not using your data…Risk of industrial secret leakage: If you are outside of the US, be aware that OpenAI is subject to the Cloud Act.Speed limitations: It often takes several seconds to obtain a complete response from OpenAI, which may not be fast enough for certain use cases.Call limitations: The number of calls per second are limited, as well as maximum monthly expensesEnvironmental impact: Large generalist models have a significant environmental impact, and this should be taken into account when considering their use.Cost variation: ie OpenAI APIs are subject to cost variation, which can impact your budget.Difficulty in asserting a competitive advantage: It can be challenging to assert a competitive advantage when using OpenAI APIs, as they are available to all businesses.Stability: LLM private models like Gemini, Mistral, Claude2, GPT4 … are not always stable and you should consider monitoring the quality and stability of the answers provided. You also have to add rail guards to protect your service quality and you & your customers from hazardous behaviors coming from in and out. Problems can occur from the input or the output.
To avoid some of these pitfalls, you can turn to open-source models such as LLAMA or Mistral. These open-source alternatives offer several advantages:
Privacy and Security: Self hosted models, reduce the risk of industrial secret leakage.Customization: You can fine-tune open-source models to better suit your specific business needs.Lower Costs: Open-source models are often less expensive than proprietary solutions, especially when considering the limitations on the number of calls and monthly expenses.Environmental Impact: Open-source models are smaller and can be optimized for specific use cases, potentially reducing their environmental footprint. You could measure it with CodeCarbon.Competitive Advantage: By customizing an open-source model, you can create a unique solution that sets your business apart from competitors.
Now you have automated the routing of the email, let’s improve the quality of the answer. A way to do it is to add company documents to the capability of the model. This will allow the model to find answers in your document instead of his “memory”.
Customizing Responses with Company Data
Customizing responses from a LLM with company data will create a more accurate and tailored experience for users.
You can’t send all company data within the prompt. That’s why Retrieval Augmented Generation (RAG) is useful, it’s a technique that combines information retrieval from a database and generation capabilities of a LLM. By using RAG, you can improve the accuracy of responses. And you could tell to the user which documents have been used for the answer.
RAG technique can be simply presented by this formula:
<LLM trained with billion of data> + <Your prompt> + <Your company dataset> = Responses aligned with your context
RAG is often done with a vector database as it works in most cases, here is how to create the database:
Split your documents by shorts chaptersConvert chapters to vectors using an embedding model. The vector on the same subjects will be near in the n-dimensional spaces. Typical vector is an array of 1,024 floats values. Think of it like if each value represents a characteristic, like color, size, gender… It’s not hard coded, the model finds the value by himself in training.Store them in a vector databaseImage by the author
When you receive an email, you will use RAG like this:
Convert the email of your customer to a vectorQuery the database with this vector to retrieve the 10 nearest vectors of paragraphsTake the text of these paragraphs and add them to the promptAsk the LLM for an answerThe answer will be based on the data provided in the promptImage by the author
If you want to learn more, read Retrieval Augmented Generation (RAG)
Now your answer will be using your data, so it helps prevent what is called hallucination.
ℹ️ Model Hallucination is not an easy problem to manage. Because the “memory” of a LLM is more like a human memory (compressed representation of the world) than a computer’s exact memory. And models are trained to help you so they will try to, even when they don’t know the answer, misleading information will be presented as fact. RAG helps cope with this problem by providing relevant data to the model.
RAG is really good for unstructured data, but sometimes you have a better way to answer the question like tabular data with pricing for each product, or you may even want to compute taxes, or looking for a slot in an agenda to arrange a meeting. Let’s see how to do that with function calling.
Function Calling to use APIs
Function calling is a way to allow interaction between a LLM and your enterprise API, like:
Salesforce, SAP for your ERPService Now or other ticketing servicesAgendasInvoice, pricingCustom API to do anything in your companyThird party API
Function calling is an essential feature that allows you to use APIs without exposing them to the outside world. This feature opens up many possibilities beyond simple chat applications. For instance, you can integrate specialized internal services or tools into the LLM, making it more versatile and valuable for your business. You can take a mail from a customer requesting a price, send it to the LLM to turn it into a parameter to call your pricing API, then use the API answer to ask the LLM back to write the answer to the customer.
Given the request:
I really like your company. I would like to order you a solar panel mounting rail, what would be the price ?
You send the request to the LLM, with the definitions of the API that exist in your company:
“description”: “Get product price.”,
“description”: “Name of the product”,
“required”: [ “product_name” ]
So the LLM extract the product name from the mail and give you the JSON to make the API call:
“product_name”: “solar panel mounting rail”
It’s up to you to call the API, so it is totally secured : the LLM never knows where your API is, just what it can do.
The answer of the API could be sent back to the LLM to build a natural language answer.
Can you answer this email given that the price for a “solar panel mounting rail” is $10 without a VAT of 5% ? “Hello, I really like your company. I would like to order you a solar panel mounting rail, what would be the price ? Best regards Your customer “
The answer will be:
Hello,Thank you for your interest in our company and for considering our solar panel mounting rail. The price for the mounting rail is 10 before taxes, with a VAT of 5%, so $10.50 taxes included.Please let me know if you have any other questions or if you would like to proceed with the order.Best regards,
So you now have a system that can use your internal services to better prepare answers for your customers. That’s a game changer if you have already invested in APIs.
We just saw that we may call a LLM more than once for a single task, let see that in more detail.
Breaking Down Tasks into Multiple Prompts
It’s important to note that a single prompt is often not enough for complex tasks. Your project will likely require breaking down the task into multiple prompts that will chain together and combine several techniques.
For exemple https://360learning.com/ build a platform to help building online courses with AI from a single text document as input. Their pipelines make use of 9 prompts, used for 30 OpenAI calls, and RAG to achieve their goal. A first prompt asks for a resume of the document, a second asks for a plan for an online course from the resume, then RAG is used to retrieve each part of the document from the title, and so on.
Here is some slides of their presentation:
Video source : https://www.youtube.com/watch?v=1Eyc2GypnF4 (in French)
They are using LangChain, a framework that helps to create these types of LLM pipelines.
ℹ️ You probably heard of “AI Agents”: they are just a way to combine prompts, but without writing them in advance. An agent is a call to a LLM to get a list of tasks. Then, make a call to LLM for each task, and so on. It works best with giving the ability to the LLM to call external tools like browsing the web using functions like we saw before.
Now you have a powerful pipeline, but how to improve the model itself to have faster and better answers ? You can fine tune a model.
Fine-tuning to improve performance
Very few companies could afford to train a LLM from scratch because it requires a huge dataset and hundreds of GPUs, almost 2 millions GPU hours for Llama2–70B for example. But you can take an already pre-trained model and fine-tune it, only an afternoon of fine-tuning is needed in most cases.
The drawback is that you have to build a training dataset with hundreds of questions and answers.
It’s a new technique to combine multiple models in one. The result is a big model, called Mixture of Experts (MoE), with better capabilities than a single of the same size. The easiest way to do that is with MergeKit.
Generated with AI — Bing Copilot — “An image of a mathematician, a physicist and a mechanical engineer working on the same problem around a desk featuring a dismantled uav”
This could help you if it’s difficult to decide which model to use : with MoE, it’s the model who decides which one to use.
Customizing responses from LLMs with company data and API create a more accurate and tailored experience for users. Fine-tuning can improve the performance, and breaking down tasks into multiple prompts can help tackle complex tasks.
While all of this may seem complex and reserved for specialists, abundant documentation and numerous libraries are available to facilitate implementation. Popular libraries include HuggingFace, Langchain, HayStack, Axolotl and so on…
However, don’t forget the cost of integration. As with any project, there is a significant cost associated with moving from a functional prototype to a fully industrialized solution within an existing IT system. You will often discover that the process of your company is more complex than expected. Or that the data needs a bit of cleaning to be processed.
While large language models offer many advantages, don’t neglect the benefits of “older” machine learning techniques like random forest or DistiliBert. These techniques can still provide values, including faster processing, easier integration into existing tools, no need for GPUs, better explainability, and lower costs.
We hope that this article provides a view on how to include LLM in your software architecture.
Article written in february 2024 by Benoît Courty, data scientist, with the help of Stéphane Van-Bosterhaudt, CEO of UpScale.
More readings on the subject:
Cloud Act: https://en.wikipedia.org/wiki/CLOUD_ActRAG: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/Embeddings: https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526Function calling: https://platform.openai.com/docs/guides/function-callingFine-tuning: https://mlabonne.github.io/blog/posts/A_Beginners_Guide_to_LLM_Finetuning.htmlMixture of Experts: https://medium.aiplanet.com/create-your-own-mixture-of-experts-model-with-mergekit-and-runpod-8b3e91fb027a
Leveraging Large Language Models for Business Efficiency was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.