Interacting with Large Language Models
Enriching prompts to steer the model — explained for non-experts
Large language models like GPT show astonishing capabilities for solving complex tasks such as translation, reasoning, or question answering. To get there, however, the models have to be queried in the right way, a technique called prompting. Understanding how to interact with a language model is crucial to getting the most out of it, and writing good prompts can be a real challenge.
In this post, I want to give an overview of different prompting techniques that allow language models to solve tasks more precisely. I will first explain the basic functionality of language models before introducing two of the most-used prompting techniques, namely zero-shot and few-shot prompting. Later I will explain chain of thought prompting, a mechanism to enhance the capabilities of language models, and show how it can be combined with the use of external tools to solve more complex tasks.
Language models & prompting
The fundamentals of language models can be explained in a single sentence: A language model is a model that predicts the next word given a context of words. That is, given a sentence like “The monkey ate the…”, the language model computes the probabilities for different words following. In this example, the word “banana” might have a very high probability, “apple” might also be quite likely, but “piano” has a rather low probability. Hence the model might finish the sentence as “The monkey ate the banana”.
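To make the idea of next-word probabilities concrete, here is a minimal sketch in Python. The numbers are invented for illustration and do not come from any real model; the function names are hypothetical as well.

```python
# Toy illustration: made-up probabilities for words that could follow
# "The monkey ate the". A real model scores its entire vocabulary.
next_word_probs = {
    "banana": 0.62,
    "apple": 0.30,
    "piano": 0.001,
}


def most_likely_next_word(probs: dict[str, float]) -> str:
    """Pick the word with the highest probability."""
    return max(probs, key=probs.get)


def complete(context: str, probs: dict[str, float]) -> str:
    """Append the most likely next word to the context."""
    return f"{context} {most_likely_next_word(probs)}"


print(complete("The monkey ate the", next_word_probs))
# The monkey ate the banana
```

Always taking the single most likely word is just one possible strategy; it is used here only because it is the easiest one to follow.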
There are different kinds of models that estimate those probabilities in different ways, each having their unique advantages and disadvantages. If you want to see some of the most important technologies over time, you can take a look at this overview. For understanding this article, you just need to know that a language model always predicts the next words given a sequence of words.
In order to interact with a language model, you use a prompt. A prompt is just the text you give to a model for it to continue. So in the example above, “The monkey ate the…” is the prompt. The prompt is not limited to single sentences though. It can be much longer, including multiple sentences or paragraphs. For technical reasons, there is a limit on its length, which is often in the range of hundreds or thousands of words. Depending on your prompt, you can make the model solve different tasks, like answering questions, translating to other languages, and much more. Let’s discover some ways to create meaningful prompts that tell the model what you expect from it.
The simplest way to use a large language model may be zero-shot prompting. The main idea is to construct the prompt in such a way that completing it gives the desired answer to the question. A simple example:
The president of the United States is …
When the language model completes this sentence by predicting the next word(s), it may create the sentence “The president of the United States is Joe Biden” and has answered the question by doing that.
This approach has its limits though, as it is sometimes quite hard to formulate a question clearly and unequivocally in this structure. For example, if we take the sentence
Joe Biden is …
we may expect the model to complete the sentence to “Joe Biden is the president of the United States”. However, there are many other reasonable continuations of this sentence, like “Joe Biden is a politician of the Democratic Party”. That is a true statement, and given our prompt there is no way to tell which of the two answers fits our expectations better. So, how can we formulate the prompt more clearly?
In the prompt, we can give a few examples of what we expect; a technique that is called few-shot prompting. Think of the following example:
Justin Trudeau: Canada
Rishi Sunak: Great Britain
Olaf Scholz: Germany
Joe Biden:
Obviously, the pattern encoded in this prompt is the link between politicians and the country whose government they head. Given this prompt, we would expect the model to continue with “Joe Biden: United States of America”. In another prompt that uses the same format, we may focus on a different aspect though:
Justin Trudeau: 51
Rishi Sunak: 43
Olaf Scholz: 64
Joe Biden:
Here we talk about the politicians’ ages, and hence expect a continuation like “Joe Biden: 80”.
As mentioned, this is called few-shot prompting, because you provide a few examples (shots) that tell the model what to do. This helps to describe the desired task in more detail and therefore leads to more precise answers from the language model.
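The two prompts above share the same structure, so they can be assembled by one small helper. The following Python sketch is only an illustration; the function name `build_few_shot_prompt` is made up for this post and is not part of any library.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Write each example as 'name: value' and leave the query open."""
    lines = [f"{name}: {value}" for name, value in examples]
    lines.append(f"{query}:")  # the model is expected to continue here
    return "\n".join(lines)


# The shots from the two prompts above.
country_examples = [
    ("Justin Trudeau", "Canada"),
    ("Rishi Sunak", "Great Britain"),
    ("Olaf Scholz", "Germany"),
]
age_examples = [
    ("Justin Trudeau", "51"),
    ("Rishi Sunak", "43"),
    ("Olaf Scholz", "64"),
]

print(build_few_shot_prompt(country_examples, "Joe Biden"))
print()
print(build_few_shot_prompt(age_examples, "Joe Biden"))
```

The query line is identical in both prompts; only the shots change, and they alone tell the model which task is meant.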
Chain of thought
So far, our prompts included only the task we wanted the model to answer. However, we can also use the prompt to give the model additional guidance, highlight important aspects, or restrict its answers. For example, to avoid wrong answers, we could start our prompt with
Answer the following question. If you don’t have the required information, answer with ‘I don’t know’.
followed by the few-shot examples. This nudges the model to admit that it doesn’t know the answer instead of making up a wrong one.
Another useful extension of the prompt is known as chain of thought prompting. The main idea is to encourage the model to explicitly verbalize the intermediate steps it has to perform in order to complete its task. We do that by simply providing few-shot examples that include those steps or reasoning traces, which are called thoughts. As you might guess, this idea is inspired by the way humans think when they split a task into subtasks and reason over them. Say you have a task like the following:
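As a rough sketch of how such an instruction and the few-shot examples could be put together, consider the following Python snippet. The helper `build_prompt`, the Q/A layout, and the two example questions are assumptions made for illustration; only the instruction text is taken from above.

```python
INSTRUCTION = (
    "Answer the following question. If you don't have the required "
    "information, answer with 'I don't know'."
)


def build_prompt(instruction: str, examples: list[tuple[str, str]], question: str) -> str:
    """Put the instruction first, then the few-shot examples, then the open question."""
    parts = [instruction, ""]
    for example_question, example_answer in examples:
        parts.append(f"Q: {example_question}")
        parts.append(f"A: {example_answer}")
    parts.append(f"Q: {question}")
    parts.append("A:")  # the model continues from here
    return "\n".join(parts)


# Hypothetical shots, one of which demonstrates the "I don't know" case.
examples = [
    ("What is the capital of Germany?", "Berlin"),
    ("What will the weather be like in Berlin one year from now?", "I don't know"),
]

prompt = build_prompt(INSTRUCTION, examples, "Who is the president of the United States?")
print(prompt)
```

Including one shot that is answered with "I don't know" shows the model what following the instruction looks like in practice.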
Q: Sally has bought 18 apples to make apple pie. She baked 2 pies and has 4 apples left. How many apples are in each pie?
What would you do to arrive at the answer? I bet you first figured out that she used 14 apples (the 18 she bought minus the 4 she has left) and then divided the 14 apples by two pies to arrive at 7 apples per pie. This might sound trivial to you, but have you ever taught a child to solve such tasks? If you just provide the task and the answer (7), the child will not understand how you arrived at that answer. However, if you verbalize the steps you took, the child can comprehend them and transfer them to other tasks, provided it can perform the required arithmetic operations. Likewise, when prompting a language model, you can include the thoughts in the few-shot examples. For the task above, an example could look like this:
Q: Sally has bought 18 apples to make apple pie. She baked 2 pies and has 4 apples left. How many apples are in each pie?
Thought: If she has 4 apples left, she has used 18–4=14 apples. If she divided 14 apples between two pies, each pie has 14/2=7 apples in it.
A: Each pie contains 7 apples.
If all the examples look like this, the model will not generate answers directly, but will generate the thoughts first, which has two big advantages:
First, it leads to better answers. As the model is forced to take one step after another, it can devote more resources to each subtask instead of having to solve the whole task at once. If you had to do a complex calculation, you would also note down intermediate results, wouldn’t you?
Second, it helps you understand the model’s predictions and the errors it makes. If you just get a wrong answer to your question, it is hard to find out why the model misbehaved. The thoughts, however, can give you meaningful insights into the model’s reasoning, which allows you to understand the cause of failure in more detail. Say the model produces the following thought:
If she has 4 apples left, she has used 18–4=14 apples. If she divided 14 apples between two pies, each pie has 14/2=9 apples in it.
In this example, you can see that the model took the correct steps but miscalculated in the last one. In other cases, you might see that the model didn’t understand the task, which might encourage you to reformulate it.
With the chain of thought, we have given the model the ability to split complex tasks into easier subtasks and solve those in successive steps. However, the model still has to solve those subtasks by itself, using only the skills it already has. If it cannot solve one of the subtasks, the whole chain of thought fails, even if the chain itself is correct. This drastically limits the tasks it can handle. You could compare that to tasks you solve as a human with only the knowledge you have in memory. You are able to solve the example above with basic math skills you know by heart, but most problems in the real world are not that easy. Say you want to answer the following question:
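Because the thought spells out its calculations, slips like this one can even be spotted automatically. The Python sketch below assumes that calculations appear in the simple form "a op b = c" with whole numbers; the helper `check_arithmetic` is a hypothetical illustration, not part of any prompting library.

```python
import re

# Matches simple claims such as "18-4=14" or "14/2=9" (whole numbers only).
EQUATION = re.compile(r"(\d+)\s*([+\-–*/])\s*(\d+)\s*=\s*(\d+)")

OPERATIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "–": lambda a, b: a - b,  # en dash, as used in the typeset examples
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def check_arithmetic(thought: str) -> list[str]:
    """Return every 'a op b = c' claim in the thought whose result is wrong."""
    wrong = []
    for match in EQUATION.finditer(thought):
        left, op, right, claimed = match.groups()
        if op == "/" and int(right) == 0:
            continue  # skip divisions by zero instead of raising an error
        actual = OPERATIONS[op](int(left), int(right))
        if actual != int(claimed):
            wrong.append(match.group(0))
    return wrong


thought = (
    "If she has 4 apples left, she has used 18-4=14 apples. "
    "If she divided 14 apples between two pies, each pie has 14/2=9 apples in it."
)
print(check_arithmetic(thought))
# ['14/2=9']
```

A check like this only catches miscalculations. It cannot tell whether the model chose the right steps in the first place, which still requires reading the thought.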
Who is older? Andrew Scott or Benedict Cumberbatch?
It is quite easy to determine the steps you have to take in order to answer the question. First, you have to look up Andrew Scott’s and Benedict Cumberbatch’s dates of birth, then you check which one is earlier. However, you wouldn’t be able to do that just by thinking about it, because you don’t know their birthdays by heart (unless you are the biggest Sherlock fan on earth). For the actual execution of the steps, you need help, e.g. by consulting Wikipedia.
Similarly, a language model becomes much more capable if you equip it with additional tools it can use. Frameworks like LangChain allow you to do exactly that. Say you provided the Wikipedia API; then you could construct some few-shot examples that use this API:
Q: Who is older? Andrew Scott or Benedict Cumberbatch?
Thought: I need to find out Andrew Scott’s date of birth.
Action: query_wikipedia(search_term='Andrew Scott')
Observation: “Andrew Scott (born 21 October 1976) is an Irish actor…”
Thought: I know Andrew Scott’s date of birth. Now I need to find out Benedict Cumberbatch’s date of birth.
Action: query_wikipedia(search_term='Benedict Cumberbatch')
Observation: “Benedict Timothy Carlton Cumberbatch CBE (born 19 July 1976) is an English actor…”
Thought: I know that Andrew Scott was born 21 October 1976 and Benedict Cumberbatch was born 19 July 1976. Cumberbatch’s date of birth is earlier. Hence he is older.
A: Benedict Cumberbatch is older than Andrew Scott.
As you see, a thought is followed by an action until the model has everything it needs for the final answer. Each action is executed and leads to an observation that influences the next thought. From examples like this, the model learns to perform actions whose results it needs on the way to solving the initial task. The actions you equip the model with can range from retrieving data with API calls to making changes to a database or opening your remote-controlled window. The model could even ask another AI to complete some subtasks for it. This greatly extends the range of tasks the model can solve, because it opens new ways to solve subtasks that go beyond the scope of language alone.
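The interplay of thought, action, and observation boils down to a simple loop. The Python sketch below is a toy under clearly stated assumptions: `scripted_model` is a stand-in that replays fixed replies instead of calling a real language model, `query_wikipedia` is a stub that returns the two snippets quoted in the example rather than querying the real Wikipedia API, and `run_agent` is a hypothetical helper, not the way LangChain or any other framework implements this.

```python
import re


def query_wikipedia(search_term: str) -> str:
    """Stub tool: returns canned snippets instead of calling the real API."""
    pages = {
        "Andrew Scott": "Andrew Scott (born 21 October 1976) is an Irish actor…",
        "Benedict Cumberbatch": "Benedict Timothy Carlton Cumberbatch CBE "
        "(born 19 July 1976) is an English actor…",
    }
    return pages.get(search_term, "No article found.")


TOOLS = {"query_wikipedia": query_wikipedia}
ACTION = re.compile(r"Action: (\w+)\(search_term='([^']*)'\)")


def scripted_model(prompt: str) -> str:
    """Stand-in for a language model: picks a reply by counting observations so far."""
    replies = [
        "Thought: I need to find out Andrew Scott's date of birth.\n"
        "Action: query_wikipedia(search_term='Andrew Scott')",
        "Thought: Now I need to find out Benedict Cumberbatch's date of birth.\n"
        "Action: query_wikipedia(search_term='Benedict Cumberbatch')",
        "Thought: Cumberbatch's date of birth is earlier. Hence he is older.\n"
        "A: Benedict Cumberbatch is older than Andrew Scott.",
    ]
    seen = prompt.count("Observation:")
    return replies[min(seen, len(replies) - 1)]


def run_agent(question: str, model, max_steps: int = 5) -> str:
    """Alternate between model replies and tool calls until no action is requested."""
    prompt = f"Q: {question}\n"
    for _ in range(max_steps):
        reply = model(prompt)
        prompt += reply + "\n"
        action = ACTION.search(reply)
        if action is None:
            return prompt  # no action requested, so the reply holds the answer
        tool_name, argument = action.groups()
        tool = TOOLS.get(tool_name)
        observation = tool(argument) if tool else "Unknown tool."
        prompt += f'Observation: "{observation}"\n'
    return prompt


transcript = run_agent("Who is older? Andrew Scott or Benedict Cumberbatch?", scripted_model)
print(transcript)
```

The important design point is that the model only writes text. It is the surrounding program that recognizes the action line, runs the tool, and appends the observation to the prompt before asking the model to continue.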
In this post, I explained how you can use prompts to formulate tasks for a language model and even empower it with more skills and capabilities, without having to retrain the model at all. The main ideas that lead to more sophisticated and powerful prompts can be summarized as follows:
Simple zero-shot prompts only formulate a task for a language model.
Few-shot examples can specify the task in more detail and provide the model with examples of how to solve it.
Encouraging the model to produce a chain of thought allows it to solve complex tasks and introduces an intuitive way of letting the model explain itself.
Adding actions equips the model with tools that can solve subtasks the model can’t solve on its own, and hence allows for more complex tasks.
Equipping language models with tools doesn’t have to be the end of the line, though. A natural next step could be to increase the autonomy of the model so that it pursues a goal by selecting the relevant tasks itself, as BabyAGI does. What other next steps can you think of?
The following papers introduce the main concepts highlighted in this article.
Few-shot prompting is one of the core ideas behind the GPT-3 model:
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877–1901.
Chain of thought reasoning was introduced in this paper:
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
ReAct is a prominent approach combining chain of thought reasoning with actions:
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.