MMS • Anthony Alford
OpenAI recently published a guide to Prompt Engineering. The guide lists six strategies for eliciting better responses from their GPT models, with a particular focus on examples for their latest version, GPT-4.
The guide’s six high-level strategies are: write clear instructions, provide reference text, split complex tasks into simpler subtasks, give the model time to “think”, use external tools, and test changes systematically. Each of the strategies is broken down into a set of specific, actionable tactics with example prompts. Many of the tactics are based on results of LLM research, such as chain-of-thought prompting or recursive summarization.
OpenAI’s research paper on GPT-3, published in 2020, showed how the model could perform a variety of natural language processing (NLP) tasks using few shot learning; essentially, by prompting the model with a description or examples of the task to be performed. In 2022, OpenAI published a cookbook article which contained several “techniques for improving reliability” of GPT-3’s responses. Some of these, such as giving clear instructions and breaking up complex tasks, are still included in the new guide. The older cookbook guide also contains a bibliography of research papers supporting their techniques.
Several of the guide’s tactics make use of the Chat API’s system message. According to OpenAI’s documentation, this parameter “helps set the behavior of the assistant.” One tactic suggests using it to give the model a persona for shaping its responses. Another suggests using it to pass the model a summary of a long conversation, or to give a set of instructions that are to be repeated for multiple user inputs.
The strategy of use external tools gives tips on interfacing the GPT model with other systems, with pointers to articles in OpenAI’s cookbook. One of the tactics suggests that instead of asking the model to perform math calculations itself, it should instead generate Python code to do the calculation; the code would then be extracted from the model response and executed. The guide does, however, contain a disclaimer that the code the model produces is not guaranteed to be safe, and should only be executed in a sandbox.
Another strategy in the guide, test changes systematically, deals with the problem of deciding if a different prompt actually results in better or worse output. This strategy suggests using the OpenAI Evals framework, which InfoQ covered along with the release of GPT-4. The strategy also suggests using the model to check its own work “with reference to gold-standard answers,” via the system message.
In a Hacker News discussion about the guide, one user said:
I’ve been hesitant lately to dedicate a lot of time to learning how to perfect prompts. It appears every new version, not to mention different LLMs, responds differently. With the rapid advancement we’re seeing, in two years or five, we might not even need such complex prompting as systems get smarter.
Several other LLM providers have also released prompt engineering tips. Microsoft Azure, which provides access to GPT models as a service, has a list of techniques similar to OpenAI’s; their guide also provides tips on setting model parameters such as temperature and top_p, which control the randomness of the model’s output generation. Google’s Gemini API documentation contains several prompt design strategies as well as suggestions for the top_p and temperature values.