Can LLMs Outperform Humans At Prompt Engineering?
A recent study found that LLMs can be better at prompt engineering than humans are…
I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
In the image above, zero-shot test accuracy is shown across 24 instruction tasks; on these tasks, Automatic Prompt Engineering (APE) equalled or surpassed human-engineered prompts!
The APE process has three inputs:
- A basic prompt template.
- A dataset of input sentences.
- A matching dataset of outputs expected.
The APE process takes the inputs and the expected outputs and creates prompts to perform the task. Hence the LLM writes the prompts which will produce the defined output dataset.
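To make this concrete, here is a minimal sketch of the idea in Python. It is an illustration, not the study’s implementation: the `llm()` helper, the meta-prompt wording and the scoring rule are my own assumptions, and the OpenAI client is used only as an example API.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured; any LLM API could be swapped in


def llm(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Single completion call, used both to propose and to test instructions."""
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content


def propose_instructions(demos, n_candidates: int = 5) -> list[str]:
    """Ask the LLM to infer the instruction behind the input/output pairs."""
    demo_text = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    meta_prompt = (
        "I gave a friend an instruction. Based on the instruction they produced "
        f"the following input/output pairs:\n{demo_text}\nThe instruction was:"
    )
    return [llm(meta_prompt) for _ in range(n_candidates)]


def score_instruction(instruction: str, eval_set) -> float:
    """Fraction of evaluation pairs for which the instruction reproduces the output."""
    hits = sum(
        llm(f"{instruction}\nInput: {x}\nOutput:").strip() == y for x, y in eval_set
    )
    return hits / len(eval_set)


def ape(demos, eval_set) -> str:
    """Generate candidate prompts and keep the best-scoring one."""
    candidates = propose_instructions(demos)
    return max(candidates, key=lambda c: score_instruction(c, eval_set))
```

Called with something like `ape([("hot", "cold"), ("fast", "slow")], eval_set)`, this might return an instruction such as “Write the antonym of the word” — without a human ever writing the prompt.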
The question is, does APE pave the way for, or at least serve as an indicator of, future LLM UIs and the demise of humans writing prompts?
Imagine a future where prompts are not engineered, but the LLM is supplied with a set of inputs and outputs, each consisting of 10 examples. The LLM then generates the instructions (the prompt engineering) by comparing the input and output examples and deducing what needs to be achieved.
This does not seem far-fetched: Sam Altman has stated that prompt engineering as we currently know it will not be around in a few years’ time, and OpenAI’s new approach to fine-tuning already defines the input and output datasets with a minimum of 10 lines of input and output examples.
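As a rough illustration of what such a dataset looks like (the exact schema should be checked against OpenAI’s current documentation), a fine-tuning file is a JSONL list of chat-formatted input/output pairs. The sketch below builds one in Python; the task, file name and examples are made up.

```python
import json

# Hedged illustration: build a chat-formatted JSONL fine-tuning file from
# plain input/output pairs. The antonym task and examples are hypothetical.
pairs = [
    ("hot", "cold"),
    ("on", "off"),
    # ... a minimum of 10 (input, output) pairs in total
]

with open("train.jsonl", "w") as f:
    for user_input, expected_output in pairs:
        record = {
            "messages": [
                {"role": "system", "content": "Write the antonym of the word."},
                {"role": "user", "content": user_input},
                {"role": "assistant", "content": expected_output},
            ]
        }
        f.write(json.dumps(record) + "\n")
```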
At the end of the day, human-authored prompts are just explicit expressions of intent; implicit intent can also be inferred from other parts of the context.
Another advantage of the LLM generating the prompts is that human users do not need to experiment with a wide range of prompts to elicit the desired behaviours, whilst not being fully aware of how compatible their instructions are with the referenced model.
The study defined “prompt engineering” as the process of optimising the wording of a prompt to elicit the desired, best possible response.
It was noted that this does not include prompt chaining or Autonomous Agents making use of external tools.
In the image below, the APE workflow is shown on the left; on the right is the performance of each model making use of APE, and how close each comes to human prompt engineering.
There is also a case to be made for using APE with a human-in-the-loop approach, where generated prompts are inspected and improved where necessary.
This approach can also be used to optimise prompts for a specific LLM: the LLM is given the desired inputs and outputs, and a model-specific prompt is generated.
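A rough sketch of what that could look like, assuming a `generate_prompt()` helper along the lines of the APE sketch earlier; the review step simply lets a person accept or edit the generated prompt before it is used.

```python
def generate_prompt(demos, target_model: str) -> str:
    # Placeholder: in practice, call the target model with the demonstration
    # pairs and ask it to propose an instruction, as in the earlier APE sketch.
    demo_text = "; ".join(f"{x} -> {y}" for x, y in demos)
    return f"Write the opposite of the given word. Examples: {demo_text}"


def review(prompt: str) -> str:
    """Human-in-the-loop step: inspect the generated prompt, optionally improve it."""
    print(f"Generated prompt for review:\n{prompt}")
    edited = input("Press Enter to accept, or type an improved prompt: ")
    return edited or prompt


demos = [("hot", "cold"), ("fast", "slow")]
candidate = review(generate_prompt(demos, target_model="gpt-3.5-turbo"))
```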
⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️