HuggingFace Transformers Agent

HuggingFace Transformers Agent offers a natural language API built on transformers and a curated set of tools. An agent interprets the user's natural language input and uses tools to fulfil the request.

Cobus Greyling
6 min read · May 12, 2023

--

More About Agents

Agents are one of the most exciting developments in the field of LLMs & Conversational AI.

  • Agents receive a request from a user in natural language. This request can be ambiguous in nature and complex; demanding chain-of-thought prompting and reasoning from the Agent.
  • The request is decomposed by the agent into tasks, and the appropriate tool is selected for each task. This is where the agent is really autonomous in selecting from the tools at its disposal.
  • The more tools available to the agent, the more powerful and autonomous the agent is.
  • One or more humans can also act as a tool for a human-in-the-loop approach. The agent detects that specific knowledge is required and identifies a human who can be prompted to fulfil a step in the answering process.
  • Below is a graphic showing the execution steps of an agent, and the tools available to the HuggingFace Agent by default. Custom tools can be defined and added.
  • Considering the HuggingFace agent output below, I do get the sense that the HuggingFace Transformers Agent makes use of the Program-Aided Language Models (PAL) approach to some extent. I would love to learn more about my assumption here. 🙂
  • The HuggingFace Transformers Agent is a multi-modal agent, which adds to the notion of Large Language Models being seen as Foundation Models.
  • Natural language is the primary input method, but output can take the form of audio, images, etc. This was also demonstrated at the GPT-4 launch.
  • Prompt Chaining can have one or more Agents as one of its chains.
  • However, an Agent creates a chain on the fly, autonomously, to fulfil the user request.
  • Agents work well for use-cases where a predefined prompt chain does not exist and could not have been envisaged.
  • I could not get the HuggingFace agent to answer difficult questions like: "What is the square root of the year of birth of the man who is generally regarded as the father of the iPhone?" This is something the LangChain Agent excels at.
  • The HuggingFace agent does well at direct requests and subsequently accessing the most appropriate tool for the request.
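The execution flow described in the bullets above — interpret the request, decompose it into tasks, select a tool per task, execute — can be sketched in a few lines of Python. The tool names and the keyword-based selection logic here are hypothetical stand-ins for illustration, not the actual HuggingFace implementation.

```python
# Minimal sketch of an agent loop: decompose a request into tasks,
# select a tool per task, run it. Tools and matching are hypothetical.
def image_generator(task):
    return f"<image for: {task}>"

def text_summarizer(task):
    return f"<summary of: {task}>"

TOOLS = {
    "image": image_generator,      # e.g. a text-to-image tool
    "summarize": text_summarizer,  # e.g. a text summarisation tool
}

def run_agent(request):
    results = []
    for task in request.split(" then "):  # naive task decomposition
        # Pick the first tool whose keyword appears in the task,
        # falling back to a default tool.
        tool = next((fn for kw, fn in TOOLS.items() if kw in task),
                    text_summarizer)
        results.append(tool(task))
    return results

print(run_agent("draw an image of a capybara then summarize this article"))
```

A real agent replaces the keyword matching with an LLM that reads each tool's description and generates the code to call it.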

Agent Considerations

  • Cost; adding multiple tools to an agent which are polled in order to achieve a desired answer can become costly.
  • Latency and round-trip wait time; the time taken for the Agent to respond can be long and latency can be a problem. Here I like how the LangChain agent gives constant feedback on where it is in the process. As seen below, the LangChain agent provides reasons for each step.
  • Hosting; it will be a challenge to ensure enterprise data governance, protection of private information, PII requirements, etc. are all met, while also ensuring hosting is optimal from an operations perspective.

HuggingFace Transformers Agent

HuggingFace Agents are designed to be extensible, with useful tools curated by HuggingFace, as seen below.

Agents have the unique ability to be augmented with additional tools from other systems or developed by the community.
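As a sketch of what adding a community-developed tool could look like: the `Tool` base class and registry below are illustrative stdlib stand-ins, not the transformers API itself — consult the HuggingFace documentation for the real custom-tool interface.

```python
# Hypothetical sketch of a custom tool plus registration.
# Tool, UpperCaseTool and registry are stand-ins, not transformers code.
class Tool:
    name = "base"
    description = "Override in subclasses."

    def __call__(self, *args, **kwargs):
        raise NotImplementedError

class UpperCaseTool(Tool):
    """A trivial custom tool the agent could select by description."""
    name = "uppercase"
    description = "Converts input text to upper case."

    def __call__(self, text):
        return text.upper()

registry = {}

def register(tool):
    # The agent would read tool.description when deciding what to call.
    registry[tool.name] = tool

register(UpperCaseTool())
print(registry["uppercase"]("hello agent"))  # HELLO AGENT
```

The key design point carries over to the real system: each tool pairs a callable with a natural-language description, and it is the description that lets the agent select the tool autonomously.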

It is essential to clearly articulate the task you would like to carry out.

For more information on how to craft effective prompts, please refer to this article.

If you would like the agent to remember data and/or objects between executions, you can specify variables that it should use.

For example, you could provide the model with a picture of rivers and lakes and ask it to add an island.

picture = agent.run("Generate a picture of rivers and lakes.")
updated_picture = agent.run("Transform the image in `picture` to add an island to it.", picture=picture)
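The pattern above — passing a prior result back in as a named variable — can be illustrated with a small mock. `MockAgent` is hypothetical; a real agent would generate and execute code that actually uses these variables.

```python
# Hypothetical MockAgent showing how run() accepts keyword variables
# that the prompt can then reference by name (e.g. `picture`).
class MockAgent:
    def run(self, prompt, **variables):
        # A real agent would generate code operating on these variables;
        # here we only record what was made available to it.
        return {"prompt": prompt, "available": sorted(variables)}

agent = MockAgent()
picture = agent.run("Generate a picture of rivers and lakes.")
updated = agent.run("Add an island to `picture`.", picture=picture)
print(updated["available"])  # ['picture']
```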

Lastly, Chat Mode

When using the ChatGPT API, users have to manage the context themselves.

However, the HuggingFace Agent can be used in chat mode. For instance:

agent.chat("Show me an image of a capybara")

Chat mode keeps contextual memory across dialogue turns and is better suited to single instructions.

Run mode does not keep memory or context across runs, but yields better performance when running multiple operations at once.

Makers should assess their use-case; in some instances it will make sense to manage context at the application level, together with user sessions, while leveraging Run mode. This will yield the best performance all round.
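That application-level context management can be sketched as follows; the session store and prompt-building helper are my own assumptions, not part of the HuggingFace API.

```python
# Sketch: keep context per user session at the application level while
# calling the stateless run mode. SESSIONS and build_prompt are
# hypothetical helpers, not HuggingFace APIs.
SESSIONS = {}

def build_prompt(history, new_request):
    # Prepend prior turns so a stateless run() still sees the context.
    context = " ".join(history)
    return f"{context} {new_request}".strip()

def handle_request(session_id, request, run_fn):
    history = SESSIONS.setdefault(session_id, [])
    prompt = build_prompt(history, request)
    history.append(request)
    return run_fn(prompt)

# Stand-in for agent.run, which would normally call the model.
echo = lambda prompt: prompt
handle_request("u1", "Show a capybara.", echo)
print(handle_request("u1", "Now add a hat.", echo))
# Show a capybara. Now add a hat.
```

The trade-off is that the application now owns session expiry and prompt length, but every model call goes through the faster Run mode.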

I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.

https://www.linkedin.com/in/cobusgreyling

Written by Cobus Greyling

I’m passionate about exploring the intersection of AI & language. www.cobusgreyling.com