Agentic Discovery

Web-Navigating AI Agents: Redefining Online Interactions and Shaping the Future of Autonomous Exploration.

6 min read · Aug 30, 2024


Introduction

What are AI agents or Agentic Applications? Well, this is the best definition I could come up with:

๐˜ˆ๐˜ฏ ๐˜ˆ๐˜ ๐˜ˆ๐˜จ๐˜ฆ๐˜ฏ๐˜ต ๐˜ช๐˜ด ๐˜ข ๐˜ด๐˜ฐ๐˜ง๐˜ต๐˜ธ๐˜ข๐˜ณ๐˜ฆ ๐˜ฑ๐˜ณ๐˜ฐ๐˜จ๐˜ณ๐˜ข๐˜ฎ ๐˜ฅ๐˜ฆ๐˜ด๐˜ช๐˜จ๐˜ฏ๐˜ฆ๐˜ฅ ๐˜ต๐˜ฐ ๐˜ฑ๐˜ฆ๐˜ณ๐˜ง๐˜ฐ๐˜ณ๐˜ฎ ๐˜ต๐˜ข๐˜ด๐˜ฌ๐˜ด ๐˜ฐ๐˜ณ ๐˜ฎ๐˜ข๐˜ฌ๐˜ฆ ๐˜ฅ๐˜ฆ๐˜ค๐˜ช๐˜ด๐˜ช๐˜ฐ๐˜ฏ๐˜ด ๐˜ข๐˜ถ๐˜ต๐˜ฐ๐˜ฏ๐˜ฐ๐˜ฎ๐˜ฐ๐˜ถ๐˜ด๐˜ญ๐˜บ ๐˜ฃ๐˜ข๐˜ด๐˜ฆ๐˜ฅ ๐˜ฐ๐˜ฏ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ต๐˜ฐ๐˜ฐ๐˜ญ๐˜ด ๐˜ต๐˜ฉ๐˜ข๐˜ต ๐˜ข๐˜ณ๐˜ฆ ๐˜ข๐˜ท๐˜ข๐˜ช๐˜ญ๐˜ข๐˜ฃ๐˜ญ๐˜ฆ.

๐˜ˆ๐˜ด ๐˜ด๐˜ฉ๐˜ฐ๐˜ธ๐˜ฏ ๐˜ช๐˜ฏ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ช๐˜ฎ๐˜ข๐˜จ๐˜ฆ ๐˜ฃ๐˜ฆ๐˜ญ๐˜ฐ๐˜ธ, ๐˜ข๐˜จ๐˜ฆ๐˜ฏ๐˜ต๐˜ด ๐˜ณ๐˜ฆ๐˜ญ๐˜บ ๐˜ฐ๐˜ฏ ๐˜ฐ๐˜ฏ๐˜ฆ ๐˜ฐ๐˜ณ ๐˜ฎ๐˜ฐ๐˜ณ๐˜ฆ ๐˜“๐˜ข๐˜ณ๐˜จ๐˜ฆ ๐˜“๐˜ข๐˜ฏ๐˜จ๐˜ถ๐˜ข๐˜จ๐˜ฆ ๐˜”๐˜ฐ๐˜ฅ๐˜ฆ๐˜ญ๐˜ด ๐˜ฐ๐˜ณ ๐˜๐˜ฐ๐˜ถ๐˜ฏ๐˜ฅ๐˜ข๐˜ต๐˜ช๐˜ฐ๐˜ฏ ๐˜”๐˜ฐ๐˜ฅ๐˜ฆ๐˜ญ๐˜ด ๐˜ต๐˜ฐ ๐˜ฃ๐˜ณ๐˜ฆ๐˜ข๐˜ฌ ๐˜ฅ๐˜ฐ๐˜ธ๐˜ฏ ๐˜ค๐˜ฐ๐˜ฎ๐˜ฑ๐˜ญ๐˜ฆ๐˜น ๐˜ต๐˜ข๐˜ด๐˜ฌ๐˜ด ๐˜ช๐˜ฏ๐˜ต๐˜ฐ ๐˜ฎ๐˜ข๐˜ฏ๐˜ข๐˜จ๐˜ฆ๐˜ข๐˜ฃ๐˜ญ๐˜ฆ ๐˜ด๐˜ถ๐˜ฃ-๐˜ต๐˜ข๐˜ด๐˜ฌ๐˜ด.

๐˜›๐˜ฉ๐˜ฆ๐˜ด๐˜ฆ ๐˜ด๐˜ถ๐˜ฃ-๐˜ต๐˜ข๐˜ด๐˜ฌ๐˜ด ๐˜ข๐˜ณ๐˜ฆ ๐˜ฐ๐˜ณ๐˜จ๐˜ข๐˜ฏ๐˜ช๐˜ด๐˜ฆ๐˜ฅ ๐˜ช๐˜ฏ๐˜ต๐˜ฐ ๐˜ข ๐˜ด๐˜ฆ๐˜ฒ๐˜ถ๐˜ฆ๐˜ฏ๐˜ค๐˜ฆ ๐˜ฐ๐˜ง ๐˜ข๐˜ค๐˜ต๐˜ช๐˜ฐ๐˜ฏ๐˜ด ๐˜ต๐˜ฉ๐˜ข๐˜ต ๐˜ต๐˜ฉ๐˜ฆ ๐˜ข๐˜จ๐˜ฆ๐˜ฏ๐˜ต ๐˜ค๐˜ข๐˜ฏ ๐˜ฆ๐˜น๐˜ฆ๐˜ค๐˜ถ๐˜ต๐˜ฆ.

๐˜›๐˜ฉ๐˜ฆ ๐˜ข๐˜จ๐˜ฆ๐˜ฏ๐˜ต ๐˜ข๐˜ญ๐˜ด๐˜ฐ ๐˜ฉ๐˜ข๐˜ด ๐˜ข๐˜ค๐˜ค๐˜ฆ๐˜ด๐˜ด ๐˜ต๐˜ฐ ๐˜ข ๐˜ด๐˜ฆ๐˜ต ๐˜ฐ๐˜ง ๐˜ฅ๐˜ฆ๐˜ง๐˜ช๐˜ฏ๐˜ฆ๐˜ฅ ๐˜ต๐˜ฐ๐˜ฐ๐˜ญ๐˜ด, ๐˜ฆ๐˜ข๐˜ค๐˜ฉ ๐˜ธ๐˜ช๐˜ต๐˜ฉ ๐˜ข ๐˜ฅ๐˜ฆ๐˜ด๐˜ค๐˜ณ๐˜ช๐˜ฑ๐˜ต๐˜ช๐˜ฐ๐˜ฏ ๐˜ต๐˜ฉ๐˜ข๐˜ต ๐˜ฉ๐˜ฆ๐˜ญ๐˜ฑ๐˜ด ๐˜ช๐˜ต ๐˜ฅ๐˜ฆ๐˜ต๐˜ฆ๐˜ณ๐˜ฎ๐˜ช๐˜ฏ๐˜ฆ ๐˜ธ๐˜ฉ๐˜ฆ๐˜ฏ ๐˜ข๐˜ฏ๐˜ฅ ๐˜ฉ๐˜ฐ๐˜ธ ๐˜ต๐˜ฐ ๐˜ถ๐˜ด๐˜ฆ ๐˜ต๐˜ฉ๐˜ฆ๐˜ด๐˜ฆ ๐˜ต๐˜ฐ๐˜ฐ๐˜ญ๐˜ด ๐˜ช๐˜ฏ ๐˜ด๐˜ฆ๐˜ฒ๐˜ถ๐˜ฆ๐˜ฏ๐˜ค๐˜ฆ ๐˜ต๐˜ฐ ๐˜ข๐˜ฅ๐˜ฅ๐˜ณ๐˜ฆ๐˜ด๐˜ด ๐˜ค๐˜ฉ๐˜ข๐˜ญ๐˜ญ๐˜ฆ๐˜ฏ๐˜จ๐˜ฆ๐˜ด ๐˜ข๐˜ฏ๐˜ฅ ๐˜ณ๐˜ฆ๐˜ข๐˜ค๐˜ฉ ๐˜ข ๐˜ง๐˜ช๐˜ฏ๐˜ข๐˜ญ ๐˜ค๐˜ฐ๐˜ฏ๐˜ค๐˜ญ๐˜ถ๐˜ด๐˜ช๐˜ฐ๐˜ฏ.

Understanding Agents

One thing I find frustrating is the misguided content and commentary surrounding AI Agents and agentic applications: what they are and what they aren't.

While it's easy to theorise and speculate about the future, such discussions are often ungrounded. The best approach is to base our understanding on recent research and to be familiar with the current technologies and frameworks available.

To truly grasp what AI Agents are, it's essential to build one yourself. The simplest way to start is by copying the code provided below and running it in a Colab notebook. As you execute each segment, your understanding will deepen. Then, try modifying details in the code, such as the model used from OpenAI, and run it again to see the effects.

Below is the complete Python code for the AI agent. The only adjustments you'll need to make are adding your OpenAI API key and LangSmith project variables.

### Install Required Packages:
pip install -qU langchain-openai langchain langchain_community langchain_experimental
pip install -U duckduckgo-search
### Import Required Modules and Set Environment Variables:
import os
from uuid import uuid4
### Setup the LangSmith environment variables
unique_id = uuid4().hex[0:8]
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"OpenAI_SM_1_{unique_id}"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "<LangSmith API Key Goes Here>"
### Import LangChain Components and OpenAI API Key
from langchain.chains import LLMMathChain
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langchain_core.tools import Tool
from langchain_experimental.plan_and_execute import (
    PlanAndExecute,
    load_agent_executor,
    load_chat_planner,
)
from langchain_openai import ChatOpenAI, OpenAI
###
os.environ["OPENAI_API_KEY"] = "<OpenAI API Key>"
### Set Up Search and Math Chain Tools
search = DuckDuckGoSearchAPIWrapper()
llm = OpenAI(temperature=0)  # completion model used by the math chain
llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events",
    ),
    Tool(
        name="Calculator",
        func=llm_math_chain.run,
        description="useful for when you need to answer questions about math",
    ),
]
### Initialize Planner and Executor
model = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)
planner = load_chat_planner(model)
executor = load_agent_executor(model, tools, verbose=True)
agent = PlanAndExecute(planner=planner, executor=executor)
### Invoke the Agent
agent.invoke(
    "Who is the founder of SpaceX and what is the square root of his year of birth?"
)
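To make the control flow concrete, here is a minimal hand-rolled sketch of the same plan-and-execute pattern, with no LLM involved. The planner and tools are plain functions standing in for the LangChain components; the hard-coded steps and answers are illustrative, not what the model would actually return.

```python
# Minimal sketch of the plan-and-execute pattern: plan the steps,
# then route each step to a tool and accumulate results in memory.
import math

def plan(question: str) -> list[str]:
    # A real planner would ask an LLM to decompose the question;
    # here the steps are hard-coded for the SpaceX example.
    return [
        "Search: founder of SpaceX and his year of birth",
        "Calculate: square root of the year of birth",
    ]

def execute(step: str, memory: dict) -> dict:
    # Route each step to a tool; results accumulate in memory.
    if step.startswith("Search"):
        # Stand-in for the DuckDuckGo search tool.
        memory["founder"] = "Elon Musk"
        memory["birth_year"] = 1971
    elif step.startswith("Calculate"):
        # Stand-in for the math chain.
        memory["answer"] = math.sqrt(memory["birth_year"])
    return memory

def run_agent(question: str) -> dict:
    memory: dict = {}
    for step in plan(question):
        memory = execute(step, memory)
    return memory

result = run_agent(
    "Who is the founder of SpaceX and what is the square root of his year of birth?"
)
print(result["founder"], round(result["answer"], 2))  # → Elon Musk 44.4
```

The point of the sketch is that the agent itself is just this loop; the intelligence lives in the planner's decomposition and the tools' implementations.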

Web-Navigating โ€” The Next Frontier

As agents grow in capability, they are also expanding into web navigation by leveraging the image/visual capabilities of language models.

Firstly, language models with vision capabilities significantly enhance AI agents by incorporating an additional modality, enabling them to process and understand visual information alongside text.

I've often considered what the most effective use cases for multi-modal models are, and applying them in agent applications that require visual input is a prime example.

Secondly, recent developments such as Apple's Ferret-UI, AppAgent v2 and the WebVoyager/LangChain implementation showcase how GUI elements can be mapped and defined using named bounding boxes, further advancing the integration of vision in agent-driven tasks.

WebPilot

The code will be publicly available at github.com/WebPilot.

In general, the initial goal was to enable agents to break down complex and ambiguous questions into smaller, manageable steps that can be solved sequentially, much like humans do.

This was followed by the development of independent tools that can be integrated to enhance the agent's capabilities. Each tool is identified by a description that outlines its specific abilities and functionality.

WebPilot aims to extend the capabilities of agents by enabling them to explore the web via a web browser.

Currently, agents are expanding in two key areas. The first is web exploration: navigating via a browser and interpreting web pages.

The second area of focus is mobile operating systems, where agents are being developed to operate effectively.

The image above illustrates how WebPilot takes the different steps from its decomposition process and explores the web for answers.

To fully harness this potential, these agents must excel in tasks such as complex information retrieval, long-horizon task execution, and the integration of diverse information sources. ~ Source

Planner, Controller, Extractor

Specifically, the Global Optimisation phase is driven by the Planner, Controller, Extractor and Verifier.

The Planner simplifies complex tasks by breaking them into smaller, manageable steps, helping to focus on specific actions and tackle the challenges of traditional MCTS (Monte Carlo Tree Search).

Reflective Task Adjustment (RTA) then fine-tunes the plan using new observations, enabling WebPilot to adapt as needed.

The Controller monitors subtask progress, evaluating completion and generating reflections if re-execution is needed, ensuring accurate and adaptive task completion.

Throughout this process, the Extractor collects essential information to aid in task execution. This coordinated approach ensures that WebPilot remains adaptable and efficient in dynamic environments.
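The Planner/Controller/Extractor coordination described above can be sketched as a simple loop. Since WebPilot's code was not yet released at the time of writing, the function names and the retry logic here are my own illustration of the paper's roles, not the actual implementation.

```python
# Sketch of the coordination loop: the Planner decomposes the task,
# the Controller judges subtask completion (triggering re-execution),
# and the Extractor accumulates information for later subtasks.

def planner(task: str) -> list[str]:
    # A real planner would decompose via an LLM; hard-coded here.
    return ["open page", "find answer", "report"]

def controller(subtask: str, observation: str) -> bool:
    # Judge whether the subtask completed from the latest observation.
    return "done" in observation

def extractor(observation: str, memory: list[str]) -> list[str]:
    # Collect essential information to aid later subtasks.
    memory.append(observation)
    return memory

def run(task: str) -> list[str]:
    memory: list[str] = []
    for subtask in planner(task):
        observation, attempts = "", 0
        # Re-execute until the Controller is satisfied (bounded retries).
        while not controller(subtask, observation) and attempts < 3:
            observation = f"{subtask}: done"  # stand-in for a browser action
            attempts += 1
        memory = extractor(observation, memory)
    return memory

print(run("answer a question on the web"))
```

The key structural idea is the inner re-execution loop: completion is judged per subtask rather than per task, which is what lets the system adapt mid-way instead of failing at the end.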

Conclusion

Although this study is at the forefront of agentic applications, the framework itself feels somewhat opaque to me, and I don't fully grasp all the concepts.

Once the code is released and working prototypes can be built, the approach and framework should become clearer.

✨✨ Follow me on LinkedIn for updates ✨✨

I'm currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.



Written by Cobus Greyling

I'm passionate about exploring the intersection of AI & language. www.cobusgreyling.com
