LangChain Web Browsing AI Agent
How to perform browser automation using Playwright and LangChain AI Agents
I have said this numerous times in the past: AI Agents need to live within our digital and physical environments. The digital environment is an easier problem to solve than the physical environment.
The AI Agent needs to be able to navigate our digital environments like the web and computer operating systems.
I’m particularly interested in how AI Agents can browse the web to find real-time answers.
Another element I find particularly interesting is the ability to run an AI Agent within a notebook or locally on my machine.
This article covers a neat project where a LangChain AI Agent is combined with Playwright.
Playwright
The Playwright integration page for LangChain describes a toolkit for automating and interacting with web-based content through Python.
Playwright is a browser automation library. Combining it with LangChain enables developers to programmatically navigate, scrape and manipulate dynamic web pages.
This enhances data retrieval capabilities for AI-driven applications.
This combination enables more sophisticated workflows, such as extracting real-time information from websites to feed into language models for analysis or content generation.
I guess ultimately this technology bridges the gap between AI agents and dynamic web interaction for context-aware AI systems.
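To make the "feed web content into a language model" step concrete, here is a minimal sketch that uses only the Python standard library. It assumes the rendered HTML has already been fetched (with Playwright this could come from something like `page.content()`); a hard-coded sample string stands in for it here. The names `VisibleText`, `html_to_text`, `build_prompt` and `SAMPLE_HTML` are hypothetical and are not part of LangChain or Playwright.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Reduce an HTML document to its visible text, joined by single spaces."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_prompt(question: str, page_text: str, max_chars: int = 2000) -> str:
    """Wrap (truncated) page text and a question into one prompt string."""
    context = page_text[:max_chars]
    return (
        "Use only the page content below to answer.\n\n"
        f"Page content:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical sample page; a real run would use HTML fetched by the browser.
SAMPLE_HTML = """
<html><head><style>h1 {color: red}</style></head>
<body><h1>Example heading</h1><script>var x = 1;</script>
<p>Example paragraph.</p></body></html>
"""

page_text = html_to_text(SAMPLE_HTML)
prompt = build_prompt("What is the heading?", page_text)
print(prompt)
```

The resulting `prompt` string is what would be sent to the language model. Truncating to `max_chars` is a crude way of staying inside a context budget, and a real pipeline would likely do something smarter.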
The image below shows a basic breakdown of the architecture. The code example makes use of Anthropic for the LLM backbone.
The image below shows the sequence of events between the different components.
Here is a simple example, where I asked the LangChain AI Agent the following question: What is the current ZAR USD exchange rate?
result = await agent_chain.arun("What is the current ZAR USD exchange rate?")
print(result)
Notice in the image below how the LangChain AI Agent goes through a sequence of Observation, Thought, Action, Observation, and so on. Once an Action named Final Answer is reached, the chain is complete and the final answer is given.
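The loop described above can be sketched in plain Python. This is only an illustration of the control flow, not LangChain's implementation. The "model" is a scripted list of JSON actions, the tools are fakes that return canned strings, and nothing touches the network. The `action` / `action_input` keys resemble the JSON blob a structured chat agent emits, while `fake_navigate`, `fake_extract_text`, `scripted_llm` and `run_agent` are hypothetical names.

```python
import json
from typing import Callable, Dict, List, Tuple


# Fake tools: canned responses standing in for real browser tools.
def fake_navigate(url: str) -> str:
    return f"Navigated to {url}"


def fake_extract_text(_: str = "") -> str:
    return "Placeholder page text."


TOOLS: Dict[str, Callable[[str], str]] = {
    "navigate_browser": fake_navigate,
    "extract_text": fake_extract_text,
}

# Scripted stand-in for the LLM: one JSON action per turn.
SCRIPTED_ACTIONS: List[str] = [
    json.dumps({"action": "navigate_browser", "action_input": "https://example.com"}),
    json.dumps({"action": "extract_text", "action_input": ""}),
    json.dumps({"action": "Final Answer", "action_input": "Answer based on the page text."}),
]


def scripted_llm(scratchpad: List[str]) -> str:
    """Pick the next scripted action from the number of observations so far."""
    turn = sum(1 for line in scratchpad if line.startswith("Observation:"))
    return SCRIPTED_ACTIONS[min(turn, len(SCRIPTED_ACTIONS) - 1)]


def run_agent(question: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate Action and Observation until a Final Answer or the step limit."""
    scratchpad = [f"Question: {question}"]
    for _ in range(max_steps):
        # A real agent would send the scratchpad to the LLM at this point.
        decision = json.loads(scripted_llm(scratchpad))
        name, arg = decision["action"], decision["action_input"]
        if name == "Final Answer":
            return arg, scratchpad
        scratchpad.append(f"Action: {name}({arg!r})")
        scratchpad.append(f"Observation: {TOOLS[name](arg)}")
    raise RuntimeError("Step limit reached without a Final Answer")


answer, trace = run_agent("What is on the example page?")
print("\n".join(trace))
print("Final Answer:", answer)
```

The step limit is there because a real model can keep calling tools indefinitely, so some bound on the loop is a sensible safeguard.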
LangChain does a good job in highlighting how this technology can be leveraged within an agent-based framework, which is a key aspect of building intelligent, autonomous systems.
In this context, an AI Agent refers to an AI entity — typically powered by a language model — that can perform tasks, make decisions, and interact with its environment based on user instructions or predefined goals.
By using Playwright as an agent-tool within an AI Agent, the system gains the ability to actively engage with web browsers, allowing it to execute complex, multi-step operations on websites.
For example, an AI Agent could be tasked with researching a topic online, navigating through search results, visiting relevant pages, and extracting up-to-date information without manual intervention.
PlayWright Browser Toolkit
The Python code below can be copied “as is” and pasted into a notebook and run. The only change or update to the code is adding your Anthropic API key.
%pip install --upgrade --quiet playwright > /dev/null
%pip install --upgrade --quiet lxml
###
# If this is your first time using playwright, you'll have to install a browser executable.
# Running `playwright install` by default installs a chromium browser executable.
# playwright install
from langchain_community.agent_toolkits import PlayWrightBrowserToolkit
###
from langchain_community.tools.playwright.utils import (
    create_async_playwright_browser,  # A synchronous browser is available, though it isn't compatible with Jupyter.
)
###
# This import is required only for jupyter notebooks, since they have their own eventloop
###
import nest_asyncio
###
nest_asyncio.apply()
###
!playwright install
###
!pip install langchain_anthropic
###
async_browser = create_async_playwright_browser()
toolkit = PlayWrightBrowserToolkit.from_browser(async_browser=async_browser)
tools = toolkit.get_tools()
tools
###
tools_by_name = {tool.name: tool for tool in tools}
navigate_tool = tools_by_name["navigate_browser"]
get_elements_tool = tools_by_name["get_elements"]
###
await navigate_tool.arun(
    {"url": "https://web.archive.org/web/20230428133211/https://cnn.com/world"}
)
###
# The browser is shared across tools, so the agent can interact in a stateful manner
await get_elements_tool.arun(
    {"selector": ".container__headline", "attributes": ["innerText"]}
)
###
# If the agent wants to remember the current webpage, it can use the `current_webpage` tool
await tools_by_name["current_webpage"].arun({})
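The comment above about the browser being shared across tools is worth a small illustration. The sketch below is not the toolkit's code. It uses a hypothetical `FakeBrowser` that only remembers a URL, plus two hypothetical tool classes that hold a reference to the same instance, to show why a navigation made by one tool is visible to the next.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser: it only remembers the current URL."""
    current_url: Optional[str] = None


class NavigateTool:
    name = "navigate_browser"

    def __init__(self, browser: FakeBrowser) -> None:
        self.browser = browser

    async def arun(self, args: dict) -> str:
        # Mutates the shared browser state.
        self.browser.current_url = args["url"]
        return f"Navigating to {args['url']}"


class CurrentPageTool:
    name = "current_webpage"

    def __init__(self, browser: FakeBrowser) -> None:
        self.browser = browser

    async def arun(self, args: dict) -> str:
        # Reads the state left behind by whichever tool ran before.
        return self.browser.current_url or "about:blank"


async def demo() -> str:
    browser = FakeBrowser()  # one instance, handed to every tool
    tools = [NavigateTool(browser), CurrentPageTool(browser)]
    tools_by_name = {tool.name: tool for tool in tools}
    await tools_by_name["navigate_browser"].arun({"url": "https://example.com"})
    return await tools_by_name["current_webpage"].arun({})


# In a notebook with a running event loop, use `await demo()` instead.
current = asyncio.run(demo())
print(current)
```

Because both tools hold the same object, the agent can chain steps such as navigate, inspect, then click without passing page state through the language model.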
Use within an Agent
Again, the Python code below can be copied “as is” and pasted into a notebook and run. The only change or update to the code is adding your Anthropic API key.
import os
# Set the API key directly in the code
os.environ["ANTHROPIC_API_KEY"] = "<>" # Replace with your actual API key
from langchain.agents import AgentType, initialize_agent
from langchain_anthropic import ChatAnthropic
# Get API key from environment variable, raise error if not found
anthropic_api_key = os.environ.get("ANTHROPIC_API_KEY")
if anthropic_api_key is None:
    raise ValueError(
        "ANTHROPIC_API_KEY environment variable not set. "
        "Please set it with your Anthropic API key."
    )
# Initialize ChatAnthropic with the API key
llm = ChatAnthropic(
    model_name="claude-3-haiku-20240307",
    temperature=0,
    anthropic_api_key=anthropic_api_key,  # Pass the API key here
)
agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
###
result = await agent_chain.arun("What are the headers on langchain.com?")
print(result)
###
This setup allows the AI Agent to combine web-derived data with LangChain’s natural language processing capabilities for a seamless pipeline where raw web information is retrieved, processed and transformed into insights or actions.
For instance, a business intelligence AI Agent could monitor competitor websites, extract pricing data, and generate a comparative report autonomously.
Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.