Web Browsing AI Agent with A Web-UI: A Seamless Browser Interface

Web-UI is an open-source AI Agent that can be installed and run locally on your machine.

5 min read · May 20, 2025

What I love about this project is that it is open-source: you can run it locally on your machine and connect it to one or more Language Models in the cloud.

Previously I wrote a piece on the Browser Use project, which was really a command line tool. This project has a web UI with quite a few features.

The ability for AI Agents to interact with web environments will become increasingly important.

Research has shown that web-using AI Agents currently reach a higher level of accuracy than computer-using AI Agents.

Web-UI is an open-source project designed to facilitate the interaction between an AI Agent and a web browser.

It also needs to be stated that web browsing AI Agents are currently very susceptible to attacks. Hence human supervision is still vital while AI Agents execute.

More on Web-UI

Web-UI is an open-source project that enhances the capabilities of AI Agents by providing a user-friendly browser interface.

As I have mentioned, building upon the foundation of browser-use, Web-UI offers a streamlined approach to integrating AI agents with web browsers.

Considering the image below, there is a textbox (1) where the user can enter their task in text. Subsequently the task can be submitted (2) and the steps are shown within the web-based UI.

Each step shows the browser screen together with a JSON document. The document holds the current state and the related action, where the current state contains:

  1. the evaluation of the previous goal,
  2. access to memory,
  3. what the next goal is.
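
To make that structure concrete, here is a minimal Python sketch that parses one such step record. The key names (current_state, evaluation_previous_goal, memory, next_goal, action) and the sample values are assumptions made for illustration, modelled on the description above; the exact keys Web-UI emits may differ between versions.

```python
import json

# Hypothetical step record; keys and values are illustrative assumptions.
SAMPLE_STEP = """
{
  "current_state": {
    "evaluation_previous_goal": "Success - the search page loaded",
    "memory": "Searched for the weather in Cape Town",
    "next_goal": "Read the weather from the results"
  },
  "action": [
    {"extract_content": {"goal": "extract the current weather information for Cape Town"}}
  ]
}
"""


def summarise_step(raw: str) -> dict:
    """Return the parts of a step record a human reviewer would scan first."""
    step = json.loads(raw)
    state = step.get("current_state", {})
    # Each action is assumed to be a single-key dict: {action_name: parameters}.
    action_names = [name for action in step.get("action", []) for name in action]
    return {
        "previous_goal": state.get("evaluation_previous_goal"),
        "memory": state.get("memory"),
        "next_goal": state.get("next_goal"),
        "actions": action_names,
    }


summary = summarise_step(SAMPLE_STEP)
print(summary["next_goal"])
print(summary["actions"])
```

A summary like this is handy when supervising the AI Agent, since the next goal and the action names are usually enough to decide whether to let it continue.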

This integration allows for tasks such as automated web navigation, data extraction, and interaction with web applications.

Key Features

Built on Gradio, Web-UI has an intuitive, user-friendly interface that enables you to interact with AI agents without extensive technical knowledge.

Expanded LLM Support is available. Web-UI integrates support for various Large Language Models (LLMs), including Google, OpenAI, Azure OpenAI, Anthropic, DeepSeek, and Ollama, with plans to incorporate more in the future.

Users can utilise their own browsers with Web-UI, eliminating the need for repeated logins or authentication challenges. This feature also supports high-definition screen recording.

With persistent browser sessions, Web-UI allows users to maintain browser sessions between AI tasks, providing visibility into the history and state of AI interactions.

And, the latest release introduces support for MCP servers, enabling AI agents to interact with external tools and services beyond the browser, such as running desktop commands or connecting to databases.

The Web-UI UI

Considering the video below, the question or input is shown: "What is the weather in Cape Town?"

The different steps are shown, with the screen grabs and the corresponding JSON logs, which show the reasoning of the AI Agent.

At the end you see the task completion, with the duration, tokens, status and final result.

The video below shows how the browser screen is mapped by the Web AI Agent, with bounding boxes and numbers…

OpenAI Console

For my prototype I made use of OpenAI for the Language Model support…

Below is the OpenAI log console, where you can see I made use of the gpt-4o-2024-08-06 model.

And you can see the prompt sent from the AI Agent to the Language Model:

Your task is to extract the content of the page.

You will be given a page and a goal and you should extract all relevant information around this goal from the page.

If the goal is vague, summarize the page.

Respond in json format.

Extraction goal: extract the current weather information for Cape Town
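
As a rough illustration of how a prompt like this could be assembled, here is a minimal Python sketch. The instruction text is copied from the log above; the helper name build_extraction_prompt and the "Page:" label are hypothetical and are not taken from the Web-UI or Browser Use code.

```python
# Instruction text as it appears in the OpenAI log.
EXTRACTION_INSTRUCTIONS = (
    "Your task is to extract the content of the page. "
    "You will be given a page and a goal and you should extract all relevant "
    "information around this goal from the page. "
    "If the goal is vague, summarize the page. "
    "Respond in json format."
)


def build_extraction_prompt(goal: str, page_text: str) -> str:
    """Combine the fixed instructions, the goal and the page text.

    Hypothetical helper: the 'Page:' label is an assumption for illustration.
    """
    return (
        f"{EXTRACTION_INSTRUCTIONS}\n"
        f"Extraction goal: {goal}\n"
        f"Page: {page_text}"
    )


prompt = build_extraction_prompt(
    goal="extract the current weather information for Cape Town",
    page_text="<page content goes here>",
)
print(prompt)
```

The point of the sketch is that only the goal and the page content change from step to step, while the instructions stay fixed.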

Getting Started In 7 Steps

Getting started is really easy. I performed these steps on a MacBook via the Terminal application…

Create a new directory for your AI Agent to live in…

mkdir web_ui

Go into that directory…

cd web_ui

Create a virtual environment for the installation called web_ui.

python3 -m venv web_ui

Activate the virtual environment…

source web_ui/bin/activate

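Clone the GitHub repository and move into it, since the remaining commands are run from inside the cloned web-ui directory…

git clone https://github.com/browser-use/web-ui.git

cd web-ui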

Install all the requirements…

pip install -r requirements.txt

And finally run the application:

python webui.py --ip 127.0.0.1 --port 7788

You will see the message:

* Running on local URL: http://127.0.0.1:7788

This means you can click on the link and your web-based AI Agent UI will open.
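
If the page does not load, a quick way to confirm that something is actually listening on that address is a small TCP check. The sketch below uses only the Python standard library; the function name is_listening is my own, and the defaults simply mirror the IP and port from the command above.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 7788, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Web-UI reachable:", is_listening())
```

This only tells you that a process is accepting connections on the port, not that it is Web-UI, so treat it as a first sanity check.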

Conclusion

AI Agent debates are endless, but theory only goes so far. To truly grasp what’s happening in the market, prototype and build with user-friendly frameworks like this one.

Another great option? Dive into notebooks. LlamaIndex and LangChain offer excellent ones you can run in your browser. Tinker with them, and you’ll gain a deeper understanding in no time.

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.


Written by Cobus Greyling

I’m passionate about exploring the intersection of AI & language. www.cobusgreyling.com
