How To Build An OpenAI Computer-Using Agent (CUA Model)

Build a computer-using agent that can perform tasks on your behalf. Use the CUA model from OpenAI to create an AI Agent that runs locally on your machine.

5 min read · Mar 31, 2025

--

I, like so many others, saw the demo from OpenAI showing their Operator implementation. What excited me was the fact that Operator makes use of a Computer-Using Agent (CUA) model.

This is a good example of how the multimodal capabilities of models are expanding, with vision now extending to interpreting GUIs within a browser.

I do not currently have access to Operator, but I wanted to build a demonstration application based on the CUA model, where I ask a simple question and the AI Agent opens a browser to find the answer.

The computer use tool and model can be accessed through the Responses API.

In essence, the CUA model examines a screenshot of the computer interface and suggests actions to take.

More precisely, it issues computer_call items with actions such as click(x, y) or type(text), which you must carry out in your environment, then report back by providing screenshots of the results.
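As a rough illustration, the hand-off from a computer_call action to your own environment might look like the sketch below. The handle_action helper and the computer object are hypothetical stand-ins for whatever automation layer you use (a Playwright wrapper, for example); the action field names follow the computer use documentation.

```python
# Minimal sketch of executing one computer_call action locally.
# The `computer` object is illustrative: anything exposing
# click/type/scroll/wait methods will do.

def handle_action(computer, action: dict) -> None:
    """Dispatch one suggested action from a computer_call item."""
    kind = action["type"]
    if kind == "click":
        computer.click(action["x"], action["y"])
    elif kind == "type":
        computer.type(action["text"])
    elif kind == "scroll":
        computer.scroll(action["x"], action["y"],
                        action["scroll_x"], action["scroll_y"])
    elif kind == "wait":
        computer.wait()
    else:
        raise ValueError(f"unhandled action type: {kind}")
```

Keeping the dispatch in one place like this makes it easy to log every action the model suggests before it is executed.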

In the video below, I asked the AI Agent to get the weather in Cape Town and Dar es Salaam, and also to check the Apple stock price…


Considering the image above, here’s how to add the computer use tool to your app in simple steps:

1. Send a request (1, 2): include the computer tool in the list of tools, along with the display size and environment details. You can attach a screenshot of the starting state with the first request.

2. Get the model's response (3, 4): look for any computer_call items in the reply. These suggest actions like clicking, typing, scrolling, or waiting to move toward your goal.

3. Perform the action (5, 6): use code to carry out the suggested action on your computer or browser.

4. Take a new screenshot (7): after the action, capture the updated environment as a screenshot.

5. Repeat (7 back to 1): send a new request with the updated screenshot as a computer_call_output, and keep going until the model stops suggesting actions or you choose to stop.
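The loop above can be sketched against the Responses API roughly as follows. This is a minimal sketch based on the OpenAI computer use documentation, not the sample app's actual implementation; the computer object (with its perform and screenshot methods) and the display dimensions are placeholders you would adapt to your own environment.

```python
import base64

def screenshot_output(call_id: str, screenshot_bytes: bytes) -> dict:
    """Wrap a fresh screenshot as a computer_call_output item."""
    b64 = base64.b64encode(screenshot_bytes).decode("utf-8")
    return {
        "type": "computer_call_output",
        "call_id": call_id,
        "output": {
            "type": "computer_screenshot",
            "image_url": f"data:image/png;base64,{b64}",
        },
    }

def run_loop(computer, goal: str) -> None:
    """Drive the CUA loop: request -> computer_call -> act -> screenshot -> repeat."""
    from openai import OpenAI  # deferred import; pip install openai
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    tools = [{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser",
    }]
    response = client.responses.create(
        model="computer-use-preview",
        tools=tools,
        input=[{"role": "user", "content": goal}],
        truncation="auto",  # required when using the computer tool
    )
    while True:
        calls = [item for item in response.output if item.type == "computer_call"]
        if not calls:
            break  # model stopped suggesting actions
        call = calls[0]
        computer.perform(call.action)   # your action executor (step 3)
        shot = computer.screenshot()    # raw PNG bytes (step 4)
        response = client.responses.create(
            model="computer-use-preview",
            previous_response_id=response.id,
            tools=tools,
            input=[screenshot_output(call.call_id, shot)],
            truncation="auto",
        )
```

A production loop would also acknowledge any pending_safety_checks the model attaches before continuing, which is omitted here for brevity.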

On a MacBook, you can use the Terminal application to perform all of the following steps…

From the terminal command line, create a virtual environment; I called mine cua.

python3 -m venv cua

Then activate the virtual environment…

source cua/bin/activate

You will see the command-line prompt change to show that you are now within the virtual environment.

Clone the OpenAI demonstration project from GitHub…

git clone https://github.com/openai/openai-cua-sample-app

Once you have entered the command, you will be prompted for your GitHub username and then a password.

For the password, you need to enter a personal access token, which you can create in your GitHub user settings.

You will see a new folder is created, as shown below, with the files and file structure.

Run the command below to install all of the requirements…

pip install -r requirements.txt

Create an environment variable for your OpenAI API key…

export OPENAI_API_KEY=<your secret key>
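Note that export only sets the variable for the current shell session. As a quick, illustrative check (not part of the sample app), you can confirm the key is actually visible to Python before launching the agent:

```python
import os

def openai_key_is_set() -> bool:
    """Return True if OPENAI_API_KEY is visible to this process."""
    return bool(os.environ.get("OPENAI_API_KEY"))

if __name__ == "__main__":
    # Run this from the same shell session where you ran `export`.
    print("OPENAI_API_KEY set:", openai_key_is_set())
```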

And lastly run the AI Agent with the command below…

python3 cli.py --computer local-playwright

You will see the prompt change, and a browser will open; you can now talk to the AI Agent via the command line. There is no need for direct browser interaction…

Below, you can see I ask the AI Agent a question regarding the weather…

And below you can see how the browsing Agent is interacting with the browser…

The responses from the computer-use-preview-2025-03-11 model are visible below, in the OpenAI dashboard.

If you click on one of the lines, the image is shown together with the response from the model…

The Agent class can use regular function schemas as tools, returning a fixed value when called. If you include tools whose names match your Computer methods (along with the required parameters), they will be routed to your Computer to handle.

This helps in situations where screenshots miss things like the search bar or back arrow, which can confuse the CUA model.
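As an illustrative sketch of such a function tool: the schema below assumes, hypothetically, that your Computer class implements a back() method that presses the browser's back button; the matching name is what lets the call be routed to it.

```python
# A function-schema tool whose name matches a method on your Computer class.
# Hypothetical example: assumes your Computer implements back().
back_tool = {
    "type": "function",
    "name": "back",
    "description": "Go back to the previous page in the browser.",
    "parameters": {
        "type": "object",
        "properties": {},   # back() takes no arguments
        "required": [],
    },
}
```

You would pass a schema like this alongside the required computer tool so the model can invoke actions that are hard to express through screenshots alone.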

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.

https://openai.com/index/new-tools-for-building-agents/

Written by Cobus Greyling