OpenAI introduces ChatGPT agent that can complete tasks using its own computer

OpenAI already offers two distinct types of agents: Operator, which can browse the web and independently carry out tasks, and Deep Research, which specializes in synthesizing large volumes of online information. Today, OpenAI unveiled the ChatGPT agent, a new AI that combines the web-browsing abilities of Operator, the research strengths of Deep Research, and the conversational skills of ChatGPT into a single, powerful agent.

The ChatGPT agent can now do work using its own computer. Based on the user query, it can navigate websites, filter results, prompt a user to log in when required, run code, do analysis, create spreadsheets and PowerPoints, and more.

The ChatGPT agent will have access to the following tools to complete the tasks given by users:

A visual web browser that interacts with the web through a GUI
A text-based browser for simpler reasoning-based web queries
A terminal
Direct API access
The ability to connect with ChatGPT connectors

Since the ChatGPT agent is doing all its work using its own virtual computer, it will have all the required context to complete the task. For example, the agent can visit a website using the browser, download a file from the website, manipulate the same file by running a command in the terminal, and then view the output back in the visual browser.

OpenAI claims that the ChatGPT agent posts state-of-the-art performance on various evaluations measuring web browsing and real-world task completion capabilities. Here are some of the highlights:

Humanity’s Last Exam: The ChatGPT agent scores a new pass@1 SOTA at 41.6. When running up to eight attempts at once and picking the one with the highest self-reported confidence, the score increases to 44.4.
FrontierMath: The ChatGPT agent reaches 27.4% accuracy.
OpenAI’s internal benchmark, which evaluates model performance on complex, economically valuable knowledge-work tasks: The ChatGPT agent’s output is comparable to or better than that of humans in roughly half the cases.
DSBench: The ChatGPT agent surpasses human performance by a significant margin on data science tasks.
SpreadsheetBench: The ChatGPT agent scores 45.5%, compared to Copilot in Excel’s 20.0%.
BrowseComp: The ChatGPT agent sets a new SOTA with 68.9%.
WebArena: The ChatGPT agent scores 65.4%.
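
The Humanity’s Last Exam result above mentions running up to eight attempts and keeping the one with the highest self-reported confidence. OpenAI has not published how this is implemented, so the following Python snippet is only a minimal sketch of that selection idea. The `Attempt` class, `pick_most_confident`, `best_of_n`, and the confidence values are all hypothetical names and numbers invented for illustration, and the attempts here run one after another rather than at once.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    """One candidate answer (hypothetical structure, not an OpenAI API)."""
    answer: str
    confidence: float  # self-reported confidence, assumed to lie in [0, 1]


def pick_most_confident(attempts: List[Attempt]) -> Attempt:
    """Return the attempt with the highest self-reported confidence.

    On ties, the earliest attempt wins, because max() keeps the first maximum.
    """
    if not attempts:
        raise ValueError("need at least one attempt")
    return max(attempts, key=lambda a: a.confidence)


def best_of_n(run_attempt: Callable[[int], Attempt], n: int = 8) -> Attempt:
    """Collect n attempts sequentially and keep the most confident one."""
    attempts = [run_attempt(i) for i in range(n)]
    return pick_most_confident(attempts)


# Made-up attempts, used only to show the selection step.
attempts = [Attempt("A", 0.42), Attempt("B", 0.87), Attempt("C", 0.55)]
chosen = pick_most_confident(attempts)
print(chosen.answer)  # prints "B"
```

Note that this kind of selection only helps when the reported confidence tracks correctness; the article's figures (41.6 for a single attempt versus 44.4 with selection) suggest it does to some degree on this benchmark.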

The ChatGPT agent is now available through the new ‘agent mode’ in the ChatGPT tools dropdown. While the agent is performing a task, users can follow on-screen narration and can interrupt and take control of the browser whenever needed.

The ChatGPT agent will be available for all ChatGPT Pro users by the end of the day. ChatGPT Plus and Team users will get access over the next few days, while Enterprise and Education users will get access in the coming weeks. ChatGPT Pro users can have 400 messages per month with the agent, while other paid users will only get 40 messages monthly. However, users can purchase additional agent usage using flexible credit-based options.
