OpenAI unveils “Operator”, automated agent that can perform tasks on its own

OpenAI has announced Operator, a new AI agent that can browse the web and perform tasks on its own. Operator is more than a chatbot: it can navigate websites, click buttons, and fill out forms to get a task done.

When you give Operator a task, it breaks the task down into smaller steps. In one example from OpenAI's livestream, Operator was given a picture of a handwritten shopping list and asked to order the groceries from Instacart. Operator started a browser instance in the cloud, opened Instacart's website, searched for the individual items, added them to the cart, and even went through checkout. It still asks the user for confirmation at various steps, and before performing any irreversible action on the website.

Operator uses the Computer-Using Agent (CUA) model, which combines the vision capabilities of GPT-4o with advanced reasoning through reinforcement learning. The model is specifically designed to interact with graphical user interfaces (GUIs): it "sees" web pages through screenshots and interacts with them using mouse and keyboard actions.

The model can also self-correct when it runs into problems, and it is trained to hand control back to the user when needed, so that it doesn't go ahead and act on its own where user input is required. This is especially useful for tasks that involve payments or other sensitive information.
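
For readers who want a feel for how such a loop fits together, here is a minimal Python sketch of the "look at a screenshot, pick an action, ask before anything irreversible" cycle described above. It is an illustration only, not OpenAI's implementation: the `Action` class, `run_agent`, and the `observe`, `policy`, `execute`, and `confirm` callbacks are all hypothetical names invented for this example, and the "model" is a scripted stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """A hypothetical GUI action proposed by the model."""
    kind: str                    # e.g. "click", "type", or "done"
    target: str = ""             # element to click or text to type
    irreversible: bool = False   # e.g. placing an order


def run_agent(
    policy: Callable[[str], Action],    # stand-in for the model: screenshot -> action
    observe: Callable[[], str],         # stand-in for taking a screenshot
    execute: Callable[[Action], None],  # stand-in for mouse/keyboard input
    confirm: Callable[[Action], bool],  # asks the user to approve an action
    max_steps: int = 20,
) -> List[str]:
    """Run the observe/act loop and return a log of what happened."""
    log: List[str] = []
    for _ in range(max_steps):
        screenshot = observe()
        action = policy(screenshot)
        if action.kind == "done":
            break
        # Hand control back to the user before anything that cannot be undone.
        if action.irreversible and not confirm(action):
            log.append(f"handed back to user before: {action.kind} {action.target}")
            break
        execute(action)
        log.append(f"{action.kind} {action.target}")
    return log


if __name__ == "__main__":
    # A scripted "model" that replays a fixed sequence of actions.
    script = iter([
        Action("click", "search box"),
        Action("type", "milk"),
        Action("click", "Place order", irreversible=True),
        Action("done"),
    ])
    steps = run_agent(
        policy=lambda screenshot: next(script),
        observe=lambda: "<screenshot>",
        execute=lambda action: None,
        confirm=lambda action: False,  # the user declines the final step
    )
    for step in steps:
        print(step)
```

In this sketch the user's refusal stops the run before the order is placed, which mirrors the hand-back behavior described in the article; how the real CUA model decides which actions need confirmation is not something the announcement spells out.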

For now, Operator is a research preview and is only available to ChatGPT Pro users in the United States, although OpenAI has promised to bring it to other regions in the coming months. Users in the European Union might have to wait a bit longer due to stricter compliance requirements.

OpenAI's full announcement of Operator is available on the company's website.