Researchers at Microsoft Research Asia have developed UI-Evol, a ready-to-use component for computer-use AI agents that helps make them more accurate and reliable. Computer-use AI agents are AI models that are given access to an operating system so they can perform tasks autonomously, but research shows they’re not very accurate.
These AI agents often pull information from the internet to figure out how to navigate interfaces. Because UIs change all the time, the agents frequently fail to translate this internet knowledge into a successful UI interaction. This problem is called the knowledge-action gap.
A study highlighted by Microsoft found that even when the instructions were 90% correct, agents completed tasks successfully only 41% of the time. Additionally, these agents are unpredictable and may perform the same task differently each time. Both problems needed to be addressed.
Enter Microsoft Research Asia with UI-Evol, a ready-to-use component that integrates into an agent’s workflow and relies on the actual user interface for guidance. The purpose of UI-Evol is to continuously update interface knowledge, making agents more accurate and reliable.
UI-Evol works in two stages. First, in a stage called Retrace, it records the exact steps (clicks, keystrokes, and other actions) an agent takes to successfully complete a task. Then, in a stage called Critique, it reviews the recorded actions against the external instructions. If mismatches are found, it adjusts the knowledge to reflect what actually works in the software environment, creating reliable, tested guidance.
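The two stages can be illustrated with a minimal Python sketch. It is not Microsoft's implementation: the `Action` type, the `retrace` and `critique` functions, and the sample event log are all hypothetical names invented here, and the position-by-position comparison is an assumed stand-in for however UI-Evol actually reviews a trace.

```python
from dataclasses import dataclass

# Hypothetical set of event kinds that count as UI actions in this sketch.
UI_ACTION_KINDS = {"click", "type", "hotkey"}


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "click", "type", "hotkey"
    target: str  # UI element or text the action applies to


def retrace(event_log):
    """Retrace (sketch): extract the concrete UI actions from a raw event log."""
    return [
        Action(event["kind"], event["target"])
        for event in event_log
        if event["kind"] in UI_ACTION_KINDS
    ]


def critique(external_steps, actions):
    """Critique (sketch): compare external instructions with recorded actions.

    Returns the refined knowledge (here simply the steps that actually worked)
    and a list of (index, instructed_step, recorded_action) mismatches.
    """
    mismatches = []
    for i, action in enumerate(actions):
        instructed = external_steps[i] if i < len(external_steps) else None
        if instructed != action:
            mismatches.append((i, instructed, action))
    refined = list(actions)
    return refined, mismatches


# Made-up example: the web instructions name a menu item the UI no longer has.
event_log = [
    {"kind": "click", "target": "File menu"},
    {"kind": "screenshot", "target": ""},  # not a UI action, dropped by retrace
    {"kind": "click", "target": "Export as PDF"},
]
external_steps = [Action("click", "File menu"), Action("click", "Save as PDF")]

actions = retrace(event_log)
refined, mismatches = critique(external_steps, actions)
print(mismatches)
```

In this invented case the second instructed step no longer matches the interface, so the refined knowledge keeps the step that actually worked instead of the outdated one.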
To assess its effectiveness, UI-Evol was tested on Agent S2, one of the best computer-use agents, using the OSWorld benchmark. Experiments with agents based on leading LLMs like GPT-4o and OpenAI-o3 showed two key improvements: higher success rates and greater consistency. The latter showed up as a reduced behavioural standard deviation, meaning the agents were more reliable.
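To make the two metrics concrete, here is a small Python sketch of how success rate and its standard deviation could be computed over repeated benchmark runs. It assumes the standard deviation is taken over per-run success rates, which the article does not specify, and the outcome data is invented purely for illustration. The function names `success_rate` and `summarize` are hypothetical.

```python
import statistics


def success_rate(outcomes):
    """Fraction of tasks completed successfully in one benchmark run."""
    return sum(outcomes) / len(outcomes)


def summarize(runs):
    """Mean and sample standard deviation of success rate across repeated runs."""
    rates = [success_rate(run) for run in runs]
    return statistics.mean(rates), statistics.stdev(rates)


# Invented outcomes (True = task succeeded); these are NOT results from the study.
baseline_runs = [
    [True, False, False, True],
    [False, False, False, True],
    [True, True, False, True],
]
with_component_runs = [
    [True, True, False, True],
    [True, True, False, True],
    [True, False, True, True],
]

baseline_mean, baseline_std = summarize(baseline_runs)
component_mean, component_std = summarize(with_component_runs)
print(baseline_mean, baseline_std)
print(component_mean, component_std)
```

A higher mean corresponds to the improved success rate, and a lower standard deviation corresponds to an agent that behaves more consistently from one run to the next.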
With this work, Microsoft could make agents considerably better in office automation and virtual assistant roles.