The makers of an AI model claim it can take control of a computer and carry out tasks such as filling in forms. The latest version of "Claude" takes a screenshot and counts pixels to work out how far to move the cursor.

Anthropic, which made Claude, says this is the first time a publicly released AI model has had the capability of "computer use". It defines this as "looking at a screen, moving a cursor, clicking buttons, and typing text."

The goal is to allow the model to carry out tasks which go beyond simply generating text or images in line with a user's instructions. Instead, it could actually use this text, for example to send an email or fill out an online form to book a trip.

While inputting text or even simulating a mouse movement isn't a particularly difficult task to automate on a computer, figuring out where to move and click the cursor on the screen is trickier. The feature works by taking a screenshot, identifying the necessary location, then counting the number of pixels to "move" to that location.
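To make the pixel-counting idea concrete, here is a minimal Python sketch. It is purely illustrative and is not Anthropic's code: the names Point, centre_of and pixel_offset are hypothetical, and it assumes the model has already picked out the target button's position and size from the screenshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A screen position in pixels, measured from the top-left corner."""
    x: int
    y: int


def centre_of(left: int, top: int, width: int, height: int) -> Point:
    """Return the centre of a rectangular on-screen element (hypothetical helper)."""
    return Point(left + width // 2, top + height // 2)


def pixel_offset(cursor: Point, target: Point) -> tuple[int, int]:
    """Return how many pixels to move right (dx) and down (dy) to reach the target."""
    return target.x - cursor.x, target.y - cursor.y


# Assumed values: where the cursor is now, and where a button was found in the screenshot.
cursor = Point(100, 200)
button = centre_of(left=600, top=380, width=120, height=40)

dx, dy = pixel_offset(cursor, button)
print(f"Move {dx} pixels right and {dy} pixels down")
```

The arithmetic is the easy part. The hard part, which this sketch takes as given, is working out from a screenshot where the button actually is.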

Drag-And-Drop Off The Table

In its current form, the tool can only work with a rapid series of screenshots rather than video of the screen. That means it struggles to react to pop-up notifications or to replicate a "drag-and-drop" operation that a human could do. (Source: arstechnica.com)

For now, ordinary users can't simply run Claude and access this feature. Instead, it's only available to third-party developers who create applications using the model. They'll be able to translate user instructions into computer commands.

Risk Reduction Request

Anthropic gives the example of a user typing "use data from my computer and online to fill out this form" and the AI tool carrying out the sequence of tasks: "check a spreadsheet; move the cursor to open a web browser; navigate to the relevant web pages; fill out a form with the data from those pages." (Source: anthropic.com)

Anthropic is also clear that the computer use element is very much at the beta stage. It admits the feature "is still experimental - at times cumbersome and error-prone. We're releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time."

