- What's brewing in AI
- Posts
- 🧙🏼 Breaking: Claude can *use computers*
🧙🏼 Breaking: Claude can *use computers*
...and takes the coding throne
Howdy, wizards.
Last week I talked about the indications that Anthropic had big things in the works—they sure did.
Today it’s all about Claude’s new skills.
DARIO’S PICKS
Anthropic might’ve just made breakthrough progress towards agentic AI. They’ve released an early version of “computer use” which allows an AI to, well, use a computer.
Claude can now do things like taking screenshots, moving your cursor around the screen, clicking and typing text – emulating how humans interact with computers. Computer use is now in public Beta, available to developers via API.
Demos:
Computer use for automating operations – Claude filling out a form in a browser window, by using data from a spreadsheet and a CRM database that’s open in two separate browser windows.
Computer use for coding – Claude going online to use claude.ai (inception!) to create a website, downloading the output, importing it to a code editor locally on the user’s computer and setting up a server for it. The model also self-corrects and retries tasks when it gets it wrong.
Computer use for orchestrating tasks – Claude planning and scheduling a sunrise hike, using Google, Maps and Calendar on the user’s computer.
Despite the boosted capabilities of Claude, with computer use and the upgraded Claude 3.5 models (more on that below), Anthropic’s AI Safety Level remains at Level 2. They justify introducing computer use now by arguing that it’s better to implement it in existing models rather than wait until more advanced models pose a greater risk.
PS Anthropic also published an interesting read on how they developed this feature. Apparently, one of the critical aspect was getting the model to count the pixels on your screen correctly–so that it understands the right place to click. The training consisted of giving Claude simple tools, like a text editor and calculator, from which it learned generalisable skills on how to operate computers.
Why it matters We're gradually moving from managing AI to do individual steps of a process, to delegating whole tasks. Anthropic might’ve just played their ace in the horse race between the leading AI model providers: as the first model to offer computer use, they’ll attract builders whose idea is only possible with this feature, and they’ll gather important early feedback that’ll help them improve it. However, Anthropic acknowledges the new feature is currently error-prone, and has trouble with seemingly basic tasks like scrolling, dragging and zooming. It’s also neither fast nor cheap, so while computer use opens up a world of new use cases, it might not be viable in all production settings just yet.
TOGETHER WITH PRESSMASTER.AI
Better PR with minimal effort: Let AI write your articles
Generate high-quality articles in seconds - SEO-optimized, plagiarism & fact-checked
Be featured for free by journalists looking for credible sources
Get your articles indexed and ranked directly in Google News
Distribute your content to top magazines with a single click
Forget ChatGPT: Manage, publish and track your PR efforts in one place
Get great PR fast:
DARIO’S PICKS
In addition to computer use – which I felt deserved to be explained separately – Anthropic:
Upgraded Claude 3.5 Sonnet, improved across the board from its predecessor, especially when it comes to coding.
Launched Claude 3.5 Haiku, a new cheap and fast model that matches the performance of Claude 3 Opus, Anthropic’s prior largest model.
The upgraded Claude 3.5 Sonnet model shows improvement on the coding benchmark SWE-bench 33.4% to 49.0% – higher than all publicly available models, including OpenAI’s o1-preview.
It also shows significantly improved scores on different domains of TAU-bench, a benchmark for agentic tool use.
Early testers are already exploring the new model and computer use in combination for agentic applications: Replit is using it to develop a feature that evaluates apps as they’re being built with Replit Agent, The Browser Company is using the model for automating web-based workflows, GitLab is testing the model for DevSecOps tasks and Cognition uses the model for autonomous AI evaluations.
Why it matters Anthropic is delivering a one-two punch by launching a state-of-the-art coding model—coding being one of the key revenue drivers for ChatGPT—together with computer use. It’ll be interesting to see, and show you, what applications people and companies find for these new tools.
FROM OUR PARTNERS
200+ hours of research on AI tools, prompting techniques & hacks packed in a solid 3 hour masterclass.
THAT’S ALL FOLKS!
Was this email forwarded to you? Sign up here. Want to get in front of 13,000 AI enthusiasts? Work with me. This newsletter is written & curated by Dario Chincha. |
What's your verdict on today's email? |
Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.