šŸ§™šŸ¼ OpenAI debuts o3 and friends

Also: Claude built my Meta ads strategy

Morning light bathes the café in amber hues as the wizard sips thoughtfully on what can only be described as a damn fine cup of coffee. He scrolls the news headlines; this o3 model apparently “wipes the floor with humans”. “What a peculiar claim,” he jots in his notebook, pausing occasionally to consider the implications. “I’ll need to run some more tests…right after my meditation break”.

Howdy wizards,

It’s time to settle into your preferred sitting apparatus. Baptise your neurons in a velvety shot of espresso.

Here’s what’s brewing in AI.

DARIO’S PICKS

1. OpenAI drops the full o3 model and o4-mini (plus a couple of treats)

o3 has amazing visual reasoning now. It can “think with images” and edit them as part of its reasoning process. Image source: OpenAI for Business via LinkedIn

OpenAI launched o3 and o4-mini this week, two powerful state-of-the-art reasoning models whose naming closely follows the company's signature blend of chaos and confusion.

Here are the details:

  • o3 is the new top-tier reasoning model. It pushes benchmarks across almost all disciplines: coding, math, science, visual perception, and more. Early testers also saw 20% fewer errors compared to the o1 model in areas like programming, business/consulting, and creative ideation.

    • Sam Altman quotes a well-known immunologist saying the new model is “at or near genius level”, a statement that seems supported if we look at traditional IQ tests. Keep in mind, though, that benchmarks are gameable (not even Pokémon is safe) and not necessarily good indicators of actual usefulness.

    • Something to note if you’re building apps with AI. At similar performance levels, o3 is 4x more expensive to use than Gemini 2.5 Pro. So depending on your budget you might want to look into the latter.

  • The second model they launched is o4-mini—a small, cheap and fast version of OpenAI’s upcoming flagship reasoning model which will presumably be called o4 (although, with OpenAI’s naming game, we don’t really know its name until it’s released). It outperforms the previous o3-mini in evaluations and has higher usage limits—a good choice for apps that need smart responses at massive scale.

  • The new models have access to all tools within ChatGPT and, importantly, are trained to know when and how to use them, including improved output formats (hello, nice table instead of long messy list). The tools we’re talking about here are web search, analysing uploaded files and data analysis with Python.

  • Ability to “think with images”: both of the new models can blend visual and textual reasoning in a new way. They no longer simply “see” the image you upload, but use it directly in their chain of thought. They can also edit the image as part of their reasoning, zooming, cropping, transforming. Combining its new tools and visual perception, I reckon o3 might now be the world’s best geoguesser.

  • Thinking longer = better results. Along with the launch, OpenAI also reported that letting models think for longer still yields performance gains. The "throw more compute at it" approach that worked well for pretraining is also working for reinforcement learning.

  • Alongside the launch of these two new models, OpenAI also shipped two other things this week:

    • A new model, GPT-4.1, in the API. This model has a massive 1M-token context window. While they’ve released three versions of it (nano, mini and full), apparently the mini model is the star of the show (outperforms GPT-4o at 80% less cost).

    • Codex CLI: A coding agent available through the terminal that supports o3 and o4-mini. It’s an experimental, minimal interface for connecting OpenAI’s models to your computer.

Why it matters

I’ve only had the chance to test the o3 model for a few days, but my first impression is that it’s highly useful. Definitely my newest thought partner, sharing the podium with Claude. As someone who constantly drags and drops screenshots into ChatGPT while working, I also can’t wait to test it on more visual tasks.

Quick battle report: I tested it back to back with Claude on a classic data nightmare: matching records across two CSV files with inconsistent naming conventions. Great task for AI. Anyways—while Claude delivered a partially correct solution with some mistaken records in the results, o3 pulled it off without error. A bit more woosah to my workflow.
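For the curious, here is roughly what that kind of record matching looks like in code. This is a minimal sketch, not what o3 or Claude actually produced: the sample data, the `name` column, the suffix list and the 0.7 similarity cutoff are all made up for illustration. It normalises the names first, then uses fuzzy string matching from Python's standard library to pair up the leftovers.

```python
import csv
import difflib
import io
import re

# Hypothetical sample data standing in for the two CSV files.
LEFT_CSV = 'name,plan\n"ACME, Inc.",pro\nGlobex Corporation,free\nInitech,pro\n'
RIGHT_CSV = 'name,country\nacme,US\nGlobex Corp,DE\nUmbrella,UK\n'

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "ltd", "llc"}


def normalise(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(w for w in cleaned.split() if w not in SUFFIXES)


def load(text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def match_records(left_rows, right_rows, key="name", cutoff=0.7):
    """Pair each left row with its closest right row, or None if nothing is close enough."""
    right_index = {normalise(r[key]): r for r in right_rows}
    candidates = list(right_index)
    pairs = []
    for row in left_rows:
        hit = difflib.get_close_matches(normalise(row[key]), candidates, n=1, cutoff=cutoff)
        pairs.append((row, right_index[hit[0]] if hit else None))
    return pairs


matches = match_records(load(LEFT_CSV), load(RIGHT_CSV))
for left, right in matches:
    print(left["name"], "->", right["name"] if right else "NO MATCH")
```

The cutoff is the knob to watch: set it too low and you get confident-looking wrong pairs, set it too high and legitimate matches fall through. Either way, the unmatched rows deserve a manual look.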

IN PARTNERSHIP WITH DATABUTTON

Want to build business software & apps but aren’t a dev?

Built on top of the world’s first reasoning AI-developer, Databutton will turn your new app idea into reality. 

Here’s why you should give Databutton a go:

  • It breaks everything down into a series of tasks and works with you through the entire build process

  • The agent isn’t a command taker, it actively collaborates with you to make sure you land on the right solution (kinda like your own personal CTO)

  • It manages the full-stack, including frontend, backend, and deployment

Or as one user said, “Databutton is like being in possession of a nuclear weapon equivalent of unlimited creative potential.” 🔥

DARIO’S PICKS

Image source: Anthropic

Anthropic launched Research in early beta, a new feature that allows Claude to do systematic web searches, exploring different angles of your question automatically, while answering with citations. It’s only available on the Max plan and upwards ($100+/month) for users in the US, Japan, and Brazil.

Claude also has a brand new Google Workspace integration that connects Claude to your email, calendar, and Google Drive documents. It can now do things like search your emails, use your documents and see what’s in your calendar. It also provides inline citations so you can check the sources. This eliminates the need to paste information back and forth between these platforms and Claude, which saves time and lets you focus on more important stuff. In Anthropic’s own words it can “pull together meeting notes from last week, identify action items from follow-up email threads, and search relevant documents for additional context”.

Why it matters

  • Of the major AI players, Anthropic is a latecomer in adding this type of deep research feature to their AI. They call it just “Research”, but it’s similar to what ChatGPT, Gemini, Perplexity and others have added over the last few months. It can help you go deep on a topic with minimal work, but before firing your researcher you should consider that existing “deep research” features have several interesting, non-obvious problems.

  • AI + a knowledge base is a powerful combination, and probably the most popular employee agent that businesses build with AI. This feature is cool but isn’t really new to the market: Gemini already integrates tightly with Google Workspace, and a Google Drive integration also exists for ChatGPT through connected apps.

UP CLOSE

How Claude helped me build a testing framework for running Meta ads for this newsletter

In this mini-series I break down different ways I use AI from week to week. Previously I’ve covered how I use AI on my phone, how it helps me run this business by myself (part 1 and part 2), and using ChatGPT’s new image generator.

This newsletter is nearly at 14,000 subscribers now, a wizard gathering that has amassed organically, helped by your fine referral work and word-of-mouth. I've decided to up things a notch by implementing the same growth strategy as all the major newsletters: paid advertising. And I'm currently diving deep into running and optimising ads on Meta.

Setting up ad campaigns is an area I have some, albeit quite limited, experience in. I chose Claude to help me with this as it’s not only amazing at creative writing, but also great at creating structured plans and tables, and I knew this would require me to move back and forth between the two as I refine the output.

I started by asking Mr. 3.7 Sonnet some general questions about how to craft an effective ads strategy for gaining subscribers via Meta ads (I also watched some YT videos on the topic beforehand). Once I had learned the basics of how the ad manager platform works, Claude proposed a detailed experimental setup so I can test and optimise different campaigns, ad formats, messaging… something that lets me control the individual components, then tweak and combine them for the best results (getting the highest-quality subscribers at the lowest cost).

Talk systematic to me

Claude helped me brainstorm creative ideas and come up with different ad concepts for testing in my campaign. It then detailed the full campaign structure, complete with ad sets and specific ad variations, set up in a way that will give me a clear overview of what’s working and what’s not, so I can optimise towards my goal.

My work-in-progress in Figma with different Meta ad variations. Claude helped on the specific messaging for each one, and suggested a structure with consistent naming.

Finally, Claude helped me navigate the process of setting up granular tracking (Google Tag Manager and Meta Pixel) in beehiiv (my email software) so that I can evaluate exactly which subscribers came from which ad variations.
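To make the "consistent naming plus granular tracking" idea concrete, here is a small sketch of how ad variations could be named and tagged so each subscriber traces back to one variation. Everything specific in it is an assumption for illustration: the campaign name, the audiences, concepts and formats, the placeholder landing page URL, and the way the UTM parameters are used. It is not the actual structure Claude proposed.

```python
from itertools import product
from urllib.parse import parse_qs, urlencode, urlsplit

# All values below are hypothetical placeholders.
BASE_URL = "https://example.com/subscribe"
CAMPAIGN = "subs_test_01"
AUDIENCES = ["broad", "lookalike"]    # one ad set per audience
CONCEPTS = ["wizard", "case_study"]   # creative concept / messaging angle
FORMATS = ["image", "video"]          # ad format


def ad_name(audience: str, concept: str, fmt: str) -> str:
    """Build a predictable ad name: campaign, ad set, concept, format."""
    return f"{CAMPAIGN}__{audience}__{concept}__{fmt}"


def tracked_url(audience: str, concept: str, fmt: str) -> str:
    """Attach UTM parameters so a signup can be traced to a single variation."""
    params = {
        "utm_source": "meta",
        "utm_medium": "paid_social",
        "utm_campaign": CAMPAIGN,
        "utm_term": audience,               # assumed use: carries the ad set
        "utm_content": f"{concept}_{fmt}",  # assumed use: carries the variation
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Every combination of audience, concept and format becomes one ad variation.
variations = [
    {"name": ad_name(a, c, f), "url": tracked_url(a, c, f)}
    for a, c, f in product(AUDIENCES, CONCEPTS, FORMATS)
]

for v in variations:
    print(v["name"], v["url"])
```

Because the name and the URL parameters come from the same three inputs, a row in the subscriber export can be lined up with a row in the ads report without guesswork.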

I should add that I wouldn’t have attempted this without first watching some solid YouTube videos on driving subscribers through Meta Ads. Where AI came into play, and was incredibly helpful, was in adapting those learnings to my specific scenario. I saved myself hours of setup time. More importantly, because I've implemented a proper experimental structure from the beginning, I'll be able to optimise quickly, identifying winning combinations and cutting the losers, without getting lost in a maze of poorly organised campaigns.


CONTEXT WINDOWS

“is this AI thing actually working for anyone?” (Spoiler: 771 times yes)

contextwindows.ai just hit 771 case studies with fresh additions from companies like Canva, Notion, Deutsche Bank, and Capgemini this week.

Find out how companies actually implement AI instead of obsessing over which model beats which on paper.

FLASH SALE: 50% OFF. This week only —> use code WIZARD-HEIST at checkout

THAT’S ALL FOLKS!

Was this email forwarded to you? Sign up here.

Want to get in front of 14,000 AI enthusiasts? Work with me.

This newsletter is written & curated by Dario Chincha.

Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.