- What's brewing in AI
- Posts
- OpenAI's o3 is the closest thing to AGI yet
OpenAI's o3 is the closest thing to AGI yet
Also: The Sunday recap✨
Howdy, wizards.
⏪ This is your weekly recap email on Sundays, with all the best links I’ve shared during the week.
But before we jump in, I have two announcements:
1) This is the last issue of What’s brewing in AI this year.
I want to say a big, massive THANK YOU to all of you who read my emails, check out my sponsors, send me feedback, etc. I’m so enjoying keeping you up to speed in this fast-moving world of tech, which I have no doubts will continue to accelerate at a crazy speed in 2025 — and I’ll be here to tell you all about it.
I’ll be back in your inbox promptly, in January ✨
2) Before jumping into today’s weekly recap, there’s a major announcement from OpenAI.
Today, we shared evals for an early version of the next model in our o-model reasoning series: OpenAI o3
— OpenAI (@OpenAI)
7:16 PM • Dec 20, 2024
On the 12th and last day of their shipmas, OpenAI announced o3, the next update of their reasoning model o1. They didn’t call it o2—which would’ve been all the more logical—due to a copyright issue.
Here’s the deets:
There’s 2 models being released, o3 and o3-mini (optimised for coding), with the first version expected to become available early next year. The models are currently ongoing public safety testing for researchers – for which you can apply.
o3 is way smarter and capable that its predecessor, o1. The benchmarks shows it’s better at math, coding and scientific reasoning than anything seen so far:
o3 can solving difficult coding problems at the same or higher level than the best developers on the planet. It scores 71.7 on the SWE-bench Verified benchmark (which consists of real-world software tasks); more than 20% better than o1 and Claude 3.5 Sonnet. In the announcement video, Sam Altman mentions there’s just like 1 guy left at OpenAI who still scores better than o3 at coding, “so he still has a few more months enjoy”.
o3 is also crushing it when it comes to benchmarks for maths and PhD-science level questions, including over 25% score on EpochAI’s Frontier Math benchmark – the toughest maths benchmark out there and takes pro mathematicians hours or days to solve each problem. Other current models score less than 2% on this benchmark.
o3 scores >75% on the ARC-AGI benchmark, which measures how good models are at learning difficult tasks on the fly. Over the last 5 years, leading frontier models have scored no more than 5% on this benchmark.
Price-wise, o3 is likely too expensive for most use cases, but could be worthwhile for hard problems in fields like academia, finance and industrial applications where the benefits of successful answers would outweigh even very costly models.
Why it matters While not AGI, o3's performance—particularly in coding—represents a major leap forward. Software can solve so many problems, but is heavily constrained by development costs and complexity. Excited to see how well it performs in real-world use cases, and how it will change the way we use AI.
TOGETHER WITH 1440 MEDIA
The team at 1440 scours over 100+ sources ranging from culture and science to sports and politics to create one email that gets you all caught up on the day’s events in 5 minutes. According to Gallup, 51% of Americans can’t think of a news source that reports the news objectively. It’s 100% free. It has everything you need to be aware of for the day. And most importantly, it simplifies your life.
THE SUNDAY RECAP
THE MOST IMPORTANT DEVELOPMENTS IN AI THIS WEEK
Introducing Projects—an easy way to organize chats that share topics or context in 4o.
Now available for ChatGPT Plus, Pro, and Team users globally.
We’ll bring it to Enterprise and Edu users in January, and to Free users soon.
— OpenAI (@OpenAI)
8:59 PM • Dec 13, 2024
OpenAI’s Day 7 of shipmas continued on Friday with the launch of a much-awaited feature in ChatGPT: Projects, fancy folders to organize your chats.
A project holds chats, uploaded files and specific custom instructions together in one place. This makes it a whole lot easier to find and continue where you left off for ongoing work.
Projects have support for web search and Canvas.
Powered by GPT-4o, ie you can’t use o1 with projects (at least not yet).
Only available on the web version of ChatGPT and on the Windows desktop app.
Demos:
Putting documentation about stuff around your home (fridge manual, garage instruction, smart home notes, maintenance log, etc) in one project so you can query it anytime 🔥
Creating and iterating on a personal website. The project holds code documentation about a specific Javascript framework that the site is built on, and has uploaded details about the author.
I’ve seen this feature requested so many times on social media and it’s finally here. The search for those previous chats where we had that perfect prompt or had given ChatGPT so much good context is finally over.
Also don’t miss the first demo I linked to here – it can seem deceptively trivial, but creating these “mini-systems” for yourself using ChatGPT/Claude/what-have-you is, in my opinion, the best way to learn to be effective with AI.
→ Read the full newsletter here
ChatGPT now has support for coding apps including Warp, IntelliJ IDEA, PyCharm, and more. Once enabled, ChatGPT will be able to see what you’re working on inside those apps while you’re working. On the ChatGPT Desktop app, these apps are enabled through the Work with Apps option located underneath the message input bar. Coding with apps in this way is now also compatible with the o1 and o1 Pro models.
When it comes to writing, ChatGPT now has support for Apple Notes, Notion, and Quip. Instead of having to copy/paste context from these apps, you can now connect ChatGPT to them directly, which gives it context about the entirety of your document.
Work with Apps also works with Advanced Voice Mode. This means that you can, for example, have a real-time convo with ChatGPT’s voice model while you’re editing a Notion document, while getting live feedback on what you’re doing.
The Work with Apps feature is currently available to all paid users, though only works on the macOS desktop app. OpenAI is planning to bring it to Windows and to free users next year.
ChatGPT is becoming a better coding environment with the integration of popular coding apps – something OpenAI is rushing to enable as they’re facing fierce competition from apps like Cursor AI, which has really nailed the AI coding workflow.
Also – OpenAI prefaced today’s announcement by highlighting that ChatGPT is in an ongoing shift from simple Q&A to being agentic, in others words, taking actions on your behalf. These updates—enabling ChatGPT to integrate directly with selected apps—are testing the waters on things like what’s useful, what’s safe and what people are comfortable with when it comes to agentic systems.
→ Read the full newsletter here
OpenAI brought o1 to the API, including support for new tools, and cheaper pricing.
Most importantly, o1 uses 60% fewer “thinking tokens” compared to the o1-preview model, making it much faster and cheaper.
Here’s the news tools now available in the o1 API:
Vision inputs (new): o1 can see and reason over images uploaded to it through the API. For example, imagine a user taking a picture at a manufacturing facility or in a lab setting, and having o1 give feedback – that’s the type of workflow which can now be built into AI applications.
Reasoning effort (new): Lets you tell the model how long to think before giving you a reply (saves time and money for easy prompts)
Developer messages (new): Lets developers tell the model which kind of instructions to follow and in what order e.g. tone, style and other behavioral guidance.
Function calling: Lets you connect o1 to third-party data and APIs.
Structured outputs: Gets you specific outputs where structured data is needed
OpenAI’s testing shows that function calling and structured outputs—and the combination of the two—works way better with o1 than GPT-4o; it does a much better job at calling the correct functions when it should.
If you want to see how the model and tools work together, check out OpenAI’s demo on using o1 to detect and correct errors in a tax form – combining vision input, function calling and structured outputs all at once.
o1 becoming simultaneously faster, better and cheaper is definitely an early x-mas gift for devs interested in building advanced AI applications. While O1-preview's limitations deterred many developers, the full O1 release makes advanced AI capabilities more accessible. This will likely accelerate the adoption of sophisticated AI features across a broader range of applications.
→ Read the full newsletter here
Pika just dropped a 2.0 upgrade of its AI video generator. The main update is that users can now upload and use their own images in generated videos, so called “Scene Ingredients”. You have control over characters, objects and backgrounds which can all be included in shots.
Sora—you got serious competition 🐰
Creating personalised videos just got a whole lot easier with Pika. This is a Christmas gift for content creators looking for solutions to scale their reach with AI.
→ Read the full newsletter here
Writer RAG tool: build production-ready RAG apps in minutes
RAG in just a few lines of code? We’ve launched a predefined RAG tool on our developer platform, making it easy to bring your data into a Knowledge Graph and interact with it with AI. With a single API call, writer LLMs will intelligently call the RAG tool to chat with your data.
Integrated into Writer’s full-stack platform, it eliminates the need for complex vendor RAG setups, making it quick to build scalable, highly accurate AI workflows just by passing a graph ID of your data as a parameter to your RAG tool.
THAT’S ALL FOLKS!
Was this email forwarded to you? Sign up here. Want to get in front of 13,000 AI enthusiasts? Work with me. This newsletter is written & curated by Dario Chincha. |
What's your verdict on today's email? |
Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.