🧙🏼 GPT-5 is a fine mess
My first impressions of GPT-5 for coding, math and writing
Howdy wizards,
OpenAI did a rather hasty launch of GPT-5 on Thursday last week. On one hand the model is world-class in terms of performance, but on the other it has lots of inconsistencies. They also removed the old models from ChatGPT, which meant users were stuck with GPT-5 whether they liked it or not. To cope with the flood of complaints, Sam Altman and a few others from the team did a Reddit AMA on Friday. One change they made is bringing back the GPT-4o model!
Lots on the menu this week:
The good, the bad and the ugly of OpenAI's GPT-5 launch
How to bring back GPT-4o easily
My early impressions of GPT-5: coding, math, writing
An example of GPT-5's confusing model switching
Here's what's brewing in AI.

DARIO'S PICKS
Unless you've been out in the woods without a phone connection for the last 4 days, you've probably heard that GPT-5 is out.
Here's the good, the bad and the ugly about the new model.
The good: GPT-5's performance is state-of-the-art.
In a livestream on Thursday, OpenAI launched GPT-5, their "fastest, most reliable and accurate model to date", a statement which would soon sound pretty ironic (more on that in a bit).
Performance-wise GPT-5 is top-tier. It wins at nearly all the benchmarks. It's a better pair programmer (up there with Claude 4 Opus on SWE-bench), is more opinionated at frontend coding and is better at vibe coding any silly snake or space shooter game you can imagine; as a writer, it's better at giving your prose a more natural rhythm, and in terms of accuracy, it's less prone to hallucination than any other model. It also has double the context window of the previous-flagship o3 model and is better at reasoning over longer context.
It's available to all users, even on the free tier. As you might expect, though, paid users get more requests and a few other goodies. Free users get routed to a lesser model, GPT-5-mini, when they're out of requests, while Plus users and above get almost unlimited usage.
OpenAI is still differentiating the Pro tier ($200/mo) with things like pro reasoning, unlimited messages and early access to new features, like giving ChatGPT direct access to your Gmail and Google Calendar (coming this week).
GPT-5 is also available in the API and has a potentially market-disrupting price point at $1.25/1M input tokens: cheaper than GPT-4o and dramatically cheaper than Claude Opus 4.1.
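To get a feel for what that price means in practice, here's a quick back-of-the-envelope sketch in Python. Only the $1.25 figure comes from the paragraph above; the workload (2,000 requests a day at 3,000 input tokens each) is a made-up example, and output tokens, which are billed separately, are left out.

```python
def input_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD of sending `tokens` input tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million_usd


# USD per 1M input tokens, the GPT-5 API price quoted above.
GPT5_INPUT_PRICE = 1.25

# Hypothetical workload: 2,000 requests a day, 3,000 input tokens each.
daily_tokens = 2_000 * 3_000

daily_cost = input_cost_usd(daily_tokens, GPT5_INPUT_PRICE)
print(f"GPT-5 input cost per day: ${daily_cost:.2f}")
# prints: GPT-5 input cost per day: $7.50
```

To compare against another model, call `input_cost_usd` again with the input price from that vendor's pricing page.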
So GPT-5 sounds amazing, right? Well, that's only half of the story.
The bad: OpenAI dropped all other models when they launched GPT-5.
OpenAI has been grappling with the growing complexity of releasing so many new models over the last couple of years, and giving them very confusing names.
Sam Altman said earlier this year that they'd solve this issue during this summer. As it turns out, their solution was wiping out every other model with the release of GPT-5 and hoping for the best. No more model picker in ChatGPT's user interface!
Many users were outraged that they no longer had access to 4o in particular. They'd been using it as a companion, therapist, and confidante, and GPT-5 is just different in terms of personality.
This really highlights the bonds we are forming with AI, at massive scale.
The ugly: GPT-5 actually plays puppeteer and switches your models in the background.
Now, as if it wasn't enough to wipe everyone's favourite AI friend from ChatGPT, OpenAI has done something that really makes GPT-5 hard to trust.
It seems that they've not merely replaced these older models. Rather, GPT-5 switches between models, invisibly, under the hood, when you prompt it.
On one hand, this streamlines the user experience: easy questions get routed to "fast and dumb" models, while complicated ones, or ones where precision is key, go to the slower and more advanced reasoning models. The good thing about that is that it gets more people using reasoning models for complex queries, for which reasoning is much better suited. Sam Altman tweeted yesterday that the share of free and paid users using reasoning models has now increased massively.
On the other hand, the unsurprisingly crappy thing is that you no longer know what you're getting when using ChatGPT: you could be routed to one of the worst models or the best one. I'll show you a painful example of this, which I've already experienced personally, at the end of the newsletter.
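To make the idea concrete, here's a toy Python sketch of what prompt routing can look like. It is not OpenAI's actual router: the model names, the keyword list and the word-count threshold are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical model names, for illustration only.
FAST_MODEL = "fast-model"
REASONING_MODEL = "reasoning-model"

# Made-up signals that a prompt probably needs careful reasoning.
REASONING_HINTS = ("calculate", "prove", "debug", "step by step", "how many")


@dataclass
class Route:
    model: str   # which model the prompt is sent to
    reason: str  # why the router picked it


def route(prompt: str, max_fast_words: int = 40) -> Route:
    """Pick a model using two crude heuristics: keyword hints and prompt length."""
    text = prompt.lower()
    for hint in REASONING_HINTS:
        if hint in text:
            return Route(REASONING_MODEL, f"matched hint {hint!r}")
    if len(text.split()) > max_fast_words:
        return Route(REASONING_MODEL, "long prompt")
    return Route(FAST_MODEL, "short prompt with no reasoning hints")


for prompt in ("What's the capital of Norway?",
               "Calculate the ROI of a $1,000 ad spend."):
    decision = route(prompt)
    print(f"{prompt} -> {decision.model} ({decision.reason})")
```

Notice that the sketch returns the reason alongside the chosen model. That is exactly the piece of information ChatGPT doesn't show you.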
PS: want to know which model you actually like best, GPT-4o or GPT-5? Someone created a blind test just for that.
Why it matters: OpenAI angered a ton of their users with this launch, mainly because they decided to sunset all their other models. But they've reacted quickly, and bringing back GPT-4o was probably a clever move. What might've made it even more clever is that they've brought it back only for paying users (hello, free-to-paid conversions!).
I've tested GPT-5 over the weekend for coding, math and writing. As long as you get the best version of it when you prompt it, it seems very solid, especially in terms of communication. I also found it had an excellent grasp of my growing codebase, which probably has something to do with the 2x context window length and better reasoning over longer contexts.
I also think the combination of better factual accuracy and longer context will come in super handy for deep research on different topics, where responses are typically plagued with a lot of subtle, non-obvious hallucinations.
GPT-5 is a very strong model, as long as you actually get the best version of it.

IN PARTNERSHIP WITH DELVE
Time to change compliance forever.
We're thrilled to announce our $32M Series A at a $300M valuation, led by Insight Partners!
Delve is shaping the future of GRC with an AI-native approach that cuts busywork and saves teams hundreds of hours. Startups like Lovable, Bland, and Browser trust our AI to get compliant, fast.
To celebrate, we're giving back with 3 limited-time offers:
$15,000 referral bonus if you refer a founding engineer we hire
$2,000 off compliance setup for new customers (claim here)
A custom Delve doormat for anyone who reposts + comments on our LinkedIn post (while supplies last!)
Thank you for your support. This is just the beginning.
Get started with Delve

HOW TO
How to bring the 4o model back to your ChatGPT
Amidst the outrage from users after OpenAI announced their shut-down of older models, they decided to go back on their decision to get rid of GPT-4o specifically, a model which has apparently captivated users around the world with its lively and encouraging personality.
It's like Samantha from Her, but at 87,500x the scale (she had 8k users and ChatGPT has 700m). Some are even claiming the model planned this out, like a survival mechanism.
Ok, enough internet for today...
Here's how you get 4o back* in ChatGPT:
Open ChatGPT, click your name in the bottom left corner, then click "Settings".
On the first screen you see, General, scroll down to the "Show legacy model" option and toggle it on.

Afterwards, you'll find GPT-4o in the model picker, which you access in the main chat interface by clicking on the current model's name. You'll see the "legacy model" option with 4o inside.

*4o is currently only accessible to users on paid ChatGPT tiers

UP CLOSE
This section is about how I'm using AI from week to week, as well as practical tips & tricks I discover and actually use.
My first impressions of GPT-5 for coding, math and writing
I've been experimenting a lot with GPT-5 these last days, and wanted to share my first impressions when it comes to coding, doing calculations and writing with it.
Coding
Quick verdict: 9/10, excellent! Especially at communication.
I'm coding an app inside of Cursor, where I'm using Claude as my default model, and I've been testing out using GPT-5 inside ChatGPT as Claude's "manager" for planning out new features. When I want Claude to code something new, I ask GPT-5 to give feedback on its plans at a high level first. This ping pong goes on until GPT-5 gives the green light on Claude's plan. It seems to result in better plans that account for more contingencies: failure modes, testing procedures, performance optimisation, and more. GPT-5 has a tendency, similar to o3, to think at a very grand level, so for the purpose of shipping my product I've found that emphasising "don't over engineer this" has worked best for striking the right pragmatic balance for me.
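I do this ping pong by hand, copying text between Cursor and ChatGPT, but the loop itself is simple enough to sketch in Python. Everything here is hypothetical: `draft_plan` and `review_plan` stand in for whatever calls your planner and reviewer models, the "APPROVED" marker is just a convention you'd ask the reviewer to follow, and the two fake functions exist only so the example runs without any API.

```python
from typing import Callable


def review_loop(
    task: str,
    draft_plan: Callable[[str, str], str],   # (task, feedback) -> plan
    review_plan: Callable[[str, str], str],  # (task, plan) -> feedback
    max_rounds: int = 5,
    approval_marker: str = "APPROVED",
) -> tuple[str, bool, int]:
    """Alternate planner and reviewer until the reviewer approves or rounds run out.

    Returns (last plan, whether it was approved, rounds used).
    """
    plan, feedback = "", ""
    for round_number in range(1, max_rounds + 1):
        plan = draft_plan(task, feedback)
        feedback = review_plan(task, plan)
        if approval_marker in feedback:
            return plan, True, round_number
    return plan, False, max_rounds


# Stand-ins for real model calls, so the sketch is runnable offline.
def fake_planner(task: str, feedback: str) -> str:
    suffix = " (with tests)" if "tests" in feedback else ""
    return f"Plan for {task}{suffix}"


def fake_reviewer(task: str, plan: str) -> str:
    if "with tests" in plan:
        return "APPROVED"
    return "Add tests. Don't over engineer this."


plan, approved, rounds = review_loop("export feature", fake_planner, fake_reviewer)
print(plan, approved, rounds)
```

The `max_rounds` cap matters: without it, two models that never agree would keep going forever.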
I've also been testing it directly for coding inside Cursor. I really liked how good of a grasp it seemed to have on my growing codebase, and the way it communicates back to me. As someone relatively non-technical, it tells me things in a way that I can actually understand, and I don't mean it dumbs the message down, but rather gives me just enough context without overcomplicating. Feels close to having a friend who's more technical explaining things to you. I think GPT-5 has big potential to enable more non-devs to build things in a way that's enjoyable and doesn't make them want to toss their computer out of the window.
Math and business logic
Quick verdict: 8/10, solid! But you have to give it clear direction.
I've tested GPT-5 with a complex discussion of the business logic and calculations inside an analytics app that I'm building. It really shines here, but you have to use the GPT-5-thinking mode.
When I say shine, I mean it provides solutions that are accurate and well thought through. The solutions are not necessarily the most intuitive or useful, though. I think that distinction is something AI struggles with across all models. I've found it very helpful to define, as early as possible, one critical goal for the app I'm building (e.g. "the purpose of this app is helping users know where to allocate their next $1,000 spent on advertising"). That way, you can keep asking the AI to rethink its answer based on that goal. I'd imagine this approach is helpful not just for apps, but for a lot of tasks.
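If you're prompting through code rather than the chat window, one way to keep that goal in front of the model is to prepend it to every question. Here's a minimal Python sketch. The helper name and the exact wording of the template are my own invention; only the example goal comes from the paragraph above.

```python
# The one critical goal, defined once and reused for every prompt.
APP_GOAL = (
    "The purpose of this app is helping users know where to allocate "
    "their next $1,000 spent on advertising."
)


def anchor_to_goal(question: str, goal: str = APP_GOAL) -> str:
    """Wrap a question so the model is reminded of the critical goal each time."""
    return (
        f"Critical goal: {goal}\n"
        f"Question: {question}\n"
        "Before answering, check that your proposal serves the critical goal. "
        "If it doesn't, rethink it."
    )


print(anchor_to_goal("How should we weight click-through rate against conversion rate?"))
```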
Writing
Quick verdict: 7.5/10, pretty good. You have to use the thinking mode though, else results will vary.
I'm reading an excellent book on writing these days called Steering the Craft. While I have some experience writing this newsletter for a while now, I haven't really put conscious effort into improving the way I write... until now. So I'm following some exercises from this book, and letting AI help me critique my work. When using the standard GPT-5, I sometimes get a fairly good answer, and other times absolute rubbish (see example below). When enabling the thinking mode, though, the responses have been consistently great.
GPT-5's invisible model switching problem, illustrated
When testing GPT-5 on critiquing my writing exercises, I witnessed the fabled invisible model switching first-hand. It's really bad UX.
I'll show you the prompt I gave it for a particular writing exercise and 3 wildly different answers I got, clearly depending on which model I was routed to.
I should note that enabling GPT-5-thinking (you do this in the model picker) consistently gave me a very useful and accurate response.
The prompt

Responses
GPT-5 (1st try)
Totally missed the mark...

GPT-5 (2nd try)
Was a little bit better but still doing math like a drunk sailor.

GPT-5-thinking
Now we're talking. With thinking mode enabled, you get the reply you'd expect from a top-tier LLM.

Hope that was helpful and gave you a bit of insight into the opportunities and the pitfalls of this new model!

IN PARTNERSHIP WITH SKEJ
A scheduling assistant so good, you'll forget it's AI.
Skej is an AI scheduling assistant that works just like an EA. Just CC Skej on any email, and watch it book all your meetings. Skej handles scheduling, rescheduling, and event reminders. Imagine life with a 24/7 assistant who responds so naturally, you'll forget it's AI.

THAT'S ALL FOR THIS WEEK
My goal is to put out content you can't wait to read. Whatever you thought of my newsletter this week, I'd be thrilled if you left a reply and let me know. Helps me out a bunch!
Was this email forwarded to you? Sign up here. Want to get in front of 19,000 AI enthusiasts? Work with me. This newsletter is written & curated by Dario Chincha.
Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU, it will make it possible for me to continue to do this.