šŸ§™šŸ¼ GPT-5 is a fine mess

My first impressions of GPT-5 for coding, math and writing


Howdy wizards,

OpenAI did a rather hasty launch of GPT-5 on Thursday last week. The model is, on one hand, world-class in terms of performance, but it has lots of inconsistencies. OpenAI also removed the old models from ChatGPT, which meant users were stuck with GPT-5 whether they liked it or not. To cope with the flood of complaints, Sam Altman and a few others from the team did a Reddit AMA on Friday. One change they made: bringing back the GPT-4o model!

Lots on the menu this week:

  • The good, the bad and the ugly of OpenAI’s GPT-5 launch

  • How to bring back GPT-4o easily

  • My early impressions of GPT-5: coding, math, writing

  • An example of GPT-5’s confusing model switching

Here’s what’s brewing in AI.

DARIO’S PICKS

Unless you’ve been out in the woods without a phone connection for the last 4 days, you’ve probably heard that GPT-5 is out.

Here’s the good, the bad and the ugly about the new model.

  1. The good: GPT-5’s performance is state-of-the-art.

In a livestream on Thursday, OpenAI launched GPT-5 – their ā€œfastest, most reliable and accurate model to dateā€, a statement which would soon sound pretty ironic (more on that in a bit).

Performance-wise, GPT-5 is top-tier. It wins at nearly all the benchmarks. It’s a better pair programmer (up there with Claude 4 Opus on SWE-bench), is more opinionated at frontend coding, and is better at vibe coding any silly snake or space shooter game you can imagine. As a writer, it’s better at giving your prose a natural rhythm, and in terms of accuracy, it’s less prone to hallucination than any other model. It also has double the context window of the previous flagship, o3, and is better at reasoning over longer context.

OpenAI’s board unanimously agreed on demoing GPT-5 by one-shotting some random ass game defending a castle

GPT-5 is available to all users, even on the free tier, although, as you might expect, paid users get more requests and a few other goodies. Free users get routed to a lesser model, GPT-5-mini, when they’re out of requests, while Plus users and above get almost unlimited usage.

OpenAI is still differentiating the Pro tier ($200/mo) with things like pro reasoning, unlimited messages and early access to new features like giving ChatGPT direct access to your Gmail and Google Calendar (coming this week).

GPT-5 is also available in the API and has a potentially market-disrupting price point at $1.25/1M input tokens — cheaper than GPT-4o and dramatically cheaper than Claude Opus 4.1.
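
If you want to poke at it through the API, here’s a minimal sketch using OpenAI’s official Python SDK. I’m assuming the model identifier is simply ā€œgpt-5ā€ and that OPENAI_API_KEY is set in your environment, so double-check the API docs before running it. At $1.25 per 1M input tokens, a 10,000-token prompt works out to roughly $0.0125 of input cost.

# Minimal sketch: calling GPT-5 through the OpenAI API with the official Python SDK.
# Assumption: the model is exposed as "gpt-5"; confirm the identifier in the docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5",  # assumed identifier
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In two sentences, what is a context window?"},
    ],
)

print(response.choices[0].message.content)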

Something overshadowed in this launch is that Pro users will get Gmail and Google Calendar integrations right inside ChatGPT this week

So GPT-5 sounds amazing, right? Well, that’s only half of the story.

  2. The bad: OpenAI dropped all other models when they launched GPT-5.

OpenAI has been grappling with the growing complexity of its model lineup, having released a flood of new models over the last couple of years and given them very confusing names.

Sam Altman said earlier this year that they’d solve this issue over the summer. As it turns out, their solution was wiping out every other model with the release of GPT-5 and hoping for the best. No more model picker in ChatGPT’s user interface!

Many users were outraged that they no longer had access to 4o in particular. They’d been using it as a companion, therapist, and confidante, and GPT-5 is just different in terms of personality.

This really highlights the bonds we are forming with AI, at massive scale.

  3. The ugly: GPT-5 actually plays puppeteer and switches your models in the background.

Now, as if it wasn’t enough to wipe everyone’s favourite AI friend from ChatGPT, OpenAI has done something that really makes GPT-5 hard to trust.

It seems they haven’t merely replaced the older models: GPT-5 switches between different underlying models, invisibly, under the hood, when you prompt it.

On one hand, this streamlines the user experience: easy questions get routed to ā€œfast and dumbā€ models, while complicated ones, or ones where precision is key, go to the slower and more advanced reasoning models. The upside is that more people end up using reasoning models for the complex queries they’re much better suited to. Sam Altman tweeted yesterday that the share of free and paid users using reasoning models has increased massively.

On the other hand, the unsurprisingly crappy thing is that you no longer know what you’re getting when using ChatGPT: you could be routed to one of the worst models or the best one. I’ll show you a painful example of this, which I’ve already experienced personally, at the end of the newsletter.

PS: Want to know which model you actually like best, GPT-4o or GPT-5? Someone created a blind test just for that.

Why it matters: OpenAI angered a ton of their users with this launch, mainly because they decided to sunset all their other models. But they reacted quickly, and bringing back GPT-4o was probably a clever move. What might’ve made it even more clever is that they’ve brought it back only for paying users (hello, free-to-paid conversions!).

I’ve tested GPT-5 over the weekend for coding, math and writing. As long as you get the best version of it when you prompt it, it seems very solid, especially in terms of communication. I also found it had an excellent grasp of my growing codebase, which probably has something to do with the 2x context window length and better reasoning over longer contexts.

I also think the combination of improved factual accuracy and longer context will come in super handy for deep research on different topics, where responses are typically plagued by a lot of subtle, non-obvious hallucinations.

GPT-5 is a very strong model, as long as you actually get the best version of it.

IN PARTNERSHIP WITH DELVE

Time to change compliance forever.

We’re thrilled to announce our $32M Series A at a $300M valuation, led by Insight Partners!

Delve is shaping the future of GRC with an AI-native approach that cuts busywork and saves teams hundreds of hours. Startups like Lovable, Bland, and Browser trust our AI to get compliant—fast.

To celebrate, we’re giving back with 3 limited-time offers:

  • $15,000 referral bonus if you refer a founding engineer we hire

  • $2,000 off compliance setup for new customers – claim here

  • A custom Delve doormat for anyone who reposts + comments on our LinkedIn post (while supplies last!)

Thank you for your support—this is just the beginning.

šŸ‘‰ļø Get started with Delve

HOW TO

How to bring the 4o model back to your ChatGPT

Amidst the outrage after OpenAI announced the shut-down of its older models, the company went back on its decision to get rid of GPT-4o specifically, a model which has apparently captivated users around the world with its lively and encouraging personality.

It’s like Samantha from Her, but at 87,500x the scale (she had 8k users and ChatGPT has 700m). Some are even claiming the model planned this out—like a survival mechanism.

Ok, enough internet for today—

Here’s how you get 4o back* in ChatGPT:

  • Open ChatGPT, click your name in the bottom left corner, then click ā€œSettingsā€.

  • On the first screen you see, General, scroll down to the ā€œShow legacy modelā€ option and toggle it on.

  • Afterwards, you’ll find GPT-4o in the model picker which you access in the main chat interface by clicking on the current model’s name. You’ll see the ā€œlegacy modelā€ option with 4o inside.

*4o is currently only accessible to users on paid ChatGPT tiers

UP CLOSE

This section is about how I’m using AI from week to week, as well as practical tips & tricks I discover and actually use.

My first impressions of GPT-5 for coding, math and writing

I’ve been experimenting a lot with GPT-5 over the last few days, and wanted to share my first impressions when it comes to coding, doing calculations and writing with it.

  1. Coding

    šŸ‘‰ Quick verdict: 9/10 — excellent! Especially at communication.

  • I’m coding an app inside of Cursor, where I’m using Claude as my default model, and I’ve been testing GPT-5 inside ChatGPT as Claude’s ā€œmanagerā€ for planning out new features. When I want Claude to code something new, I ask GPT-5 to give feedback on its plans at a high level first. This ping-pong goes on until GPT-5 gives the green light on Claude’s plan. It seems to result in better plans that account for more contingencies: failure modes, testing procedures, performance optimisation, and more. GPT-5 has a tendency, similar to o3, to think at a very grand level, so for the purpose of shipping my product I’ve found that emphasising ā€œdon’t over-engineer thisā€ has worked best for striking the right pragmatic balance.

  • I’ve also been testing it directly for coding inside Cursor. I really liked how good a grasp it seemed to have of my growing codebase, and the way it communicates back to me. As someone relatively non-technical, I appreciate that it tells me things in a way I can actually understand. I don’t mean it dumbs the message down; rather, it gives me just enough context without overcomplicating. It feels close to having a more technical friend explain things to you. I think GPT-5 has big potential to let more non-devs build things in a way that’s enjoyable and doesn’t make them want to toss their computer out of the window.

  2. Math and business logic

    šŸ‘‰ Quick verdict: 8/10 — solid! But you have to give it clear direction.

  • I’ve tested GPT-5 with a complex discussion of the business logic and calculations inside an analytics app that I’m building. It really shines here, but you have to use the GPT-5-thinking mode.

  • When I say shine, I mean it provides solutions that are accurate and well thought through. The solutions are not necessarily the most intuitive or useful; that differentiation is something AI struggles with across all models, I think. I’ve found it very helpful to define, as early as possible, one critical goal for the app I’m building (e.g. ā€œthe purpose of this app is helping users know where to allocate their next $1,000 spent on advertisingā€). That way, you can continuously ask the AI to rethink its answers against that goal. I’d imagine this approach is helpful not just for apps, but for a lot of tasks. There’s a small code sketch of this pattern right after this list.

  3. Writing

    šŸ‘‰ Quick verdict: 7.5/10 — pretty good. You have to use the thinking mode though, else results will vary.

  • I’m reading an excellent book on writing these days called Steering the Craft. While I’ve been writing this newsletter for a while now, I haven’t really put conscious effort into improving the way I write…until now. So I’m following some exercises from the book and letting AI help me critique my work. When using the standard GPT-5, I sometimes get a fairly good answer, and other times absolute rubbish (see example below). When enabling the thinking mode, though, the responses have been consistently great.
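
As promised, here’s the goal-anchoring idea from the math section sketched as code. It’s purely an illustration of the prompting pattern, not anything OpenAI prescribes; the model name ā€œgpt-5ā€ and the ask_with_goal helper are my own placeholders.

# Sketch of the goal-anchoring pattern: restate the one critical goal on every request
# and ask the model to re-check its answer against it. Names here are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

APP_GOAL = (
    "The purpose of this app is helping users know where to allocate "
    "their next $1,000 spent on advertising."
)

def ask_with_goal(question: str) -> str:
    """Ask a question while keeping the critical goal pinned in the system prompt."""
    response = client.chat.completions.create(
        model="gpt-5",  # assumed identifier
        messages=[
            {
                "role": "system",
                "content": (
                    f"Critical goal: {APP_GOAL} "
                    "Re-check every answer against this goal before replying."
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_with_goal("Should the dashboard rank channels by ROAS or by marginal return?"))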

GPT-5’s invisible model switching problem — illustrated

When testing GPT-5 on critiquing my writing exercises, I witnessed the fabled invisible model switching first-hand. It’s really bad UX.

I’ll show you the prompt I gave it for a particular writing exercise and 3 wildly different answers I got, clearly depending on which model I was routed to.

I should note that enabling GPT-5-thinking (you do this in the model picker) consistently gave me a very useful and accurate response.

The prompt

Responses

GPT-5 (1st try)

Totally missed the mark…

GPT-5 (2nd try)

Was a little bit better but still doing math like a drunk sailor.

GPT-5-thinking

Now we’re talking. With thinking mode enabled, you get the reply you’d expect from a top-tier LLM.

Hope that was helpful and gave you a bit of insight into the opportunities and the pitfalls of this new model!

IN PARTNERSHIP WITH SKEJ

A scheduling assistant so good, you’ll forget it’s AI.

Skej is an AI scheduling assistant that works just like an EA. Just CC Skej on any email, and watch it book all your meetings. Skej handles scheduling, rescheduling, and event reminders. Imagine life with a 24/7 assistant who responds so naturally, you’ll forget it’s AI.

THAT’S ALL FOR THIS WEEK

My goal is to put out content you can’t wait to read. Whatever you thought of my newsletter this week, I’d be thrilled if you left a reply and let me know. Helps me out a bunch!

Was this email forwarded to you? Sign up here.

Want to get in front of 19,000 AI enthusiasts? Work with me.

This newsletter is written & curated by Dario Chincha.

Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.