šŸ§™šŸ¼ It's a gem!

Google drops Gemini 3. The benchmarks are crying.

Was this email forwarded to you? Sign up here.

Many expected that Google was brewing up a new model release, but few anticipated how hard Sundar & Co were cooking.

Here's what you need to know about Gemini 3, and my verdict after putting it through its paces.

In this email:

  • Gemini 3 crushes the benchmarks

  • My test of Gemini 3 for writing (yes, it beats GPT-5)

The proverbial cat is out of the bag: Google launched Gemini 3 this week, and it took the number one spot across benchmarks, by margins that few saw coming.

Gemini 3 is Google’s most advanced model yet. It can remember a lot of context, and it has a new superpower: activating only a subset (rather than all) of its parameters for each task. This sparse mixture-of-experts approach delivers frontier-level reasoning at mid-tier pricing.
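
For the curious, sparse mixture-of-experts is a standard architecture pattern rather than a Google secret. Here’s a minimal Python sketch of the general routing idea; the expert count, layer sizes and top-k value are invented for illustration, and this is in no way Google’s actual implementation:

```python
import numpy as np

def moe_forward(x, experts, router_weights, top_k=2):
    """Run x through only the top_k highest-scoring experts."""
    scores = x @ router_weights                      # one routing score per expert
    top = np.argsort(scores)[-top_k:]                # indices of the k best experts
    gates = np.exp(scores[top] - scores[top].max())  # softmax over chosen experts
    gates /= gates.sum()
    # Only top_k experts actually run; the rest stay idle,
    # which is where the compute (and cost) savings come from.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Toy setup: 8 tiny "experts", only 2 active per input
rng = np.random.default_rng(0)
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(4, 4))) for _ in range(8)]
router_weights = rng.normal(size=(4, 8))
print(moe_forward(rng.normal(size=4), experts, router_weights))
```

The design point: the total parameter count can be enormous while per-token compute stays modest, because only a couple of experts fire for any given input.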

A standout feature of Gemini 3 is that it’s better at figuring out the intent and context behind what you ask. It gets you closer to the result you want with less prompting. As Google puts it, AI has evolved from simply reading text and images to reading the room.

ā€œAI has evolved from simply reading text and images to reading the roomā€

IN PARTNERSHIP WITH INTERCOM

Startups get Intercom 90% off and Fin AI agent free for 1 year

Join Intercom’s Startup Program to receive a 90% discount, plus Fin free for 1 year.

Get a direct line to your customers with the only complete AI-first customer service solution.

It’s like having a full-time human support agent free for an entire year.

The benchmark table below tells a clear story. Gemini 3 doesn’t just edge out the state of the art; it jumps ahead on the hardest tests.

Benchmarks listed in Gemini 3’s model card

Some notable examples:

  • Massive improvement on Humanity’s Last Exam; Gemini 3 scores 37.5%, 10 points above GPT-5.1 and 24 above Claude.

  • Hefty gains on math benchmarks; Gemini 3 scores 23.4% on MathArena Apex, while Claude and GPT-5.1 score under 2% (in other words, a more-than-10x leap).

  • Strong at understanding user interfaces in apps; it scores 73% on ScreenSpot-Pro, where the previous high score was less than half that.

  • Probably the best coding model so far, with a 2,439 Elo on LiveCodeBench. It blows Claude out of the water there and marginally beats GPT-5.1. However, Claude holds a slight lead on SWE-Bench and GPT-5.1 is in the same ballpark, so it’s not really a clean sweep on coding.

  • 31% on ARC-AGI-2 (reasoning puzzles). No other model had passed 20% until now.

  • Real-world business skills (my personal favourite): On Vending Bench 2, a benchmark that lets different AIs go wild running a vending machine business, Gemini 3 made over $5,000 in a year of simulated time. Claude earned around $4,000, and GPT-5.1 only $1,500. This isn't abstract reasoning; it's practical decision-making under real constraints.

Overall, the new Gemini is dramatically stronger than Claude 4.5 on pretty much every benchmark. Against GPT-5.1 the gap is smaller, but still very significant.

Gemini 3 is rolling out everywhere Google can put it, including AI Mode in Google Search. It’s also accessible through the Gemini app, the API, AI Studio, Vertex AI and the brand-new Antigravity IDE (more on Antigravity further down).
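
If you’d rather hit it from code, the official google-genai Python SDK is the quickest route. A minimal sketch; the model ID below is my assumption, so check Google’s model list for the current name:

```python
from google import genai

# Reads GEMINI_API_KEY from your environment (grab a key in AI Studio)
client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model ID; verify against the docs
    contents="Explain sparse mixture-of-experts in one paragraph.",
)
print(response.text)
```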

Why it matters  Google has been patiently playing the long game in AI. They’ve recently had the viral launch of Nano Banana, plus a string of serious yet quiet improvements to AI in Search, Chrome, the Workspace apps, Maps, even hardware. And now, their crown jewel: Gemini 3.

Google is building unmatched control across the value chain of hardware, models, dev tools, distribution and end-user apps.

My test of Gemini 3 for writing

I took Gemini 3 for a little test run on the same writing task I gave GPT-5 when it came out.

This simple task gives me a rough read on the model’s image understanding, math, reasoning and overall vibes.

The text is an 89-word story: every sentence is 7 words or fewer, and exactly one sentence is a fragment.

I explain to the model that the exercise is to write a text of 100-150 words, with a maximum sentence length of 7 words and no sentence fragments allowed.

Then I ask it to check my writing against the exercise’s constraints.
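
By the way, if you want to verify the model’s answer without trusting its arithmetic, the mechanical parts of this check fit in a few lines of Python. A rough sketch: the sentence splitting is naive, and spotting fragments is a grammar judgment I leave to the human (or the model):

```python
import re

MIN_WORDS, MAX_WORDS = 100, 150   # the exercise's target length
MAX_SENTENCE_WORDS = 7            # the exercise's sentence cap

def check(text: str) -> None:
    total = len(text.split())
    status = "OK" if MIN_WORDS <= total <= MAX_WORDS else "VIOLATION"
    print(f"Total words: {total} ({status})")
    # Naive split after ., ! or ? -- fine for simple prose without abbreviations
    for i, s in enumerate(re.split(r"(?<=[.!?])\s+", text.strip()), 1):
        n = len(s.split())
        if n > MAX_SENTENCE_WORDS:
            print(f"Sentence {i} is {n} words: {s!r}")

check("Paste the story here. Every sentence stays short.")
```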

Gemini 2.5 Flash (for comparison)

If you open the Gemini app right now, you’ll see two options: Fast and Thinking.

The ā€œFastā€ mode, which is easy to assume is Gemini 3, is actually Gemini 2.5 Flash under the hood.

I ran it first as a baseline, to see how much better Gemini 3 would do.

Gemini 2.5 Flash (aka Gemini Fast in the app)

As you can see, Gemini Fast got most of this exercise wrong, much like GPT-5 Instant did when I ran the same test.

These ā€œfastā€ models are generally quite bad at math and they hallucinate—yet serve everything neatly in a table to you and present it as truth. Beware.

Fortunately, my experience with Gemini 3 was far better…

Gemini 3 Thinking gets it right

The ā€œThinkingā€ mode, on the other hand, does the new model justice. I’d say the output is better than what I got when I tried the same with GPT-5 Thinking.

Gemini 3 correctly calculated the total word count and the per-sentence word counts, and it flagged the only sentence fragment in the original text. It also gave me a single tip to improve punctuation (not a mistake, but a meaningful style improvement), which I appreciated.

Overall, it nailed the exercise and I’m now thinking about how to make it more difficult for the next model release (feel free to reply to this email with your suggestions!).

Google also launched Antigravity, their answer to Cursor. I'm testing it on a real project right now. My early take: Cursor has real competition now.

Stay tuned for a full breakdown coming soon.

IN PARTNERSHIP WITH RUBRIK

Here's what changed: 82% of cyberattacks now target cloud environments. If you're running AI workloads there (training data, model storage, agent deployments), you just gave attackers more entry points.

On December 10, join IT and security leaders who've actually dealt with this. Learn recovery strategies for bouncing back in hours, not weeks, when things go wrong.

THAT’S ALL FOR THIS WEEK

Friends,

Yes, Gemini 3 is new and exciting. That doesn't mean you need to drop everything and use it today.

There'll be another game-changing release next week, and the week after that, and…

Use and test it if it makes sense for you.

Otherwise, it’s perfectly fine to just filter it out.

Was this email forwarded to you? Sign up here.

Want to get in front of 21,000+ AI builders and enthusiasts? Work with me.

This newsletter is written & shipped by Dario Chincha.

Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.