It's a gem!
Google drops Gemini 3. The benchmarks are crying.
Was this email forwarded to you? Sign up here.

Many expected that Google was brewing up a new model release, but few anticipated just how hard Sundar & Co. were cooking.
Here's what you need to know about Gemini 3, and my verdict after putting it through its paces.
In this email:
Gemini 3 crushes the benchmarks
My test of Gemini 3 for writing (yes, it beats GPT-5)
The proverbial cat is out of the bag: Google launched Gemini 3 this week, and it took the number one spot across benchmarks, by margins that few saw coming.
Gemini 3 is Google's most advanced model yet. It handles a very large amount of context, and it has new superpowers that let it activate only a subset (rather than all) of its parameters for each task; this sparse mixture-of-experts approach delivers frontier-level reasoning at mid-tier pricing.
A standout feature of Gemini 3 is that it's better at figuring out the intent and context behind what you ask it. It brings you closer to the desired result with less prompting. As Google puts it, AI has evolved from simply reading text and images to reading the room.
"AI has evolved from simply reading text and images to reading the room"
IN PARTNERSHIP WITH INTERCOM
Startups get Intercom 90% off and Fin AI agent free for 1 year
Join Intercom's Startup Program to receive a 90% discount, plus Fin free for 1 year.
Get a direct line to your customers with the only complete AI-first customer service solution.
Itās like having a full-time human support agent free for an entire year.
The benchmarks table below tells a clear story. Gemini 3 doesn't just edge out the state of the art; it jumps ahead on the hardest tests.
Some notable examples:
Massive improvement on Humanity's Last Exam: Gemini 3 scores 37.5%, 10 points above GPT-5.1 and 24 above Claude.
Hefty gains on math benchmarks: it scores 23.4% on MathArena Apex, while Claude and GPT-5.1 score less than 2% (in other words, a 10x leap).
Strong at understanding user interfaces in apps: it scores 73% on ScreenSpot-Pro, where the previous high score was less than half that.
Probably the best coding model so far, with a 2,439 Elo on LiveCodeBench. It blows Claude out of the water on this one and marginally beats GPT-5.1. However, on SWE-Bench Claude holds a slight lead and GPT-5.1 is in the same ballpark, so it's not a clean sweep on coding.
31% on ARC-AGI-2 (reasoning puzzles). No other model had passed 20% until now.
Real-world business skills (my personal favourite): On Vending Bench 2, a benchmark that lets different AIs go wild running a vending machine business, Gemini 3 made over $5,000 in a year of simulated time. Claude earned around $4,000, and GPT-5.1 only $1,500. This isn't abstract reasoning; it's practical decision-making under real constraints.
Overall, the new Gemini is dramatically stronger than Claude 4.5 on pretty much every benchmark. Versus GPT-5.1, the performance gap is smaller but still significant.
Gemini 3 is rolling out everywhere Google can put it, including AI Mode in Google Search. It's also accessible through the Gemini app, the API, AI Studio, Vertex and the brand-new Antigravity IDE (more about Antigravity further down).
Why it matters: Google has been patiently playing the long game in AI. They've recently had the viral launch of Nano Banana, plus a series of serious yet quiet improvements to AI in Search, Chrome, the Workspace apps, Maps, even hardware. And now, their crown jewel: Gemini 3.
Google is building unmatched control across the value chain of hardware, models, dev tools, distribution and end-user apps.
My test of Gemini 3 for writing
I took Gemini 3 for a test run on the same writing task I gave GPT-5 when it came out.
This simple exercise gives me a rough read on the model's image understanding, math, reasoning and general vibes.

The text is an 89-word story made up entirely of sentences with 7 or fewer words, plus a single sentence fragment.
I explain to the model that the exercise is to write a text of 100-150 words, with a maximum sentence length of 7 words and no sentence fragments allowed.
Then I ask it to check my writing against the exercise's constraints.
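As an aside, the countable constraints here (total word count and per-sentence length) are easy to verify deterministically; it's exactly the kind of arithmetic the "fast" models tend to fumble. A minimal Python sketch of such a checker (the sentence-fragment check needs actual grammatical judgment, so it's left out):

```python
import re

def check_writing(text, min_words=100, max_words=150, max_sentence_len=7):
    """Check a text against the exercise's countable constraints."""
    # Split into sentences at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip())
                 if s.strip()]
    words = text.split()
    return {
        "total_words": len(words),
        "word_count_ok": min_words <= len(words) <= max_words,
        # Sentences that break the 7-word limit.
        "long_sentences": [s for s in sentences
                           if len(s.split()) > max_sentence_len],
    }

sample = "The fog rolled in early. Nobody saw the lighthouse. Its keeper had gone missing."
result = check_writing(sample)
print(result["total_words"])     # 14 (too short for the exercise)
print(result["long_sentences"])  # [] (every sentence is 7 words or fewer)
```

The model's job in my test is harder than this script's, of course: it also has to read the text from an image and reason about what counts as a fragment.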
Gemini 2.5 Flash (for comparison)
If you open the Gemini app right now, you'll see two options: Fast and Thinking.
The "Fast" mode, which you might assume is Gemini 3, is actually Gemini 2.5 Flash under the hood.
I used it as a baseline to see how much better Gemini 3 is.

Gemini 2.5 Flash (aka Gemini Fast in the app)
As you can see, Gemini Fast got most things wrong in this exercise, similar to what I saw from GPT-5 Instant.
These "fast" models are generally quite bad at math, and they hallucinate, yet they serve everything up to you neatly in a table and present it as truth. Beware.
Fortunately, my experience with Gemini 3 was far better…
Gemini 3 Thinking gets it right

The "Thinking" mode, on the other hand, does the new model justice. I'd say the output is better than what I got when I ran the same test with GPT-5 Thinking.
Gemini 3 correctly calculated the total word count and each sentence's length, and it flagged the only sentence fragment in the original text. It also gave me a single tip to improve punctuation (not a mistake, but a meaningful style improvement), which I appreciated.
Overall, it nailed the exercise, and I'm now thinking about how to make it harder for the next model release (feel free to reply to this email with your suggestions!).
Google also launched Antigravity, their answer to Cursor. I'm testing it on a real project right now. My early take: Cursor has real competition now.
Stay tuned for a full breakdown coming soon.
IN PARTNERSHIP WITH RUBRIK
Here's what changed: 82% of cyberattacks now target cloud environments. If you're running AI workloads there (training data, model storage, agent deployments), you just gave attackers more entry points.
On December 10, join IT and security leaders who've actually dealt with this. Learn recovery strategies for bouncing back in hours, not weeks, when things go wrong.

THAT'S ALL FOR THIS WEEK
Friends, Gemini 3 is indeed new and exciting. That doesn't mean you need to drop everything and use it today. There'll be another game-changing release next week, and the week after that, and… Use and test it if it makes sense for you. Otherwise, it's perfectly fine to just filter it out.
Want to get in front of 21,000+ AI builders and enthusiasts? Work with me. This newsletter is written & shipped by Dario Chincha.
Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links, then THANK YOU; it will make it possible for me to continue doing this.





