The Sunday recap✨
Your weekly AI catch-up is here
Howdy, wizards.
⏪ This is your weekly recap email on Sundays. All the best links I’ve shared during the week. No fluff or unnecessary details included.
🤷🏻‍♂️ Don’t need the Sunday recap? You can update your subscriber preferences at the end of this email to pick what emails you’d like to receive.
Let’s recap!
THE SUNDAY RECAP
THE MOST IMPORTANT DEVELOPMENTS IN AI THIS WEEK
OpenAI launched Sora this week at Sora.com. It can do text-to-video, image-to-video, and even video-to-video. They’ve launched not just a model here, but a whole studio for AI video production.
Key features:
The explore feed: Sora has its own public gallery of user-created videos that you can draw inspiration from or even remix. See featured, recent or saved videos.
Storyboard: Arrange and organize video sequences visually on a timeline.
Remix: Replace, remove, or re-imagine elements in your or other people’s videos to transform scenes and settings.
Re-cut: Find the best frames and adjust scene lengths for a polished final cut.
Loop: Create seamless, repeating video clips.
Blend: Merge two videos into one cohesive clip.
Style presets: Apply stylistic templates to give your videos a thematic look.
Sora is available to paid ChatGPT subscribers in many countries, but not in most of Europe or the UK for now.
Sora’s interface offers an intuitive, studio-like experience, very reminiscent of ChatGPT but for video content. Seems like the OpenAI team did a solid job on this one; during the livestream, Altman also compared Sora to “GPT-1”, meaning it’s likely to get a lot better with each upgrade.
→ Read the full newsletter here
TLDR covers the best tech, startup, and coding stories in a quick email that takes 5 minutes to read. And it's read by over 1.2 million people!
*sponsored
canvas is now available to all chatgpt users, and can execute code!
more importantly it can also still emojify your writing.
— Sam Altman (@sama)
6:48 PM • Dec 10, 2024
Day 4 of OpenAI's "shipmas" brought a major Canvas upgrade, making ChatGPT's split-screen mode a more powerful writing and coding companion.
Here’s what’s new:
Canvas is officially out of beta and now available to everyone, even free users. It integrates natively with GPT-4o, so you can trigger it with a prompt instead of having to manually select a separate model for it.
There’s now a Python integration which lets ChatGPT execute code right inside Canvas. It even supports real-time debugging.
ChatGPT can now highlight parts of your writing inside Canvas and leave you feedback in a very smooth way—check out this demo to see it.
Custom GPTs can now use Canvas too! GPT builders can enable it in the GPT editor alongside DALL-E, code interpreter and web browsing.
Canvas will make it more intuitive to write and code with ChatGPT. The ability to run code also means they’re catching up to Claude, which has been able to do this for a while through the Artifacts feature.
→ Read the full newsletter here
OpenAI is bringing live video + screensharing to Advanced Voice Mode. This means you can share real-time visual context with ChatGPT to make it more useful.
The live video functionality lets you share context from your back or front camera on your phone in real-time with ChatGPT. Here’s a demo of using it to learn how to make pour-over coffee.
The screenshare functionality lets you broadcast your phone’s screen to ChatGPT while you’re using Advanced Voice mode. Here’s a demo of using it to help respond to a message.
Plus & Pro users will get access within the week in the ChatGPT mobile app, with the exception of EU users, for whom it will launch “as soon as we can”.
This will open new use cases for ChatGPT in real-time: think guided tasks, technical support, “explain what you see”, and similar.
I’m at least as excited about screen-sharing as I am about live video, though. And once it gets desktop support? That will be powerful for work.
→ Read the full newsletter here
OpenAI weren’t the only ones with holiday surprises this week: Google launched Gemini 2.0 yesterday – its new flagship AI model designed for the “agentic era.”
Gemini 2.0 Flash is both better and cheaper than its bigger, older brother, Gemini 1.5 Pro. It also has real-time capabilities (text, voice, video, and even screen-sharing) all at once. It can also generate images, handle multiple languages in audio output, and process text, code, images, and video seamlessly.
A new Multimodal Live API lets Gemini do real-time video and screen sharing, as well as real-time audio. You can test it yourself inside AI Studio.
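If you want to go beyond AI Studio, here’s a minimal text-only sketch of calling the Live API from Python, assuming the google-genai SDK (pip install google-genai); the session.send/receive surface, the v1alpha api_version flag, and the preview model id gemini-2.0-flash-exp are taken from the launch-week docs and may change:

```python
# Minimal sketch of Google's Multimodal Live API via the google-genai SDK.
# Assumed SDK surface from the Gemini 2.0 launch week; check the current docs,
# as method names, the v1alpha flag, and the preview model id may have changed.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY", http_options={"api_version": "v1alpha"})

async def main():
    config = {"response_modalities": ["TEXT"]}  # the Live API can also stream AUDIO
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        await session.send(
            input="What could a screen-sharing assistant help me with?",
            end_of_turn=True,
        )
        async for message in session.receive():  # responses stream back incrementally
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```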
Available in Gemini Advanced, Deep Research is an agentic feature that can do powerful reasoning: more sophisticated problem-solving and better support for complex, long-context queries. It can also gather info from around the web, scanning dozens of websites for you.
If you want to see it in action, this tweet has demos of people using the new Gemini – including the screen-sharing features.
Google also let us in on a sneak peek at its work on AI agents:
Project Mariner: An early prototype of a browser-based agent that can complete tasks. Sounds like an agentic assistant similar to Claude’s Computer Use and what OpenAI might also launch soon.
Project Astra: A prototype for a general assistant with better memory, tool access (can use Google Search, Lens and Maps), and conversational abilities.
Jules: Currently only available to a group of early testers, Jules is an experimental code agent that integrates directly into GitHub and can assist developers in their workflow.
Agents in Games: Google is developing new gaming-focused agents that understand and guide gameplay in real time.
With Google’s Multimodal Live API, you can actually share your desktop screen with Gemini. Could be just me, but I generally don’t use Gemini as I feel it’s less reliable and helpful than ChatGPT and Claude; the feature itself is pretty cool, though.
Also, with Project Mariner, we now have OpenAI, Anthropic, and Google all openly going big on browser-based agentic capabilities, though none of them has launched this for the masses yet. That’s likely to change in the near future, with agents expected to be the thing in AI for 2025.
→ Read the full newsletter here
Earlier this week, Meta released Llama 3.3, an open 70B text model that rivals GPT-4o and Gemini 1.5 Pro on several benchmarks.
It performs similarly to the much bigger Llama 3.1 405B, with the important difference that it’s 10x cheaper than its predecessor, and 25x cheaper than GPT-4o.
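Because the weights are open, builders can pull the model straight from Hugging Face. Here’s a minimal sketch, assuming the gated hub id meta-llama/Llama-3.3-70B-Instruct and hardware (or a quantized variant) that can actually hold a 70B model:

```python
# Sketch of running Llama 3.3 70B Instruct with Hugging Face transformers.
# Assumes access to the gated repo "meta-llama/Llama-3.3-70B-Instruct" and
# enough GPU memory (requires the accelerate package for device_map="auto").
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.3-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across whatever GPUs are available
)

messages = [{"role": "user", "content": "Give me a one-sentence recap of this week in AI."}]
out = chat(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```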
According to Mark Zuckerberg, Meta AI is on track to become the most-used AI assistant in the world, with 600M monthly active users.
Zuck also said Llama 4 is currently being trained in their $10B data center in Louisiana and is planned for release sometime in 2025.
Meta is showing that open LLMs are up there in performance with their closed, frontier-level counterparts, but way more affordable. This is good news for builders and users. There’s price pressure building at the model layer of AI, and companies are increasingly differentiating on user interface and features rather than pure horsepower (think ChatGPT Pro, which now includes Sora, or Perplexity, which is going all out as a premier search engine).
Also, Meta AI now has—with some assumptions—roughly half the number of active users ChatGPT has (Altman cited 300M weekly actives last week). Surely, the growth has been helped a lot by Meta’s unparalleled distribution through Messenger, Insta and WhatsApp.
→ Read the full newsletter here
THAT’S ALL FOLKS!
Was this email forwarded to you? Sign up here. Want to get in front of 13,000 AI enthusiasts? Work with me. This newsletter is written & curated by Dario Chincha.
What's your verdict on today's email?
Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.