🧙🏼 ChatGPT gets live video + screen sharing
Also: Gemini 2.0 and its new superpowers
Howdy wizards,
In case you missed it, iOS 18.2 with Apple Intelligence is officially out – including the ChatGPT integration in Siri. Owners of a relatively new iPhone, iPad, or Mac can upgrade and start using it.
Here’s what’s brewing in AI today.
DARIO’S PICKS
OpenAI is bringing live video + screen sharing to Advanced Voice Mode. This means you can share real-time visual context with ChatGPT to make it more useful.
The live video functionality lets you share context from your back or front camera on your phone in real-time with ChatGPT. Here’s a demo of using it to learn how to make pour-over coffee.
The screen-sharing functionality lets you broadcast your phone’s screen to ChatGPT while you’re using Advanced Voice Mode. Here’s a demo of using it to help respond to a message.
Plus & Pro users will get access within the week in the ChatGPT mobile app – with the exception of EU users, for whom it will launch “as soon as we can”.
Why it matters: This opens up new real-time use cases for ChatGPT – think guided tasks, technical support, “explain what you see”, and similar.
That said, I’m at least as excited about screen sharing as I am about live video – if not more. And once it gets desktop support? That will be powerful for work.
TOGETHER WITH BELAY
When you love what you do, it can be easy to take on more — more tasks, more deadlines, more hours – but before you know it, you don’t have time to do what you loved in the beginning. Don’t just do more – do more of what you do best.
BELAY’s flexible staffing solutions leverage industry experience with AI systems to increase productivity without sacrificing quality. You can accomplish more and juggle less with our exceptional U.S.-based Virtual Assistants, Accounting Professionals, and Marketing Assistants. Learn how with our free ebook, Delegate to Elevate, and leave the more to BELAY.
OpenAI wasn’t the only one with holiday surprises this week: Google launched Gemini 2.0 yesterday – its new flagship AI model, designed for the “agentic era.”
Gemini 2.0 Flash is both better and cheaper than its bigger, older brother Gemini 1.5 Pro. It also has real-time capabilities — text, voice, video, and even screen sharing — all at once. It can also generate images, handle multiple languages in audio output, and process text, code, images, and video seamlessly.
A new Multimodal Live API allows Gemini to do real-time video and screen sharing, as well as real-time audio. You can test it yourself inside AI Studio.
Available in Gemini Advanced, Deep Research is an agentic feature that applies powerful reasoning to more sophisticated problem-solving, with better support for complex, long-context queries. It can also gather information from around the web, scanning dozens of websites at a time.
If you want to see it in action, this tweet has demos of people using the new Gemini – including the screen-sharing features.
Google also gave us a sneak peek at its work on AI agents:
Project Mariner: An early prototype of a browser-based agent that can complete tasks. Sounds like an agentic assistant similar to Claude’s Computer Use and what OpenAI might also launch soon.
Project Astra: A prototype for a general assistant with better memory, tool access (can use Google Search, Lens and Maps), and conversational abilities.
Jules: Currently only available to a group of early testers, Jules is an experimental code agent that integrates directly into GitHub and can assist developers in their workflow.
Agents in Games: Google is developing new gaming-focused agents that understand and guide gameplay in real time.
Why it matters: With Google’s Multimodal Live API, you can actually share your desktop screen with Gemini. It could just be me, but I generally don’t use Gemini because I find it less reliable and helpful than ChatGPT and Claude – still, the feature itself is pretty cool.
Also, with Project Mariner, we now have OpenAI, Anthropic, and Google all openly going big on browser-based agentic capabilities, though none has launched such a feature for the masses yet. That’s likely to change in the near future, with agents expected to be the thing in AI for 2025.
FROM OUR PARTNERS
Build Smarter, Faster: AI Voice Agents for Every Industry
Dream of a calling assistant that works tirelessly, taking calls 24/7 and managing tasks like real-time booking and lead qualification? With Synthflow’s collection of AI Agent templates, tailored to industries such as real estate and healthcare, you can launch your assistant fast. Plus, you can customize and publish your own templates, opening the door to earning commissions while helping others get started!
THAT’S ALL FOLKS!
Was this email forwarded to you? Sign up here. Want to get in front of 13,000 AI enthusiasts? Work with me. This newsletter is written & curated by Dario Chincha.
What's your verdict on today's email?
Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.