šŸ§™šŸ¼ ChatGPT gets live video + screen sharing

Also: Gemini 2.0 and its new superpowers

Howdy wizards,

In case you missed it, iOS 18.2 with Apple Intelligence is officially out – including the ChatGPT integration in Siri. Owners of a relatively new iPhone, iPad, or Mac can upgrade and start using it.

Here's what's brewing in AI today.

DARIOā€™S PICKS

OpenAI is bringing live video + screen sharing to Advanced Voice Mode. This means you can share real-time visual context with ChatGPT to make it more useful.

  • The live video functionality lets you stream your phone's front or back camera to ChatGPT in real time. Here's a demo of using it to learn how to make pour-over coffee.

  • The screen-sharing functionality lets you broadcast your phone's screen to ChatGPT while you're using Advanced Voice Mode. Here's a demo of using it to help respond to a message.

Plus & Pro users will have access within the week in the ChatGPT mobile app – with the exception of EU users, for whom it will launch "as soon as we can".

Why it matters: This will open up new real-time use cases for ChatGPT: think guided tasks, technical support, "explain what you see", and similar.

That said, I'm at least as excited about screen sharing as I am about live video. And once it gets desktop support? That will be powerful for work.

TOGETHER WITH BELAY

When you love what you do, it can be easy to take on more – more tasks, more deadlines, more hours – but before you know it, you don't have time to do what you loved in the beginning. Don't just do more – do more of what you do best.

BELAY's flexible staffing solutions leverage industry experience with AI systems to increase productivity without sacrificing quality. You can accomplish more and juggle less with our exceptional U.S.-based Virtual Assistants, Accounting Professionals, and Marketing Assistants. Learn how with our free ebook, Delegate to Elevate, and leave the more to BELAY.

OpenAI wasn't the only one with holiday surprises this week: Google launched Gemini 2.0 yesterday – its new flagship AI model designed for the "agentic era."

  • Gemini 2.0 Flash is both better and cheaper than its bigger, older brother, Gemini 1.5 Pro. It also has real-time capabilities – text, voice, video, and even screen sharing – all at once. It can generate images, handle multiple languages in audio output, and process text, code, images, and video seamlessly.

  • A new Multimodal Live API allows Gemini to do real-time video and screen sharing, as well as real-time audio. You can test it yourself inside AI Studio.

  • Available in Gemini Advanced, Deep Research is an agentic feature built for sophisticated problem-solving and complex, long-context queries. It can also gather information from around the web, scanning dozens of websites at a time.

If you want to see it in action, this tweet has demos of people using the new Gemini – including the screen-sharing features.

Google also gave us a sneak peek at its work on AI agents:

  • Project Mariner: An early prototype of a browser-based agent that can complete tasks. Sounds like an agentic assistant similar to Claude's Computer Use and what OpenAI might also launch soon.

  • Project Astra: A prototype for a general assistant with better memory, tool access (can use Google Search, Lens and Maps), and conversational abilities.

  • Jules: Currently available only to a group of early testers, Jules is an experimental code agent that integrates directly into GitHub and can assist developers in their workflow.

  • Agents in Games: Google is developing new gaming-focused agents that understand and guide gameplay in real time.

Why it matters: With Google's Multimodal Live API, you can actually share your desktop screen with Gemini. It could just be me, but I generally don't use Gemini, as I find it less reliable and helpful than ChatGPT and Claude – still, the feature itself is pretty cool.

Also, with Project Mariner, we now have OpenAI, Anthropic, and Google all openly going big on browser-based agentic capabilities, though none has launched this feature for the masses yet. That's likely to change in the near future, with agents expected to be the thing in AI for 2025.

FROM OUR PARTNERS

Instant Setup, Instant Results: Hire a Synthflow AI Agent Today

Your Next Best Hire: A Synthflow AI Voice Agent. With human-like interaction, it manages calls, qualifies leads, and more, 24/7. Cost-effective plans starting at $29/month, and integrates with top CRMs. Start your free trial and welcome your new team member!

THAT'S ALL FOLKS!

Was this email forwarded to you? Sign up here.

Want to get in front of 13,000 AI enthusiasts? Work with me.

This newsletter is written & curated by Dario Chincha.

What's your verdict on today's email?

Login or Subscribe to participate in polls.

Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.