What's brewing in AI
Posts
🧙🏼 OpenAI's DevDay launches

🧙🏼 OpenAI's DevDay launches

Also: AI in logistics, 3 real-world use cases

Dario Chincha
October 02, 2024

Howdy, wizards.

OpenAI’s DevDay was a lot less surrounded by hype compared to last year, with Sam Altman being MIA from the event. But, the launches are still massive in terms of impact.

Let’s get into the nitty-gritty of what they announced.

DARIO’S PICKS

1. OpenAI’s DevDay: everything they launched

OpenAI held their much-awaited DevDay event yesterday, and had some exciting news for everyone building with AI:

Real-time API: OpenAI is enabling speech-to-speech through their API, the same tech that powers ChatGPT’s Advanced Voice Mode. That means developers can now start creating similar, more natural voice interactions inside their apps. The API will also support function-calling, which means these apps will have the ability to take actions in the real-world as well.
- A demo of how this could work was shown with Healthify and their AI coach Ria (screenshot below). The user asked for lunch tips and it pulled up a separate screen with food recommendations during the conversation.

Source: OpenAI

Vision in the fine-tuning API: It’s now possible to fine-tune with images, in addition to text. The enhanced image understanding will allow developers to build better versions of things like visual search, object detection, and image analysis. A couple of examples of companies who have been testing the feature:
- Grab (like Uber, but for Southeast Asia) uses street-level images from their drivers to power GrabMaps. They used fine-tuned GPT4-o with only 100 examples to refine the mapping data, which made their model more capable in counting lanes and identifying speed limit signs.
- Automat, a company doing RPA (Robotic Process Automation) builds agents that take actions on your computer to automate processes. With vision fine-tuning and a dataset of screenshots, they made GPT-4o much better at identifying UI elements on the screen (buttons, sliders, text fields, etc) based on natural language descriptions.
Prompt caching in the API: OpenAI is following the footsteps of Google and Anthropic, who released prompt caching in their APIs earlier this year. Prompt caching means they can now save frequently used context (e.g. uploaded documents, a codebase, any knowledge base) and access it quicker and cheaper.
Model distillation in the API: It’s now easy to use outputs of frontier models like GPT4-o and o1 to fine-tune smaller, cost-efficient models like GPT-4o mini. This allows developers to cut cost while maintaining high performance.

‎ Why it matters‎ ‎ The wold can now build apps with real-time two way voices. And there’s possible cost savings for many apps, especially if you don’t need frontier model capability. The vision fine-tuning is also powerful and is going to open new use cases; for example, it would make it easier to create apps that turns designs into coded websites/apps.

TOGETHER WITH TELLO

Tired of overpriced phone plans that charge you for what you don’t need?

With Tello Mobile, you can say goodbye to overpriced contracts and hello to freedom. Their flexible, affordable options start as low as $5 and go up to $25/month for Unlimited Everything, allowing you to customize each plan to suit your family's exact requirements.

Whether you're looking for reliable 4G LTE/5G coverage, Wi-Fi calling, free international calls to 60+ countries, or unlimited texts, Tello has you covered. And with no contracts or hidden fees, you'll enjoy peace of mind knowing that you're getting exactly what you pay for.

Bring your own phone or explore our selection of devices to find the perfect fit for you. Stop settling for expensive plans that charge you for what you don’t need – create your perfect plan with Tello Mobile today and start saving.

Create your perfect plan and start saving with Tello today.

DARIO’S PICKS

2. Advanced Voice Mode rolls out to ChatGPT Team and Enterprise

If you haven’t tried Advanced Voice Mode yet, you’ll likely get access soon—it’s rolling out to Team and Enterprise users this week (previously only available to Plus users). Even free users are getting a sneak peek (probably with lower limits).

The exception, of course, is everyone in the EU, UK, Switzerland, Iceland, Norway, and Liechtenstein. The likely reason for this is probably part a law that prohibits “the use of AI systems to infer emotions of a natural person”.

Starting this week, Advanced Voice is rolling out to all ChatGPT Enterprise, Edu, and Team users globally. Free users will also get a sneak peek of Advanced Voice.
Plus and Free users in the EU…we’ll keep you updated, we promise.
— OpenAI (@OpenAI)
6:14 PM • Oct 1, 2024

‎ Why it matters‎ ‎ Real-time voice chat is about to be everywhere—starting with all ChatGPT users (except the EU for now…).

DARIO’S PICKS

3. 3 real-world company AI use cases from logistics

I’m continuing to dig into Google’s list of 185 real-world gen AI use cases by different industries.Today, I’m showing you some great use cases companies have found for Gemini within the logistics sector.

PODS used Gemini to create “the world’s smartest billboard” for its trucks, that adapts to each Neighbourhood in New York city and changes in real-time. It generated 6,000 unique headlines across 299 neighbourhoods.
UPS Capital uses AI together with UPS data to provide a confidence score for shippers to determine the probability of a successful delivery.
Gojek (a “super app” similar to Uber but for Indonesia) uses AI to allow customers to use voice commands to complete things like bill payments and money transfers.

‎ Why it matters‎ ‎ Logistics is by nature a complex field, so maybe it’s no wonder these examples are a bit more technical than what I featured yesterday for retail – but hopefully still inspiring!

PS You can get the table of organized use cases that I made here for free.

RECOMMENDED

Love Hacker News but don’t have the time to read it every day?

Try TLDR’s free daily newsletter

THAT’S ALL FOLKS!

Was this email forwarded to you? Sign up here.

Want to get in front of 13,000 AI enthusiasts? Work with me.

This newsletter is written & curated by Dario Chincha.

What's your verdict on today's email?

Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.