🧙🏼 OpenAI's DevDay launches

Also: AI in logistics, 3 real-world use cases

Howdy, wizards.

OpenAI's DevDay drew a lot less hype than last year's, with Sam Altman MIA from the event. But the launches are still massive in terms of impact.

Let's get into the nitty-gritty of what they announced.

DARIO'S PICKS

OpenAI held their much-awaited DevDay event yesterday, and had some exciting news for everyone building with AI:

  • Realtime API: OpenAI is enabling speech-to-speech through their API, the same tech that powers ChatGPT's Advanced Voice Mode. That means developers can now start creating similar, more natural voice interactions inside their apps. The API will also support function calling, which means these apps will be able to take actions in the real world as well.

    • A demo of how this could work was shown with Healthify and their AI coach Ria (screenshot below). The user asked for lunch tips and it pulled up a separate screen with food recommendations during the conversation.

Source: OpenAI

  • Vision in the fine-tuning API: It's now possible to fine-tune with images, in addition to text. The enhanced image understanding will allow developers to build better versions of things like visual search, object detection, and image analysis. A couple of examples of companies that have been testing the feature:

    • Grab (like Uber, but for Southeast Asia) uses street-level images from their drivers to power GrabMaps. They used a fine-tuned GPT-4o with only 100 examples to refine the mapping data, which made their model better at counting lanes and identifying speed limit signs.

    • Automat, a company doing RPA (Robotic Process Automation), builds agents that take actions on your computer to automate processes. With vision fine-tuning and a dataset of screenshots, they made GPT-4o much better at identifying UI elements on the screen (buttons, sliders, text fields, etc.) based on natural language descriptions.

  • Prompt caching in the API: OpenAI is following in the footsteps of Google and Anthropic, who released prompt caching in their APIs earlier this year. Prompt caching means frequently used context (e.g. uploaded documents, a codebase, any knowledge base) can now be saved and then accessed faster and more cheaply.

  • Model distillation in the API: It's now easy to use outputs of frontier models like GPT-4o and o1 to fine-tune smaller, cost-efficient models like GPT-4o mini. This allows developers to cut costs while maintaining high performance.
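For the curious, here's a toy sketch of the idea behind the prompt caching item above. It is not how OpenAI (or anyone else) implements it: the `PrefixCache` class and its method names are made up for illustration, and the "expensive work" is just a word count standing in for processing the shared context.

```python
import hashlib


class PrefixCache:
    """Toy illustration: remembers which shared contexts were already processed."""

    def __init__(self):
        # Maps a hash of the shared context to the (pretend) processed result.
        self._processed = {}

    def ask(self, shared_context: str, question: str) -> dict:
        key = hashlib.sha256(shared_context.encode("utf-8")).hexdigest()
        cache_hit = key in self._processed
        if not cache_hit:
            # Stand-in for the expensive step: a real system would process
            # the whole context here. We only count words.
            self._processed[key] = len(shared_context.split())
        return {
            "cache_hit": cache_hit,
            "context_words": self._processed[key],
            "question": question,
        }


cache = PrefixCache()
document = "A long knowledge base article that every request reuses."
first = cache.ask(document, "What is this about?")
second = cache.ask(document, "Who is it for?")
print(first["cache_hit"], second["cache_hit"])
```

The first request pays for processing the shared context; the second one reuses it and only the new question is fresh work. That reuse is where the speed and cost savings come from.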

Why it matters: The world can now build apps with real-time, two-way voice. And there are possible cost savings for many apps, especially if you don't need frontier model capability. The vision fine-tuning is also powerful and is going to open up new use cases; for example, it would make it easier to create apps that turn designs into coded websites/apps.

TOGETHER WITH TELLO

With Tello Mobile, you can say goodbye to overpriced contracts and hello to freedom. Their flexible, affordable options start as low as $5 and go up to $25/month for Unlimited Everything, allowing you to customize each plan to suit your family's exact requirements.

Whether you're looking for reliable 4G LTE/5G coverage, Wi-Fi calling, free international calls to 60+ countries, or unlimited texts, Tello has you covered. And with no contracts or hidden fees, you'll enjoy peace of mind knowing that you're getting exactly what you pay for.

Bring your own phone or explore our selection of devices to find the perfect fit for you. Stop settling for expensive plans that charge you for what you don't need – create your perfect plan with Tello Mobile today and start saving.

DARIO'S PICKS

If you haven't tried Advanced Voice Mode yet, you'll likely get access soon: it's rolling out to Team and Enterprise users this week (previously only available to Plus users). Even free users are getting a sneak peek (probably with lower limits).

The exception, of course, is everyone in the EU, UK, Switzerland, Iceland, Norway, and Liechtenstein. The likely reason is part of a law that prohibits "the use of AI systems to infer emotions of a natural person".

Why it matters: Real-time voice chat is about to be everywhere, starting with all ChatGPT users (except the EU for now...).

DARIO'S PICKS

I'm continuing to dig into Google's list of 185 real-world gen AI use cases by different industries. Today, I'm showing you some great use cases companies have found for Gemini within the logistics sector.

  • PODS used Gemini to create "the world's smartest billboard" on its trucks, which adapts to each neighbourhood in New York City and changes in real time. It generated 6,000 unique headlines across 299 neighbourhoods.

  • UPS Capital uses AI together with UPS data to provide a confidence score for shippers to determine the probability of a successful delivery.

  • Gojek (a "super app" similar to Uber but for Indonesia) uses AI to allow customers to use voice commands to complete things like bill payments and money transfers.

Why it matters: Logistics is by nature a complex field, so maybe it's no wonder these examples are a bit more technical than what I featured yesterday for retail – but hopefully still inspiring!

PS You can get the table of organized use cases that I made here for free.

RECOMMENDED

Love Hacker News but don't have the time to read it every day?

THAT'S ALL FOLKS!

Was this email forwarded to you? Sign up here.

Want to get in front of 13,000 AI enthusiasts? Work with me.

This newsletter is written & curated by Dario Chincha.

Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU – it will make it possible for me to continue to do this.