- What's brewing in AI
- Posts
- š§š¼ OpenAI's DevDay launches
š§š¼ OpenAI's DevDay launches
Also: AI in logistics, 3 real-world use cases
Howdy, wizards.
OpenAIās DevDay was a lot less surrounded by hype compared to last year, with Sam Altman being MIA from the event. But, the launches are still massive in terms of impact.
Letās get into the nitty-gritty of what they announced.
DARIOāS PICKS
OpenAI held their much-awaited DevDay event yesterday, and had some exciting news for everyone building with AI:
Real-time API: OpenAI is enabling speech-to-speech through their API, the same tech that powers ChatGPTās Advanced Voice Mode. That means developers can now start creating similar, more natural voice interactions inside their apps. The API will also support function-calling, which means these apps will have the ability to take actions in the real-world as well.
A demo of how this could work was shown with Healthify and their AI coach Ria (screenshot below). The user asked for lunch tips and it pulled up a separate screen with food recommendations during the conversation.
Vision in the fine-tuning API: Itās now possible to fine-tune with images, in addition to text. The enhanced image understanding will allow developers to build better versions of things like visual search, object detection, and image analysis. A couple of examples of companies who have been testing the feature:
Grab (like Uber, but for Southeast Asia) uses street-level images from their drivers to power GrabMaps. They used fine-tuned GPT4-o with only 100 examples to refine the mapping data, which made their model more capable in counting lanes and identifying speed limit signs.
Automat, a company doing RPA (Robotic Process Automation) builds agents that take actions on your computer to automate processes. With vision fine-tuning and a dataset of screenshots, they made GPT-4o much better at identifying UI elements on the screen (buttons, sliders, text fields, etc) based on natural language descriptions.
Prompt caching in the API: OpenAI is following the footsteps of Google and Anthropic, who released prompt caching in their APIs earlier this year. Prompt caching means they can now save frequently used context (e.g. uploaded documents, a codebase, any knowledge base) and access it quicker and cheaper.
Model distillation in the API: Itās now easy to use outputs of frontier models like GPT4-o and o1 to fine-tune smaller, cost-efficient models like GPT-4o mini. This allows developers to cut cost while maintaining high performance.
ā Why it mattersā ā The wold can now build apps with real-time two way voices. And thereās possible cost savings for many apps, especially if you donāt need frontier model capability. The vision fine-tuning is also powerful and is going to open new use cases; for example, it would make it easier to create apps that turns designs into coded websites/apps.
TOGETHER WITH TELLO
With Tello Mobile, you can say goodbye to overpriced contracts and hello to freedom. Their flexible, affordable options start as low as $5 and go up to $25/month for Unlimited Everything, allowing you to customize each plan to suit your family's exact requirements. Whether you're looking for reliable 4G LTE/5G coverage, Wi-Fi calling, free international calls to 60+ countries, or unlimited texts, Tello has you covered. And with no contracts or hidden fees, you'll enjoy peace of mind knowing that you're getting exactly what you pay for. | Bring your own phone or explore our selection of devices to find the perfect fit for you. Stop settling for expensive plans that charge you for what you donāt need ā create your perfect plan with Tello Mobile today and start saving. |
DARIOāS PICKS
If you havenāt tried Advanced Voice Mode yet, youāll likely get access soonāitās rolling out to Team and Enterprise users this week (previously only available to Plus users). Even free users are getting a sneak peek (probably with lower limits).
The exception, of course, is everyone in the EU, UK, Switzerland, Iceland, Norway, and Liechtenstein. The likely reason for this is probably part a law that prohibits āthe use of AI systems to infer emotions of a natural personā.
Starting this week, Advanced Voice is rolling out to all ChatGPT Enterprise, Edu, and Team users globally. Free users will also get a sneak peek of Advanced Voice.
Plus and Free users in the EUā¦weāll keep you updated, we promise.
ā OpenAI (@OpenAI)
6:14 PM ā¢ Oct 1, 2024
ā Why it mattersā ā Real-time voice chat is about to be everywhereāstarting with all ChatGPT users (except the EU for nowā¦).
DARIOāS PICKS
Iām continuing to dig into Googleās list of 185 real-world gen AI use cases by different industries.Today, Iām showing you some great use cases companies have found for Gemini within the logistics sector.
PODS used Gemini to create āthe worldās smartest billboardā for its trucks, that adapts to each Neighbourhood in New York city and changes in real-time. It generated 6,000 unique headlines across 299 neighbourhoods.
UPS Capital uses AI together with UPS data to provide a confidence score for shippers to determine the probability of a successful delivery.
Gojek (a āsuper appā similar to Uber but for Indonesia) uses AI to allow customers to use voice commands to complete things like bill payments and money transfers.
ā Why it mattersā ā Logistics is by nature a complex field, so maybe itās no wonder these examples are a bit more technical than what I featured yesterday for retail ā but hopefully still inspiring!
PS You can get the table of organized use cases that I made here for free.
RECOMMENDED
Love Hacker News but donāt have the time to read it every day?
THATāS ALL FOLKS!
Was this email forwarded to you? Sign up here. Want to get in front of 13,000 AI enthusiasts? Work with me. This newsletter is written & curated by Dario Chincha. |
What's your verdict on today's email? |
Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links then THANK YOU ā it will make it possible for me to continue to do this.