Updated April 13: article originally posted April 10.
AI may be the latest buzzword on Android, but Apple’s iPhone is not yet seen as an AI-infused smartphone. That’s set to change, and we now know one way in which Tim Cook and his team plan to catch up.
The details come in a newly released research paper from Cornell University researchers working with Apple. Titled “Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs,” the paper describes a multimodal large language model that can understand what is displayed on a screen, specifically the elements of a mobile user interface such as an iPhone’s display.
Thanks to a large supply of training data, the model can pick out icons, find text, parse widgets and other interface elements, describe in text what is on screen, and interact with the display while being guided by open-ended instructions and prompts.
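To make that “grounded” understanding concrete, here is a minimal sketch, in Python, of the kind of structured answer such a model might return when handed a screenshot and an open-ended instruction: every element it finds is tied back to screen coordinates. The `UIElement` type and `describe_screen` wrapper are hypothetical stand-ins for illustration, not Apple’s API, and the stub returns canned data so the example runs on its own.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class UIElement:
    """One on-screen element, grounded to pixel coordinates."""
    kind: str                        # e.g. "icon", "text", "button", "toggle"
    text: str                        # recognised label or contents, if any
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

def describe_screen(screenshot_png: bytes, instruction: str) -> List[UIElement]:
    """Hypothetical wrapper around a Ferret-UI-style multimodal model.

    A real implementation would send the screenshot and the open-ended
    instruction to the model; this stub returns canned data so the sketch runs.
    """
    return [
        UIElement("text", "New Message", (60, 120, 520, 180)),
        UIElement("button", "Send", (880, 1820, 1040, 1900)),
    ]

if __name__ == "__main__":
    for element in describe_screen(b"<screenshot bytes>",
                                   "List everything tappable on this screen"):
        print(f"{element.kind:<7} {element.text!r:<15} at {element.bbox}")
```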
Ferret was released in October 2023 and designed to parse photos and images to recognise what they showed. The upgrade to the snappily titled Ferret-UI would offer several benefits to those using it with their iPhone and could easily fit into an improved, AI-powered Siri.
Being able to describe the screen, no matter the app, opens up a richer avenue for accessibility apps, removing the need to pre-program responses and actions. Those looking to perform complex tasks or find obscure options on their phone could simply ask Siri to open an app and use a function hidden away in the depths of its menu system.
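An assistant built on that idea would most likely work as a loop: look at the screen, ask the model which element moves things toward the user’s goal, tap it, then look again. The sketch below reuses the hypothetical `describe_screen` stub from above; `capture_screen` and `tap` are placeholders for whatever screenshot and input-injection hooks Apple would actually use, not real APIs.

```python
def capture_screen() -> bytes:
    """Placeholder: a real assistant would grab the current display buffer."""
    return b"<screenshot bytes>"

def tap(x: int, y: int) -> None:
    """Placeholder: a real assistant would inject a touch event here."""
    print(f"tap at ({x}, {y})")

def run_instruction(goal: str, max_steps: int = 10) -> None:
    """Hypothetical loop: describe the screen, tap the model's best suggestion,
    then look again, until the goal is reached or the step budget runs out."""
    for _ in range(max_steps):
        elements = describe_screen(
            capture_screen(),
            f"Which single element should be tapped next to: {goal}?",
        )
        if not elements:
            break                    # nothing useful found; stop rather than guess
        best = elements[0]           # assume the model lists its top suggestion first
        x = (best.bbox[0] + best.bbox[2]) // 2
        y = (best.bbox[1] + best.bbox[3]) // 2
        tap(x, y)                    # act, then inspect the new screen next time round
```

The point is that nothing in this loop is pre-programmed for a particular app: the model’s description of whatever happens to be on screen drives each step.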
Developers could use Ferret-UI as a testing tool, asking the MLLM to act as a 14-year-old with little experience of social networks performing a set of tasks, or to simulate a 75-year-old user trying to connect with their grandchildren over FaceTime.
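In practice that could be as simple as folding the persona into the instruction handed to the same hypothetical loop above, so the model’s choices, and the places where it stalls, reflect how that kind of user might struggle. Again, this is an illustrative sketch rather than a tool Apple has announced.

```python
def run_persona_test(persona: str, task: str) -> None:
    """Hypothetical usability probe: drive the UI the way a described user might."""
    run_instruction(f"Acting as {persona}, {task}")

if __name__ == "__main__":
    run_persona_test("a 75-year-old who has never used FaceTime",
                     "start a FaceTime call with a grandchild")
```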
Update, Saturday April 13: Along with the academic paper pointing towards an AI upgrade for Siri, backend code discovered by Nicolás Álvarez points to new server-side tools for individual iPhones.
The features are labelled “Safari Browsing Assistant” and “Encrypted Visual Search.” Both tie into capabilities described in the Ferret-UI research, although the discoveries come with some caveats. Because this is server-side code, it would be straightforward for Apple to change what sits behind these labels; they could point to more prosaic functionality rather than utilising AI; or they could be placeholders that may or may not appear in future products.
It’s worth noting that Visual Search has been seen tucked away in the code for visionOS and the Apple Vision Pro headset, but that feature has yet to be launched.
While these are not strong signals on their own, they add to a growing body of evidence about Apple’s approach to AI.
Google publicly started the rush for AI-first smartphones on October 4, launching the Pixel 8 a little more than three weeks after the iPhone 15. Tim Cook and his team made no noticeable AI announcements and drew no attention to the machine learning tucked away in the iPhone’s photo processing and text auto-correction, giving Android a head start on AI and allowing Google’s mobile platform to set expectations.
Apple’s Worldwide Developer Conference takes place in June, and it will be the first moment Apple can engage with the public to discuss its AI plans as it lays the foundations for the launch of the iPhone 16 and iPhone 16 Pro in September.
Until then, we have the academic side of Apple’s AI approach to be going on with.
Now read why the iPhone’s approach to AI is disrupting the specs for the iPhone 16 and iPhone 16 Plus…