Apple executives detailed the architecture of the company’s new Apple Foundation Models (AFM) and clarified exactly how Google’s technology factored into their development.
Craig Federighi, Apple’s senior vice president of software engineering, held a tech talk with the press on Monday after the keynote (via 9to5Mac). He was joined by VP of AI Amar Subramanya, Director of Siri Mike Rockwell and VP of Software Sébastien Marineau-Mes to explain how the third-generation AFM family was built and how it powers Apple Intelligence.
“The number of Google Assistants we use is zero,” Federighi said, explaining that Apple uses none of the Gemini models deployed to Google customers, no Google client-side code, and no Google search infrastructure as a knowledge base.
“Of course, we don’t have the Gemini app as an app. In fact, none of these client codes are part of how we work on iOS. For these models, we do not use any of the models that Google deploys to its customers, nor do we use the infrastructure and means by which they deploy models to their customers. And then as far as the knowledge base goes, we of course don’t use Google search or anything like that as the basis of our system.”
Subramanya introduced the new AFM family, which includes two on-device models and three server-side models. The on-device tier includes AFM Core, a next-generation dense architecture model, and AFM Core Advanced, which uses a sparse architecture and is natively multimodal.
Subramanya said AFM Core Advanced is “unlike any on-device model we’ve used before,” enabling new features including invitations and expressive voices without relying on the cloud. On the server side, AFM Cloud handles latency-optimized Private Cloud Compute queries, while AFM Cloud Image powers image generation and editing features, including spatial cropping.
The key detail of the collaboration with Google came in Subramanya’s description of how these four models were trained. “All of these are custom built for Apple Silicon, trained using proprietary data with reinforcement learning, and refined using results from Gemini frontier models,” he said, clarifying that Google’s contribution took the form of distillation rather than wholesale adoption of Gemini.
The fifth and most capable model, AFM Cloud Pro, is designed for agentic tool use and complex reasoning tasks, with quality that Subramanya described as “similar to Gemini frontier models.” This model marks a departure from the standard configuration of Apple’s Private Cloud Compute.
To run it, Apple worked with Google and Nvidia to extend its private cloud infrastructure to Nvidia GPUs hosted in Google’s cloud. Marineau-Mes said Apple wanted to use Nvidia’s latest chips, but required them to be configured so that they could not read content from Apple’s servers. A recent technology from Nvidia called “ambiguous confidential computing” provided the solution.
“We wanted to take advantage of Nvidia’s latest technologies and therefore decided to extend computing from the private cloud to the third-party cloud.”
Federighi described the broader system architecture as being organized around a System Orchestrator, software he called “key to the privacy architecture of our entire system.” The orchestrator routes any given request to the appropriate model, on-device or in the cloud, based on the complexity of the request and the personal context required.
The orchestrator draws on an App Toolbox for in-app actions, a Spotlight Semantic Index for personal content, and on-screen context for real-time awareness. Answers to news queries come from Apple’s World Knowledge Service, which Federighi said the company has been developing over several years.
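Apple did not share implementation details, but the routing idea Federighi described can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Request`, `Decision` and `route` names, the 0-to-1 complexity score, the two thresholds, and the rule that requests needing personal context stay on the device longer. Only the on-device and Private Cloud Compute tiers and the Spotlight Semantic Index come from the briefing.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on-device"
    PRIVATE_CLOUD = "private-cloud-compute"


@dataclass(frozen=True)
class Request:
    text: str
    complexity: float             # hypothetical score in [0, 1], not an Apple metric
    needs_personal_context: bool  # hypothetical flag


@dataclass(frozen=True)
class Decision:
    tier: Tier
    context_sources: tuple  # names of context sources to consult


def route(request: Request,
          base_limit: float = 0.5,
          personal_limit: float = 0.7) -> Decision:
    """Pick a tier from complexity and personal-context needs.

    Both limits are invented for this sketch. The assumed policy is that
    requests touching personal content tolerate more complexity before
    leaving the device.
    """
    if not 0.0 <= request.complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")

    limit = personal_limit if request.needs_personal_context else base_limit
    tier = Tier.ON_DEVICE if request.complexity <= limit else Tier.PRIVATE_CLOUD

    # Assumption: personal content is looked up in the semantic index.
    sources = ("spotlight_semantic_index",) if request.needs_personal_context else ()
    return Decision(tier, sources)
```

Under these assumed thresholds, `route(Request("set a timer", 0.1, False))` stays on the device, while a request scored at 0.9 goes to Private Cloud Compute. A real orchestrator would also have to choose among the five AFM models, which this sketch leaves out.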
Apple also maintains that all Private Cloud Compute infrastructure, including extensive Nvidia GPU capacity in Google’s cloud, can be independently verified by third-party researchers to confirm that user data is never stored or accessed.
