Big news for Apple and Nvidia investors: Siri is quietly being rebuilt to run partly in the cloud rather than only on the iPhone. Under a new arrangement, Apple is expected to run demanding voice and generative workloads on Nvidia GPUs hosted inside Google Cloud infrastructure, turning what was once a sideshow into a market signal for all three companies.
This move looks less like experimentation and more like balance-sheet strategy. By leaning on Nvidia’s high-end accelerators for server-side inference while keeping on-device models on Apple Silicon, Apple can stretch its capital budget, avoid an all-or-nothing buildout of its own data centers, and still claim end-to-end control over user experience and latency.
For Nvidia holders, the message is blunt. Hyperscalers are not the only growth engine; platform companies like Apple are becoming recurring sources of demand for data center GPUs, which supports Nvidia’s pricing power and keeps utilization high across its accelerators, CUDA ecosystem, and networking stack. Every additional Siri request offloaded to the cloud quietly adds to demand for Nvidia hardware and software.
Google’s role is not charity either. Renting capacity to Apple helps fill existing clusters, helps justify its capex-heavy TPU and GPU footprint, and reinforces Google Cloud as a neutral substrate even for direct rivals. Even if Apple later diversifies to its own custom accelerators, the current dependence still teaches it the operational physics of running AI inference at cloud scale.