Google’s TPU v8 architecture breaks away from its traditional unified design by splitting responsibilities between two distinct chips: one optimized for training large AI models and another engineered for low-latency inference. This marks a departure from previous generations, where a single TPU handled both tasks; instead, Google is distributing the work across specialized silicon built with partners while retaining control over the foundational compute design.

The training chip will focus on high-performance matrix computations, critical for scaling AI models in data centers. It leverages Google’s tensor acceleration expertise but is tailored specifically for the demands of model development. Meanwhile, the inference chip targets real-time applications, such as on-device AI processing, where power efficiency and low latency are paramount.
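The split makes sense given how differently the two workloads stress hardware. A rough, illustrative sketch (these are generic figures, not Google's): a dense matrix multiply of an (n, k) by (k, m) matrix costs about 2·n·k·m floating-point operations, and training runs it at large batch sizes where throughput dominates, while inference often runs at batch size 1 where per-request latency dominates.

```python
def matmul_flops(n: int, k: int, m: int) -> int:
    """Approximate FLOPs for multiplying an (n, k) matrix by a (k, m) matrix."""
    return 2 * n * k * m

# Training: large batches amortize weight loads, so raw throughput matters.
batch, d_in, d_out = 1024, 4096, 4096
training_flops = matmul_flops(batch, d_in, d_out)

# Inference: a single request reuses the same weights for far less compute,
# so memory latency and power efficiency dominate instead.
inference_flops = matmul_flops(1, d_in, d_out)

print(training_flops // inference_flops)  # ratio equals the batch size
```

The arithmetic intensity gap is why one chip tuned for sustained matrix throughput and another tuned for low-latency, power-efficient execution can each beat a compromise design.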

Broadcom will integrate the training chip into its AI infrastructure, while MediaTek will embed the inference chip in future mobile and embedded systems. These partnerships expand Google’s hardware footprint beyond data centers, embedding its technology in consumer devices. However, they also introduce complexity for operators, who must now manage integration between two separate chips rather than a single unified solution.

Google’s TPU v8 splits training and inference tasks across two chips

Both chips maintain key specifications from previous TPU generations, including 16GB of HBM2e memory per core and clock speeds up to 3.0 GHz. They also introduce optimizations for sparse tensor operations, which are increasingly important as AI models grow larger and more complex. While performance metrics align with earlier versions, pricing and availability remain undisclosed, leaving data-center operators in a wait-and-see position.
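Sparsity support matters because large models contain many zero weights that dense compute units multiply anyway. As a minimal, hardware-agnostic sketch (not Google's implementation), a sparse matrix-vector product in CSR (compressed sparse row) form shows the idea: store only the nonzeros and skip the rest.

```python
def dense_to_csr(rows):
    """Convert a dense 2D list to CSR form: (values, col_indices, row_ptr)."""
    values, cols, row_ptr = [], [], [0]
    for row in rows:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                cols.append(j)
        row_ptr.append(len(values))  # end of this row's nonzeros
    return values, cols, row_ptr

def csr_matvec(values, cols, row_ptr, x):
    """Multiply a CSR matrix by a dense vector, touching only nonzeros."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for i in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[i] * x[cols[i]]
        y.append(acc)
    return y

A = [[0, 2, 0],
     [1, 0, 0],
     [0, 0, 3]]
v = [4, 5, 6]
print(csr_matvec(*dense_to_csr(A), v))  # [10, 4, 18]
```

Here the multiply performs 3 multiply-adds instead of 9; at model scale, dedicated sparse units convert that skipped work into throughput and power savings.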

The shift toward specialized hardware reflects broader industry trends, where workloads are being divided across optimized chips for training, inference, and other stages of AI development. Whether Google’s approach will gain traction outside its ecosystem or remain proprietary is still unclear. For now, the focus remains on how these chips will integrate with existing infrastructure and whether the split design offers tangible advantages over unified architectures.