August 17, 2025
Despite the complexities of benchmarking, the underlying message is clear: Ironwood constitutes a major breakthrough for Google’s artificial intelligence platform. Its speed and efficiency build on the foundation that already supported the rapid development of advanced models such as Gemini 2.5, which runs on older-generation TPUs.
Google expects Ironwood’s advanced inference capabilities and improved efficiency to drive groundbreaking artificial intelligence developments over the coming year. Ironwood underpins Google’s “age of inference” vision by supplying the computational power that advanced models and true agentic capabilities require, transforming AI into a proactive, intelligent digital entity that can think for us.
Decoding the Numbers: Ironwood’s Performance Context
Evaluating different AI chips is complicated because each vendor uses different benchmarking methods. Ironwood’s headline benchmark, by Google’s own account, uses FP8 precision. The company claims Ironwood “pods” deliver 24 times the speed of the world’s leading supercomputers, but this figure deserves scrutiny: many supercomputing systems lack native FP8 hardware support, so the comparison spans different numeric precisions.
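To see where a figure like 24x can come from, here is a back-of-envelope check. It uses the pod’s 42.5 FP8 exaflops (quoted later in this piece) and assumes El Capitan’s roughly 1.74 FP64 exaflops from the public Top500 list as the reference supercomputer; that reference number is an assumption on my part, not part of Google’s announcement, and the mismatch in precision is exactly the caveat above.

```python
# Back-of-envelope check of the "24x the leading supercomputer" claim.
# Assumed reference figure (not from Google's announcement):
#   - El Capitan: ~1.74 exaflops at FP64 (public Top500 Rmax figure)
ironwood_pod_fp8_eflops = 42.5  # Google's quoted pod number, FP8
el_capitan_fp64_eflops = 1.74   # assumption: a different, stricter precision

ratio = ironwood_pod_fp8_eflops / el_capitan_fp64_eflops
print(f"FP8-vs-FP64 ratio: {ratio:.1f}x")  # ~24.4x

# The ratio only reaches ~24x because it divides FP8 throughput by FP64
# throughput; it is not a like-for-like precision comparison.
```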
Google did not include TPU v6 (Trillium) in its direct performance comparisons, though it does say that Ironwood delivers twice the performance per watt of that chip. Google positions Ironwood as the successor to the TPU v5p, while Trillium succeeded the less powerful TPU v5e. Trillium’s peak performance reached about 918 TFLOPS at FP8 precision.
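Taken at face value, these two claims together hint at how much more power each Ironwood chip draws than a Trillium chip. The sketch below is a rough inference from the quoted figures only, not a disclosed specification; the 4,614 TFLOPS per-chip number is the one cited later in this article.

```python
# Rough inference from the quoted figures (not a disclosed spec):
# Ironwood: 4,614 TFLOPS peak; Trillium: ~918 TFLOPS peak; Google says
# Ironwood delivers 2x the performance per watt of Trillium.
ironwood_tflops = 4614
trillium_tflops = 918
perf_per_watt_gain = 2.0  # Google's stated improvement

raw_speedup = ironwood_tflops / trillium_tflops          # ~5.0x raw compute
implied_power_ratio = raw_speedup / perf_per_watt_gain   # ~2.5x per-chip power
print(f"raw speedup: {raw_speedup:.1f}x, "
      f"implied per-chip power: {implied_power_ratio:.1f}x")
```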
Inside Ironwood: A Performance Powerhouse
Ironwood represents a substantial step up in processing capability over earlier Google TPUs. The deployment strategy centers on large-scale liquid-cooled clusters housing up to 9,216 Ironwood chips each. An enhanced Inter-Chip Interconnect (ICI) ties these chips together, sustaining high-speed, efficient data flow across the cluster.
This immense processing power will serve Google’s internal AI research and development as well as developers who use Google Cloud. Ironwood will be offered in two configurations: a 256-chip server for standard AI workloads and a 9,216-chip cluster for very complex AI operations.
A fully configured Ironwood pod achieves an inference computing capacity of 42.5 exaflops. Google reports that each Ironwood chip reaches a peak of 4,614 TFLOPS, surpassing every earlier TPU generation. Per-chip memory has grown to 192GB, a sixfold increase over the Trillium TPU, and memory bandwidth has risen 4.5 times to a new maximum of 7.2 TB/s.
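The per-chip and pod-level numbers quoted above are mutually consistent, which is a useful sanity check. The sketch below uses only values cited in this article; the implied Trillium bandwidth and total pod memory are derived quantities, not figures from Google’s announcement.

```python
# Sanity-check the pod-level figure from the per-chip numbers in this article.
chips_per_pod = 9216
tflops_per_chip = 4614    # peak per-chip FP8 performance
hbm_per_chip_gb = 192     # per-chip memory
hbm_bw_tb_s = 7.2         # per-chip bandwidth, TB/s
bw_gain_vs_trillium = 4.5

pod_eflops = chips_per_pod * tflops_per_chip / 1e6      # TFLOPS -> exaflops
pod_memory_pb = chips_per_pod * hbm_per_chip_gb / 1e6   # GB -> PB (decimal)
implied_trillium_bw = hbm_bw_tb_s / bw_gain_vs_trillium

print(f"pod compute: {pod_eflops:.1f} EFLOPS")   # ~42.5, matching the claim
print(f"pod memory:  {pod_memory_pb:.2f} PB")    # ~1.77 PB across the pod
print(f"implied Trillium bandwidth: {implied_trillium_bw:.1f} TB/s")
```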
Google has just unveiled its latest innovation in custom silicon: the seventh generation of its Tensor Processing Unit architecture, known as Ironwood. The new chip’s architecture targets the complex requirements of Google’s advanced Gemini models, which perform tasks similar to “thinking,” as Google defines it, through simulated reasoning.
The company consistently emphasizes the tight relationship between its advanced artificial intelligence models and its custom-designed infrastructure. Ironwood is a fundamental element of this approach, delivering substantial gains in inference speed along with greater capacity to process extensive contextual data. Google introduces Ironwood as its flagship TPU, built to provide the scalability and performance needed for AI that can independently collect data and produce results, forming the foundation of Google’s agentic AI vision in the “age of inference.”