Google unveiled details about its Ironwood Tensor Processing Unit (TPU) at Hot Chips 2025, following its initial announcement at Google Cloud Next ’25 in April. Ironwood represents Google’s seventh-generation TPU, specifically designed for large-scale inference workloads, marking a shift from previous generations focused on training.
Each Ironwood chip incorporates two compute dies, delivering 4,614 TFLOPS of FP8 performance. It carries eight stacks of HBM3e, providing 192 GB of memory per chip at 7.3 TB/s of bandwidth. The architecture scales up to 9,216 chips per pod with no glue logic, each chip contributing 1.2 TB/s of I/O bandwidth, for a total of 42.5 exaflops of performance.
A key highlight of Ironwood is its memory capacity. A single pod provides 1.77 PB of directly addressable HBM, which Google claims is a new world record for shared memory supercomputers. This extensive memory capacity is made possible by optical circuit switches that link racks together.
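As a quick sanity check, the pod-level figures quoted above follow directly from the per-chip specs (assuming decimal units, i.e. 1 PB = 10^6 GB):

```python
# Verify Ironwood pod totals from the per-chip specs quoted in the article.
CHIPS_PER_POD = 9_216
FP8_TFLOPS_PER_CHIP = 4_614      # FP8 performance per chip, TFLOPS
HBM_GB_PER_CHIP = 192            # HBM3e capacity per chip, GB

pod_exaflops = CHIPS_PER_POD * FP8_TFLOPS_PER_CHIP / 1e6   # TFLOPS -> exaFLOPS
pod_hbm_pb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1e6         # GB -> PB (decimal)

print(f"{pod_exaflops:.1f} exaFLOPS")  # ~42.5, matching Google's figure
print(f"{pod_hbm_pb:.2f} PB of HBM")   # ~1.77, matching the claimed record
```

Both quoted totals check out against the per-chip numbers.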
The Ironwood TPU also emphasizes reliability and resilience. The hardware can automatically reconfigure around failed nodes and restore workloads from checkpoints. Features include an on-chip root of trust, built-in self-test functions, silent data corruption mitigation, and logic repair functions to improve manufacturing yield. According to Google, an emphasis on RAS (reliability, availability, and serviceability) is visible throughout the architecture.
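Google has not published how its checkpoint recovery works internally; as a toy illustration only, the general pattern of periodically checkpointing a workload and resuming from the last good state after a node failure looks something like this (all names here are hypothetical):

```python
import os
import pickle

CKPT_PATH = "state.ckpt"  # hypothetical checkpoint file location

def save_checkpoint(state, path=CKPT_PATH):
    # Write to a temp file, then rename atomically, so a crash mid-write
    # can never leave a corrupt checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def restore_checkpoint(path=CKPT_PATH):
    # Resume from the last checkpoint if one exists; otherwise start fresh.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0}

# Toy "workload": advance a step counter, checkpointing every 5 steps.
# After a failure (e.g. the system reconfigures around a dead node),
# rerunning this loop picks up from the last saved step, not from zero.
state = restore_checkpoint()
while state["step"] < 10:
    state["step"] += 1
    if state["step"] % 5 == 0:
        save_checkpoint(state)
```

The atomic-rename trick is the key detail: recovery is only useful if every checkpoint on disk is guaranteed to be complete.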
Cooling is handled by a cold-plate solution integrated with Google’s third-generation liquid-cooling infrastructure. Google claims that Ironwood achieves a twofold improvement in performance per watt compared to its predecessor, Trillium. Dynamic voltage and frequency scaling further enhances efficiency across varied workloads.
AI techniques were also employed in the design of Ironwood to optimize ALU circuits and floor plans. A fourth-generation SparseCore has been added to accelerate embeddings and collective operations, supporting workloads such as recommendation engines.
Ironwood deployment is currently underway at hyperscale within Google Cloud data centers. However, the TPU remains an internal platform and is not directly available to Google Cloud customers.
Ryan Smith of ServeTheHome commented on Google’s presentation at Hot Chips 2025, stating, “This was an awesome presentation. Google saw the need to create high‑end AI compute many generations ago. Now the company is innovating at every level from the chips, to the interconnects, and to the physical infrastructure. Even as the last Hot Chips 2025 presentation this had the audience glued to the stage at what Google was showing.”