Nvidia released benchmark data showing its GB300 NVL72 systems with Blackwell Ultra GPUs deliver up to 50x higher throughput per megawatt and 35x lower cost per token compared to the previous Hopper platform for low-latency AI workloads. The performance gains target the growing market for agentic AI applications and coding assistants.

Blackwell Ultra Tensor Cores provide 1.5x more compute performance than standard Blackwell GPUs. Attention-layer processing speed has doubled through accelerated softmax execution, which addresses a bottleneck in the transformer attention layers used by reasoning models with large context windows. Nvidia’s TensorRT-LLM inference library has also improved: SemiAnalysis benchmarks show throughput per GPU doubling at some interactivity levels since October 2025. Together, these hardware and software advances produced a 10x boost in tokens per second per user and a 5x improvement in tokens per second per megawatt versus Hopper; multiplied together, the two factors give the reported 50x increase in AI factory output.
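The 50x figure appears to be the product of the two quoted multipliers. A minimal sketch of that arithmetic follows; the function name `combined_gain` is hypothetical, and the assumption that the two factors combine multiplicatively is inferred from the numbers above rather than from a published Nvidia formula.

```python
# Illustrative arithmetic only. The inputs are the multipliers quoted in the
# article; treating them as independent factors that multiply is an assumption.

def combined_gain(per_user_speedup: float, per_megawatt_gain: float) -> float:
    """Combine two improvement factors into one overall factor by multiplying."""
    return per_user_speedup * per_megawatt_gain


# 10x tokens per second per user, 5x tokens per second per megawatt
overall = combined_gain(10.0, 5.0)
print(f"{overall:.0f}x")  # prints "50x"
```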

“As inference moves to the center of AI production, long-context performance and token efficiency become critical,” said Chen Goldberg, senior vice president of engineering at CoreWeave. “Grace Blackwell NVL72 addresses that challenge directly.”

Major cloud providers are deploying GB300 NVL72 infrastructure. CoreWeave announced in 2025 that it was the first AI cloud provider to deploy the systems in production, integrating them with its Kubernetes-based cloud stack. Microsoft deployed what it called the world’s first large-scale GB300 NVL72 supercomputing cluster, achieving over 1.1 million tokens per second on a single rack in testing validated by Signal65. Oracle’s OCI platform is deploying GB300 NVL72 systems with plans to scale its Superclusters beyond 100,000 Blackwell GPUs to meet inference workload demand.

Cost reductions are reshaping AI deployment economics. Leading inference providers including Baseten, DeepInfra, Fireworks AI, and Together AI reported up to 10x cost reductions using the standard Blackwell platform. The Blackwell Ultra platform extends these gains for low-latency workloads: its 35x lower cost per million tokens makes it more economically viable to deploy AI agents and coding assistants at scale.
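To make the scale of a 35x reduction concrete, here is a rough sketch of the per-token arithmetic. The baseline price of $3.50 per million tokens is a made-up placeholder, not a figure from Nvidia or any provider, and `reduced_cost` is a hypothetical helper name.

```python
# Illustrative only. HYPOTHETICAL_BASELINE_USD is an invented placeholder
# price, not a number reported by Nvidia or any inference provider.

HYPOTHETICAL_BASELINE_USD = 3.50  # assumed cost per million tokens on the older platform
REDUCTION_FACTOR = 35.0           # "35x lower cost per token" from the article


def reduced_cost(baseline_usd: float, factor: float) -> float:
    """Return the cost per million tokens after an N-fold cost reduction."""
    if factor <= 0:
        raise ValueError("reduction factor must be positive")
    return baseline_usd / factor


new_cost = reduced_cost(HYPOTHETICAL_BASELINE_USD, REDUCTION_FACTOR)
print(f"${new_cost:.2f} per million tokens")  # prints "$0.10 per million tokens"
```

Under that assumed baseline, the same budget would buy 35 times as many tokens, which is the kind of shift that matters for agent workloads that consume tokens continuously.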

Nvidia previewed its next-generation Rubin platform, claiming it will deliver another 10x performance improvement over Blackwell.

