Nvidia launched Nemotron 3 Super, a 120-billion-parameter open-weight model with a 1-million-token context window. The company positioned the model for running complex agentic AI systems at scale. It became available on build.nvidia.com, Perplexity, OpenRouter, and Hugging Face.
Enterprises can access the model through Google Cloud Vertex AI and Oracle Cloud Infrastructure, with support for Amazon Bedrock and Microsoft Azure to follow. The model uses a hybrid latent mixture-of-experts and Mamba-Transformer architecture, which allows it to call 4x as many expert specialists during inference at the same cost as previous models.
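For readers unfamiliar with mixture-of-experts models, the general idea of sending each input to only a few experts can be sketched as follows. This is a generic top-k routing toy, not Nvidia's latent mixture-of-experts design, whose internals the announcement does not detail. The function names, the scalar "experts," and the router scores are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and weight them by softmax.

    Returns a list of (expert_index, weight) pairs. Only these k experts
    run, which is why cost depends on k rather than on the total count.
    """
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k):
    """Combine the outputs of the selected experts for input x."""
    routes = route_top_k(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in routes)


# Hypothetical toy setup: four "experts" that each scale the input.
experts = [lambda x, scale=i + 1: scale * x for i in range(4)]
router_logits = [0.1, 2.0, -1.0, 1.5]  # made-up router scores

routes = route_top_k(router_logits, k=2)
output = moe_forward(1.0, experts, router_logits, k=2)
print(routes)
print(output)
```

In this toy, only two of the four experts run for the input. Raising the number of experts a model can consult without raising cost is the kind of trade-off the architecture claim above refers to.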
Nvidia trained the model on synthetic data from other frontier reasoning models. The company published over 10 trillion tokens of pre- and post-training datasets, along with 15 training environments for reinforcement learning and evaluation recipes. ServiceNow and other enterprises used previous Nemotron variants to fine-tune their own models.
Artificial Analysis benchmarks show the model scores 36 on overall intelligence. This result places it above gpt-oss-120B, which scores 33, but behind Gemini 3.1 Pro and GPT-5.4, which both score 57. The model achieves 478 output tokens per second, making it the fastest in its class.
The second-fastest model, gpt-oss-120B, reaches 264 output tokens per second. Nvidia said Nemotron 3 Super offers 7.5x higher inference throughput than Qwen3.5-122B. The company did not announce a release date for Nemotron 3 Ultra, the family’s largest model at 500 billion parameters.
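To put the two output-speed figures in perspective, the short calculation below works out the speed ratio and the time each model would need for a response of a given length. The two throughput numbers come from the benchmarks quoted above. The 2,000-token response length is a hypothetical chosen for illustration, and the calculation assumes the quoted rate holds for the whole response.

```python
# Output speeds quoted in the article, in tokens per second.
NEMOTRON_3_SUPER_TPS = 478
GPT_OSS_120B_TPS = 264


def speedup(fast_tps, slow_tps):
    """How many times faster the first model emits tokens."""
    return fast_tps / slow_tps


def seconds_to_generate(num_tokens, tps):
    """Time to emit num_tokens at a steady rate of tps tokens per second."""
    return num_tokens / tps


ratio = speedup(NEMOTRON_3_SUPER_TPS, GPT_OSS_120B_TPS)

# Hypothetical response length, not a figure from the article.
RESPONSE_TOKENS = 2000
super_seconds = seconds_to_generate(RESPONSE_TOKENS, NEMOTRON_3_SUPER_TPS)
gpt_oss_seconds = seconds_to_generate(RESPONSE_TOKENS, GPT_OSS_120B_TPS)

print(f"Speed ratio: {ratio:.2f}x")
print(f"Nemotron 3 Super: {super_seconds:.1f} s for {RESPONSE_TOKENS} tokens")
print(f"gpt-oss-120B: {gpt_oss_seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```

These figures cover output speed only and say nothing about time to first token or answer quality.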
Nvidia debuted Nemotron 3 Nano, a 30-billion-parameter open-weight model, in December 2025. The company optimized that model for smaller, targeted tasks. Nvidia teased Nemotron 3 Ultra in a previous announcement.








