Nemotron 3 Super: A Leap in Agentic AI Efficiency

A software development agent can now load an entire codebase into the model's context at once, so it no longer needs to split documents into segments and can generate code end to end. This reduces the computational overhead of complex agentic systems.

Technical Breakthroughs

The Nemotron 3 Super model uses a hybrid mixture-of-experts (MoE) architecture that combines Mamba layers, for memory and compute efficiency, with transformer layers, for advanced reasoning. With this design the model activates only 12 billion of its 120 billion parameters (10%) during inference, which cuts computational cost while maintaining high accuracy.
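
To make the sparse-activation idea concrete, here is a minimal sketch in plain Python. The article does not describe how Nemotron 3 Super routes tokens to experts, so the top-k routing shown here is an assumption based on how MoE models commonly work, and the names top_k_route and active_fraction are hypothetical. Only the 12 billion and 120 billion figures come from the text above.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts for one token and softmax their scores.

    Illustrative only: generic top-k routing, not NVIDIA's implementation.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Subtract the largest chosen score for numerical stability.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]

def active_fraction(active_params, total_params):
    """Share of the model's parameters used for each token."""
    return active_params / total_params

# Figures from the article: 12B active parameters out of 120B total.
fraction = active_fraction(12e9, 120e9)

# Toy router scores for four experts (made-up values); keep the best two.
experts, weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
```

In this toy run only two of the four experts do any work for the token, which is the same principle that lets a 120-billion-parameter model run with roughly a tenth of its parameters active.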

The model also employs a new latent MoE technique that improves accuracy by activating four expert specialists for the cost of one. Multi-token prediction, which predicts several future tokens at once, delivers up to 3x faster inference.
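
A rough way to see why multi-token prediction speeds up generation is to count how many tokens each forward pass yields. The sketch below assumes a draft-and-verify scheme in which predicted-ahead tokens are kept only while each one is accepted. The article does not say how the model uses its extra predictions, so this scheme, the function name expected_tokens_per_step, and the example numbers are all assumptions for illustration.

```python
def expected_tokens_per_step(extra_tokens, accept_prob):
    """Expected tokens emitted per forward pass.

    One token is always produced. Each of the `extra_tokens` predicted-ahead
    tokens is kept only if every earlier one was kept, each with probability
    `accept_prob` (assumed independent). Verification overhead is ignored.
    """
    return sum(accept_prob ** i for i in range(extra_tokens + 1))

# Illustrative values, not figures from the article.
best_case = expected_tokens_per_step(extra_tokens=2, accept_prob=1.0)
typical_case = expected_tokens_per_step(extra_tokens=2, accept_prob=0.8)
```

Under these assumptions the gain approaches the number of tokens predicted per step only when nearly every predicted token is accepted, which is consistent with the speedup being quoted as "up to" a given factor.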

Performance on NVIDIA Platforms

The model runs on the NVIDIA Blackwell platform in NVFP4 precision, which cuts memory requirements and makes inference up to 4x faster than FP8 on the NVIDIA Hopper architecture. According to NVIDIA, this speedup comes with no loss in accuracy, making the model a compelling option for enterprise-grade AI workloads.
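
The memory saving from a 4-bit format can be estimated with simple arithmetic. The sketch below counts weight storage only and assumes every parameter is stored at the stated width. It ignores scaling factors, activations, the KV cache, and any layers kept at higher precision, so real footprints will differ. The helper name weight_memory_gb is hypothetical, and the 120 billion parameter count is the figure given earlier in the article.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 120e9  # total parameter count stated in the article

fp8_gb = weight_memory_gb(TOTAL_PARAMS, 8)  # 8-bit weights
fp4_gb = weight_memory_gb(TOTAL_PARAMS, 4)  # 4-bit weights
```

By this estimate, halving the bits per weight halves the storage needed for the weights, from about 120 GB to about 60 GB.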

Open Access and Customization

NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license, allowing developers to deploy and customize the model on workstations, data centers, or in the cloud. The model was trained on synthetic data generated using frontier reasoning models, and NVIDIA is publishing the complete methodology, including over 10 trillion tokens of pre- and post-training datasets.

Industry Adoption

Companies such as Perplexity, Amdocs, Palantir, and Siemens are integrating Nemotron 3 Super into their AI agents to achieve higher accuracy at lower cost. The model is designed to handle complex subtasks within multi-agent systems, which makes it useful in industries ranging from software development to life sciences.

Availability

Nemotron 3 Super is available through build.nvidia.com, Perplexity, OpenRouter, and Hugging Face. It is also packaged as an NVIDIA NIM microservice, which allows deployment anywhere from on-premises systems to the cloud.