Available now on ModelsLab · Language Model

NVIDIA: Nemotron 3 Super (free)

Agentic reasoning. Fully open.

Built for autonomous agents

Sparse MoE

120B parameters, 12B active

Frontier-class reasoning at a fraction of the compute cost, powered by a latent mixture-of-experts architecture.

Long context

1M token window

Agents retain full workflow state without truncation for multi-step reasoning and planning.

Native efficiency

4x faster inference

NVFP4 pretraining delivers 4x speedup on Blackwell GPUs versus FP8 on Hopper.

Examples

See what NVIDIA: Nemotron 3 Super (free) can create

Copy any prompt below and try it yourself in the playground.

IT ticket routing

Analyze this support ticket, classify severity and category, extract required information, and route to appropriate team with reasoning.

Multi-step research

Research the latest developments in renewable energy, synthesize findings across multiple documents, and generate a comprehensive analysis with citations.

Code generation

Generate a Python function to process API responses, handle edge cases, include error handling, and add comprehensive docstrings.

Agent orchestration

Plan a multi-step workflow to migrate database schema, coordinate between teams, track dependencies, and generate status reports.

For Developers

Reasoning agents in a few lines of code.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",
        "model_id": "",
    },
)
print(response.json())
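Before sending the request above, it can help to validate the payload locally. The helper below is a hypothetical sketch (not part of any ModelsLab SDK) that assembles the same request body and fails fast if a required field is missing:

```python
def build_payload(api_key: str, prompt: str, model_id: str) -> dict:
    """Assemble the chat-completions request body, rejecting empty fields."""
    if not all([api_key, prompt, model_id]):
        raise ValueError("key, prompt, and model_id are all required")
    return {"key": api_key, "prompt": prompt, "model_id": model_id}

payload = build_payload("YOUR_API_KEY", "Summarize this ticket.", "nemotron-3-super")
```

The resulting dict can be passed directly as the `json=` argument to `requests.post`.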

FAQ

Common questions about NVIDIA: Nemotron 3 Super (free)

Read the docs

What is Nemotron 3 Super?

Nemotron 3 Super combines a hybrid Mamba-Transformer architecture with latent MoE, delivering 2.2x higher throughput than GPT-OSS-120B and 7.5x higher than Qwen 3.5 while matching accuracy. It's pre-trained in NVFP4 for 4x faster inference on Blackwell GPUs.

How does the latent mixture-of-experts architecture work?

The latent mixture-of-experts architecture routes tokens through a compressed latent space, activating only 12B parameters at inference time. This sparse routing reduces compute cost while maintaining frontier-class reasoning quality.
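As a toy illustration of sparse top-k expert routing (not NVIDIA's actual latent-MoE implementation), the sketch below scores eight experts for each token but runs only the two highest-scoring ones, analogous to activating 12B of 120B total parameters:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(hidden, gates, experts, k=2):
    """Route one token: score every expert, but run only the top-k."""
    scores = [sum(h * g for h, g in zip(hidden, gate)) for gate in gates]
    topk = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    weights = softmax([scores[i] for i in topk])
    out = [0.0] * len(hidden)
    for w, i in zip(weights, topk):
        # Only the selected experts execute; the other six stay idle.
        for j, v in enumerate(experts[i](hidden)):
            out[j] += w * v
    return out, topk

# Toy setup: 8 experts over a 4-dim hidden state, deterministic gate weights.
gates = [[((i * 3 + j) % 5) / 5 for j in range(4)] for i in range(8)]
experts = [lambda h, scale=i + 1: [x * scale for x in h] for i in range(8)]

out, active = moe_forward([1.0, 0.5, -0.25, 2.0], gates, experts, k=2)
```

Per-token compute scales with `k`, not with the total number of experts, which is how a 120B-parameter model can run at the cost of a 12B-parameter forward pass.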

How strong is it at agentic tasks?

Nemotron 3 Super is trained with multi-environment RL across 21+ configurations using 1.2 million environment rollouts. It scores 85.6% on PinchBench, making it the best open model for agentic reasoning with native support for step-by-step reasoning traces.

Why do agentic workflows need a long context window?

Multi-agent workflows generate up to 15x more tokens than standard chat because each step resends the full history and tool outputs. The 1M context window lets agents retain complete workflow state without truncation, enabling coherent long-term reasoning.
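The resend cost grows faster than linearly. As a rough sketch, assuming each agent step appends a fixed number of tokens and resends the entire prior history, total token usage grows quadratically with step count (the numbers below are illustrative, not the source's 15x measurement):

```python
def cumulative_tokens(tokens_per_step: int, steps: int) -> int:
    """Total tokens sent when every step resends the full prior history."""
    total, history = 0, 0
    for _ in range(steps):
        history += tokens_per_step  # history grows by one step's worth
        total += history            # the whole history is sent each step
    return total

single_pass = 1000 * 10                    # 10,000 tokens if sent once
agent_total = cumulative_tokens(1000, 10)  # 55,000 tokens with resends
```

Ten steps of 1,000 tokens each already cost 5.5x the single-pass total, which is why long-running agents both burn more tokens and need a context window large enough to hold the accumulated history.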

Is Nemotron 3 Super open source?

Yes. Nemotron 3 Super is fully open with open weights, datasets, and recipes under the NVIDIA Open License, allowing easy customization and secure deployment anywhere from workstation to cloud.

Ready to create?

Start generating with NVIDIA: Nemotron 3 Super (free) on ModelsLab.