Nvidia Nemotron 3 Super 120B A12B BF16
Agentic Reasoning Supercharged
Scale Intelligence Efficiently
Hybrid Architecture
Mamba-Transformer MoE
Activates only 12B of its 120B parameters, delivering 2.2x the throughput of GPT-OSS-120B on B200 GPUs.
Long Context
1M Token Window
Handles extended sequences for multi-step planning and cross-document reasoning.
Optimized Precision
NVFP4 to BF16
Pretrained in NVFP4 and post-trained in BF16 for 4x inference speed on Blackwell.
Examples
See what Nvidia Nemotron 3 Super 120B A12B BF16 can create
Copy any prompt below and try it yourself in the playground.
Code Generation
“Write a Python function to parse JSON logs, extract error rates, and generate a summary report with visualizations using matplotlib. Include error handling and support for large files.”
Task Planning
“Plan a multi-step cybersecurity triage workflow: analyze network logs, identify anomalies, prioritize threats, and recommend mitigation steps with tool calls.”
Math Reasoning
“Solve this AIME-level problem: Find the number of integer solutions to x^2 + y^2 + z^2 = 2025 where x, y, z are positive integers up to 50. Explain each step.”
Agent Workflow
“Design an autonomous agent script for software development: generate unit tests, run them via subprocess, fix failures iteratively, and output refactored code.”
For Developers
A few lines of code.
Reasoning. One API call.
ModelsLab handles the infrastructure for you: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.
- Serverless: scales to zero, scales to millions
- Pay per token, no minimums
- Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",
        "model_id": "",
    },
)
print(response.json())
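To see how the fields fit together, here is a sketch of the request body populated with one of the example prompts above. The model ID is a placeholder (substitute the ID from your ModelsLab dashboard), and the response schema is whatever the API returns; this sketch only builds and serializes the payload.

```python
import json

# Request body for the /v7/llm/chat/completions endpoint shown above.
# "MODEL_ID" is a placeholder -- use the model ID from your ModelsLab dashboard.
payload = {
    "key": "YOUR_API_KEY",
    "prompt": (
        "Write a Python function to parse JSON logs, extract error rates, "
        "and generate a summary report with visualizations using matplotlib."
    ),
    "model_id": "MODEL_ID",
}

# requests.post(..., json=payload) serializes the dict to JSON like this:
body = json.dumps(payload)
print(body)
```

Passing the dict via the `json=` keyword (as in the snippet above) sets the `Content-Type: application/json` header and serializes the body for you.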
Ready to create?
Start generating with Nvidia Nemotron 3 Super 120B A12B BF16 on ModelsLab.