Inception: Mercury 2
Reasoning at 1000 Tokens/Second
Build Faster with Diffusion
Diffusion Core
Parallel Token Refinement
Generates multiple tokens simultaneously via denoising, hitting 1000 tokens/sec on standard GPUs.
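Conceptually, diffusion decoding works by starting from a fully masked sequence and filling in many positions per step instead of one token at a time. The toy sketch below illustrates only that control flow; it is not Mercury's actual denoiser, and the token choices are random placeholders for a model's predictions:

```python
import random

MASK = "<mask>"

def denoise_step(tokens, vocab, fill_fraction=0.5, rng=None):
    """Fill a fraction of the masked positions in parallel.

    A real diffusion LM would choose tokens with a learned denoiser;
    here we sample randomly just to show the parallel-refinement loop.
    """
    rng = rng or random.Random(0)
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    n_fill = max(1, int(len(masked) * fill_fraction))
    for i in rng.sample(masked, n_fill):
        tokens[i] = rng.choice(vocab)
    return tokens

def generate(length, vocab, steps=8):
    # Start fully masked, then refine in a handful of parallel steps --
    # far fewer iterations than one-token-at-a-time autoregression.
    tokens = [MASK] * length
    for _ in range(steps):
        if MASK not in tokens:
            break
        tokens = denoise_step(tokens, vocab)
    return tokens

out = generate(16, vocab=["the", "cat", "sat"])
print(out)
```

Because each step commits several tokens at once, the number of forward passes grows with the step count rather than the sequence length, which is the source of the throughput gains.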
Speed Benchmark
5x Faster Than Haiku
Outpaces Claude 4.5 Haiku and GPT 5.2 Mini on reasoning benchmarks at a lower inference cost.

Production Ready
128K Context Tools
Supports tunable reasoning, native tool use, and structured JSON output through an OpenAI-compatible API.
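Because the API follows the OpenAI chat-completions format, a request for strict JSON output can be built like the payload below. The model identifier and the use of response_format here are assumptions based on that format, not confirmed ModelsLab parameters:

```python
import json

# Hypothetical request body in the OpenAI chat-completions shape.
# "inception/mercury-2" is an assumed model identifier.
payload = {
    "model": "inception/mercury-2",
    "messages": [
        {"role": "user", "content": "Return the capital of Japan as JSON."}
    ],
    # Asks the server to constrain output to valid JSON, as in the
    # OpenAI API's response_format option.
    "response_format": {"type": "json_object"},
    "max_tokens": 128,
}
print(json.dumps(payload, indent=2))
```

POSTing this body to the chat-completions endpoint (as in the snippet further down) would return a completion whose content parses as JSON.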
Examples
See what Inception: Mercury 2 can create
Copy any prompt below and try it yourself in the playground.
Code Agent Loop
“You are a coding agent. Analyze this Python function for bugs, suggest fixes, and output valid JSON with code changes:
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)”
Real-Time Search
“Summarize latest benchmarks for diffusion LLMs. Use chain-of-thought reasoning. Format as bullet points with sources.”
JSON Schema Output
“Generate a REST API spec for user authentication. Output strictly as JSON matching this schema: {api_name: string, endpoints: array of objects with method, path, description}”
Voice Assistant Response
“User asks: What's the weather in Tokyo? Respond conversationally, fetch mock data, keep under 50 words for low latency.”
For Developers
A few lines of code.
Inference. Three lines.
ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.
- Serverless: scales to zero, scales to millions
- Pay per token, no minimums
- Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",
        "model_id": "",
    },
)
print(response.json())
Ready to create?
Start generating with Inception: Mercury 2 on ModelsLab.