---
title: "Inception: Mercury 2 — Fast Reasoning LLM | ModelsLab"
description: "Generate reasoning outputs at 1000 tokens per second with Inception: Mercury 2. Try the diffusion LLM for agents, coding, and real-time apps via API."
url: https://modelslab.com/inception-mercury-2
canonical: https://modelslab.com/inception-mercury-2
type: website
component: Seo/ModelPage
generated_at: 2026-04-15T02:01:29.496649Z
---

Available now on ModelsLab · Language Model

Inception: Mercury 2
Reasoning at 1000 Tokens/Second
---

[Try Inception: Mercury 2](/models/open_router/inception-mercury-2) [API Documentation](https://docs.modelslab.com)

Build Faster with Diffusion
---

Diffusion Core

### Parallel Token Refinement

Generates multiple tokens simultaneously via denoising, hitting 1000 tokens/sec on standard GPUs.

Speed Benchmark

### 5x Faster Than Haiku

Outpaces Claude 4.5 Haiku and GPT 5.2 Mini in reasoning at lower inference cost.

Production Ready

### 128K Context Tools

Supports tunable reasoning, native tool use, and structured JSON output through an OpenAI-compatible API, with a 128K-token context window.

Examples

See what Inception: Mercury 2 can create
---

Copy any prompt below and try it yourself in the [playground](/models/open_router/inception-mercury-2).

Code Agent Loop

“You are a coding agent. Analyze this Python function for bugs, suggest fixes, and output valid JSON with code changes: `def fibonacci(n): if n <= 1: return n; return fibonacci(n-1) + fibonacci(n-2)`”
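The `fibonacci` function in the prompt above recomputes the same subproblems exponentially many times. One fix a coding agent would likely suggest is memoization; this is an illustrative sketch, not actual model output:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    # Cache each result so fibonacci(n) runs in O(n) calls instead of O(2^n)
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))  # 55
```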

Real-Time Search

“Summarize latest benchmarks for diffusion LLMs. Use chain-of-thought reasoning. Format as bullet points with sources.”

JSON Schema Output

“Generate a REST API spec for user authentication. Output strictly as JSON matching this schema: `{api_name: string, endpoints: array of objects with method, path, description}`”

Voice Assistant Response

“User asks: What's the weather in Tokyo? Respond conversationally, fetch mock data, keep under 50 words for low latency.”
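For the JSON Schema Output prompt above, it is worth validating that the model's reply actually matches the requested shape before using it downstream. A minimal sketch using only the standard library; the sample reply here is hypothetical, not real model output:

```python
import json

def matches_spec(reply: str) -> bool:
    # Check the reply parses as JSON and carries the fields the prompt demands.
    try:
        spec = json.loads(reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(spec.get("api_name"), str):
        return False
    endpoints = spec.get("endpoints")
    if not isinstance(endpoints, list):
        return False
    return all(
        isinstance(e, dict) and {"method", "path", "description"} <= e.keys()
        for e in endpoints
    )

# Hypothetical reply, used only to exercise the check
sample = (
    '{"api_name": "auth", "endpoints": '
    '[{"method": "POST", "path": "/login", "description": "Issue a session token"}]}'
)
print(matches_spec(sample))  # True
```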

For Developers

Inference in a few lines of code
---

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

- **Serverless:** scales to zero, scales to millions
- **Pay per token,** no minimums
- **Python and JavaScript SDKs,** plus REST API

[API Documentation](https://docs.modelslab.com)

Python

```
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",
        "model_id": "",
    },
)
print(response.json())
```
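Any of the example prompts above can be dropped into the request body. This sketch only builds the payload; the field names follow the request sample on this page, and `model_id` is left blank because the exact identifier comes from the model catalog:

```python
def build_payload(api_key: str, prompt: str, model_id: str = "") -> dict:
    # Same field names as the request sample shown above
    return {
        "key": api_key,
        "prompt": prompt,
        "model_id": model_id,
    }

payload = build_payload(
    "YOUR_API_KEY",
    "Summarize latest benchmarks for diffusion LLMs.",
)
# Send with:
# requests.post("https://modelslab.com/api/v7/llm/chat/completions", json=payload)
```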

FAQ

Common questions about Inception: Mercury 2
---

[Read the docs](https://docs.modelslab.com)

### What is Inception: Mercury 2?

### How does Inception Mercury 2 API work?

### Is Inception: Mercury 2 model faster than alternatives?

### What is Inception: Mercury 2 an alternative to?

### Does Inception Mercury 2 LLM support prompting?

### Where can I use the Inception: Mercury 2 API?

Ready to create?
---

Start generating with Inception: Mercury 2 on ModelsLab.

[Try Inception: Mercury 2](/models/open_router/inception-mercury-2) [API Documentation](https://docs.modelslab.com)

---

*This markdown version is optimized for AI agents and LLMs.*

**Links:**
- [Website](https://modelslab.com)
- [API Documentation](https://docs.modelslab.com)
- [Blog](https://modelslab.com/blog)

---
*Generated by ModelsLab - 2026-04-15*