---
title: "Inception: Mercury 2 — Fast Reasoning LLM | ModelsLab"
description: "Generate reasoning outputs at 1000 tokens per second with Inception: Mercury 2. Try the diffusion LLM for agents, coding, and real-time apps via API."
url: https://modelslab.com/inception-mercury-2
canonical: https://modelslab.com/inception-mercury-2
type: website
component: Seo/ModelPage
generated_at: 2026-04-15T02:01:29.496649Z
---

Available now on ModelsLab · Language Model

Inception: Mercury 2
Reasoning at 1000 Tokens/Second
---

[Try Inception: Mercury 2](/models/open_router/inception-mercury-2) [API Documentation](https://docs.modelslab.com)

Build Faster with Diffusion
---

Diffusion Core

### Parallel Token Refinement

Generates multiple tokens simultaneously via denoising, hitting 1000 tokens/sec on standard GPUs.

Speed Benchmark

### 5x Faster Than Haiku

Outpaces Claude 4.5 Haiku and GPT 5.2 Mini in reasoning at lower inference cost.

Production Ready

### 128K Context Tools

Supports tunable reasoning, native tool use, and structured JSON output through an OpenAI-compatible API, with a 128K-token context window.

Examples

See what Inception: Mercury 2 can create
---

Copy any prompt below and try it yourself in the [playground](/models/open_router/inception-mercury-2).

Code Agent Loop

“You are a coding agent. Analyze this Python function for bugs, suggest fixes, and output valid JSON with code changes: `def fibonacci(n): if n <= 1: return n; return fibonacci(n-1) + fibonacci(n-2)`”
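The `fibonacci` function in the prompt above recomputes the same subproblems exponentially many times. One fix a coding agent would likely suggest is memoization; this is an illustrative sketch, not actual model output:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    # Cache each result so fibonacci(n) runs in O(n) calls instead of O(2^n)
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))  # 55
```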

Real-Time Search

“Summarize latest benchmarks for diffusion LLMs. Use chain-of-thought reasoning. Format as bullet points with sources.”

JSON Schema Output

“Generate a REST API spec for user authentication. Output strictly as JSON matching this schema: `{api_name: string, endpoints: array of objects with method, path, description}`”

Voice Assistant Response

“User asks: What's the weather in Tokyo? Respond conversationally, fetch mock data, keep under 50 words for low latency.”
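For the JSON Schema Output prompt above, it is worth validating that the model's reply actually matches the requested shape before using it downstream. A minimal sketch using only the standard library; the sample reply here is hypothetical, not real model output:

```python
import json

def matches_spec(reply: str) -> bool:
    # Check the reply parses as JSON and carries the fields the prompt demands.
    try:
        spec = json.loads(reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(spec.get("api_name"), str):
        return False
    endpoints = spec.get("endpoints")
    if not isinstance(endpoints, list):
        return False
    return all(
        isinstance(e, dict) and {"method", "path", "description"} <= e.keys()
        for e in endpoints
    )

# Hypothetical reply, used only to exercise the check
sample = (
    '{"api_name": "auth", "endpoints": '
    '[{"method": "POST", "path": "/login", "description": "Issue a session token"}]}'
)
print(matches_spec(sample))  # True
```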

For Developers

Inference in a few lines of code
---

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

- **Serverless:** scales to zero, scales to millions
- **Pay per token,** no minimums
- **Python and JavaScript SDKs,** plus REST API

[API Documentation](https://docs.modelslab.com)

Python

```
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",
        "model_id": "",
    },
)
print(response.json())
```
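Any of the example prompts above can be dropped into the request body. This sketch only builds the payload; the field names follow the request sample on this page, and `model_id` is left blank because the exact identifier comes from the model catalog:

```python
def build_payload(api_key: str, prompt: str, model_id: str = "") -> dict:
    # Same field names as the request sample shown above
    return {
        "key": api_key,
        "prompt": prompt,
        "model_id": model_id,
    }

payload = build_payload(
    "YOUR_API_KEY",
    "Summarize latest benchmarks for diffusion LLMs.",
)
# Send with:
# requests.post("https://modelslab.com/api/v7/llm/chat/completions", json=payload)
```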

FAQ

Common questions about Inception: Mercury 2
---

[Read the docs](https://docs.modelslab.com)

### What is Inception: Mercury 2?

### How does Inception Mercury 2 API work?

### Is Inception: Mercury 2 model faster than alternatives?

### What is Inception: Mercury 2 an alternative to?

### Does Inception Mercury 2 LLM support prompting?

### Where can I use the Inception: Mercury 2 API?

Ready to create?
---

Start generating with Inception: Mercury 2 on ModelsLab.

[Try Inception: Mercury 2](/models/open_router/inception-mercury-2) [API Documentation](https://docs.modelslab.com)

---

*This markdown version is optimized for AI agents and LLMs.*

**Links:**
- [Website](https://modelslab.com)
- [API Documentation](https://docs.modelslab.com)
- [Blog](https://modelslab.com/blog)

---
*Generated by ModelsLab - 2026-04-15*