Available now on ModelsLab · Language Model

Inception: Mercury 2

Reasoning at 1000 Tokens/Second

Build Faster with Diffusion

Diffusion Core

Parallel Token Refinement

Generates multiple tokens simultaneously via denoising, hitting 1000 tokens/sec on standard GPUs.
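To make the idea concrete, here is a toy sketch of parallel denoising in Python. It is not Inception's actual algorithm: model_guess is a random stand-in for a real diffusion LM, and the confidence schedule is invented for illustration.

import random

MASK = "<mask>"
VOCAB = ["the", "quick", "brown", "fox", "jumps"]

def model_guess(tokens):
    # Stand-in for one parallel forward pass of a diffusion LM:
    # propose a (token, confidence) pair for every position at once.
    return [(t, 1.0) if t != MASK else (random.choice(VOCAB), random.random())
            for t in tokens]

def denoise(seq_len=8, steps=4):
    tokens = [MASK] * seq_len
    for step in range(steps):
        proposals = model_guess(tokens)
        # Commit the most confident proposals this step and re-mask the
        # rest, so the whole sequence sharpens over a fixed number of
        # iterations instead of one token per decoding step.
        keep = (step + 1) * seq_len // steps  # how many positions to commit
        cutoff = sorted(c for _, c in proposals)[max(0, seq_len - keep)]
        tokens = [tok if conf >= cutoff else MASK for tok, conf in proposals]
    return tokens

print(" ".join(denoise()))

Because every position is scored in one forward pass, the cost scales with the number of denoising steps rather than the number of tokens, which is the source of the throughput advantage.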

Speed Benchmark

5x Faster Than Haiku

Outpaces Claude 4.5 Haiku and GPT 5.2 Mini on reasoning tasks at a lower inference cost.

Production Ready

128K Context & Tools

Supports tunable reasoning, native tool use, and JSON output, with an OpenAI-compatible API.
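Because the API is OpenAI-compatible, the standard openai Python client should work with a swapped base URL. The base URL and model name below are placeholders for illustration; take the real values from the docs.

from openai import OpenAI

# Placeholder endpoint and model name; substitute the values
# from the ModelsLab / Inception documentation.
client = OpenAI(
    base_url="https://example.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mercury-2",  # placeholder model name
    messages=[{"role": "user", "content": "Return a JSON object summarizing diffusion LLMs."}],
    response_format={"type": "json_object"},  # JSON mode, per the feature list above
)
print(response.choices[0].message.content)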

Examples

See what Inception: Mercury 2 can create

Copy any prompt below and try it yourself in the playground.

Code Agent Loop

You are a coding agent. Analyze this Python function for bugs, suggest fixes, and output valid JSON with code changes:

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
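For reference, the function in this prompt is logically correct but runs in exponential time; one fix a coding agent might plausibly suggest is memoization:

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n: int) -> int:
    # Same recurrence, but memoized: O(n) calls instead of O(2^n).
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

assert fibonacci(30) == 832040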

Real-Time Search

Summarize latest benchmarks for diffusion LLMs. Use chain-of-thought reasoning. Format as bullet points with sources.

JSON Schema Output

Generate a REST API spec for user authentication. Output strictly as JSON matching this schema: {api_name: string, endpoints: array of objects with method, path, description}
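If you want to verify the model's output against that shape, a jsonschema check (an illustration, not part of any ModelsLab SDK) is one option:

import json
from jsonschema import validate  # pip install jsonschema

schema = {
    "type": "object",
    "required": ["api_name", "endpoints"],
    "properties": {
        "api_name": {"type": "string"},
        "endpoints": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["method", "path", "description"],
                "properties": {
                    "method": {"type": "string"},
                    "path": {"type": "string"},
                    "description": {"type": "string"},
                },
            },
        },
    },
}

# `model_output` stands in for the raw string returned by the model.
model_output = '{"api_name": "auth", "endpoints": [{"method": "POST", "path": "/login", "description": "Issue a session token."}]}'
validate(instance=json.loads(model_output), schema=schema)  # raises if the shape is wrong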

Voice Assistant Response

User asks: What's the weather in Tokyo? Respond conversationally, fetch mock data, keep under 50 words for low latency.

For Developers

Inference in a few lines of code.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

# Call the ModelsLab chat completions endpoint. Fill in your API key,
# your prompt, and the model ID from your dashboard.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",    # your prompt
        "model_id": "",  # model ID from the docs
    },
)
print(response.json())

FAQ

Common questions about Inception: Mercury 2

Read the docs

What is Inception: Mercury 2?

Inception: Mercury 2 is the fastest reasoning LLM, using diffusion to refine tokens in parallel. It reaches 1000 tokens/sec throughput and matches the performance of leading speed-optimized models.

How do I access it?

Access it via the Inception API, which is OpenAI-compatible and supports 128K context, tool use, and JSON mode. There is no sequential decoding; generation uses iterative denoising.

Is it really faster?

Yes: about 5x faster than Claude 4.5 Haiku or GPT 5.2 Mini, and it runs on standard GPUs without custom hardware.

What is it an alternative to?

It is an alternative to speed-optimized LLMs like Haiku, built for low-latency coding, agents, and voice apps at a lower cost per token.

What prompting styles does it support?

It supports zero-shot, few-shot, and chain-of-thought prompting, with tunable reasoning levels for flexible generation.
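As a sketch, few-shot prompting in the OpenAI message format looks like this, reusing the placeholder client and model name from the OpenAI-compatible example above:

# Few-shot prompting: seed the conversation with worked examples,
# then ask for the new case.
messages = [
    {"role": "system", "content": "Classify sentiment as positive or negative."},
    {"role": "user", "content": "The latency is incredible."},  # example 1
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Setup took me all day."},      # example 2
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Streaming just works."},       # new case
]
response = client.chat.completions.create(model="mercury-2", messages=messages)
print(response.choices[0].message.content)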

What is it best at?

It is ideal for agent loops, real-time search, and code editing, and it handles schema-aligned JSON and tool calls natively.

Ready to create?

Start generating with Inception: Mercury 2 on ModelsLab.