Available now on ModelsLab · Language Model

Inception: Mercury 2

Reasoning at 1000 Tokens/Second

Build Faster with Diffusion

Diffusion Core

Parallel Token Refinement

Generates multiple tokens simultaneously via denoising, hitting 1000 tokens/sec on standard GPUs.
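To make the idea concrete, here is a toy sketch of parallel denoising in Python. It is not Inception's actual algorithm: model_guess is a random stand-in for a real diffusion LM, and the confidence schedule is invented for illustration.

import random

MASK = "<mask>"
VOCAB = ["the", "quick", "brown", "fox", "jumps"]

def model_guess(tokens):
    # Stand-in for one parallel forward pass of a diffusion LM:
    # propose a (token, confidence) pair for every position at once.
    return [(t, 1.0) if t != MASK else (random.choice(VOCAB), random.random())
            for t in tokens]

def denoise(seq_len=8, steps=4):
    tokens = [MASK] * seq_len
    for step in range(steps):
        proposals = model_guess(tokens)
        # Commit the most confident proposals this step and re-mask the
        # rest, so the whole sequence sharpens over a fixed number of
        # iterations instead of one token per decoding step.
        keep = (step + 1) * seq_len // steps  # how many positions to commit
        cutoff = sorted(c for _, c in proposals)[max(0, seq_len - keep)]
        tokens = [tok if conf >= cutoff else MASK for tok, conf in proposals]
    return tokens

print(" ".join(denoise()))

Because every position is scored in one forward pass, the cost scales with the number of denoising steps rather than the number of tokens, which is the source of the throughput advantage.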

Speed Benchmark

5x Faster Than Haiku

Outpaces Claude 4.5 Haiku and GPT 5.2 Mini on reasoning tasks at a lower inference cost.

Production Ready

128K Context & Tools

Supports tunable reasoning, native tool use, and JSON output, with an OpenAI-compatible API.
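Because the API is OpenAI-compatible, the standard openai Python client should work with a swapped base URL. The base URL and model name below are placeholders for illustration; take the real values from the docs.

from openai import OpenAI

# Placeholder endpoint and model name; substitute the values
# from the ModelsLab / Inception documentation.
client = OpenAI(
    base_url="https://example.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mercury-2",  # placeholder model name
    messages=[{"role": "user", "content": "Return a JSON object summarizing diffusion LLMs."}],
    response_format={"type": "json_object"},  # JSON mode, per the feature list above
)
print(response.choices[0].message.content)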

Examples

See what Inception: Mercury 2 can create

Copy any prompt below and try it yourself in the playground.

Code Agent Loop

You are a coding agent. Analyze this Python function for bugs, suggest fixes, and output valid JSON with code changes:

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
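For reference, the function in this prompt is logically correct but runs in exponential time; one fix a coding agent might plausibly suggest is memoization:

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n: int) -> int:
    # Same recurrence, but memoized: O(n) calls instead of O(2^n).
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

assert fibonacci(30) == 832040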

Real-Time Search

Summarize latest benchmarks for diffusion LLMs. Use chain-of-thought reasoning. Format as bullet points with sources.

JSON Schema Output

Generate a REST API spec for user authentication. Output strictly as JSON matching this schema: {api_name: string, endpoints: array of objects with method, path, description}
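If you want to verify the model's output against that shape, a jsonschema check (an illustration, not part of any ModelsLab SDK) is one option:

import json
from jsonschema import validate  # pip install jsonschema

schema = {
    "type": "object",
    "required": ["api_name", "endpoints"],
    "properties": {
        "api_name": {"type": "string"},
        "endpoints": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["method", "path", "description"],
                "properties": {
                    "method": {"type": "string"},
                    "path": {"type": "string"},
                    "description": {"type": "string"},
                },
            },
        },
    },
}

# `model_output` stands in for the raw string returned by the model.
model_output = '{"api_name": "auth", "endpoints": [{"method": "POST", "path": "/login", "description": "Issue a session token."}]}'
validate(instance=json.loads(model_output), schema=schema)  # raises if the shape is wrong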

Voice Assistant Response

User asks: What's the weather in Tokyo? Respond conversationally, fetch mock data, keep under 50 words for low latency.

For Developers

Inference in a few lines of code.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

# Call the ModelsLab chat completions endpoint. Fill in your API key,
# your prompt, and the model ID from your dashboard.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",    # your prompt
        "model_id": "",  # model ID from the docs
    },
)
print(response.json())

FAQ

Common questions about Inception: Mercury 2

Read the docs

What is Inception: Mercury 2?

Inception: Mercury 2 is the fastest reasoning LLM, using diffusion to refine tokens in parallel. It reaches 1000 tokens/sec throughput and matches the performance of leading speed-optimized models.

How do I access it?

Access it via the Inception API, which is OpenAI-compatible and supports 128K context, tool use, and JSON mode. There is no sequential decoding; generation uses iterative denoising.

Is it really faster?

Yes: about 5x faster than Claude 4.5 Haiku or GPT 5.2 Mini, and it runs on standard GPUs without custom hardware.

What is it an alternative to?

It is an alternative to speed-optimized LLMs like Haiku, built for low-latency coding, agents, and voice apps at a lower cost per token.

What prompting styles does it support?

It supports zero-shot, few-shot, and chain-of-thought prompting, with tunable reasoning levels for flexible generation.
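As a sketch, few-shot prompting in the OpenAI message format looks like this, reusing the placeholder client and model name from the OpenAI-compatible example above:

# Few-shot prompting: seed the conversation with worked examples,
# then ask for the new case.
messages = [
    {"role": "system", "content": "Classify sentiment as positive or negative."},
    {"role": "user", "content": "The latency is incredible."},  # example 1
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Setup took me all day."},      # example 2
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Streaming just works."},       # new case
]
response = client.chat.completions.create(model="mercury-2", messages=messages)
print(response.choices[0].message.content)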

What is it best at?

It is ideal for agent loops, real-time search, and code editing, and it handles schema-aligned JSON and tool calls natively.

Ready to create?

Start generating with Inception: Mercury 2 on ModelsLab.