Inception: Mercury
Reasoning at 1000 Tokens/Sec
Build Faster AI Apps
Diffusion Core
Parallel Token Generation
Refines groups of tokens in parallel for 5-10x the speed of autoregressive LLMs.
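The speed difference comes down to sequential step count: an autoregressive model needs one forward pass per token, while a diffusion model refines every position over a fixed number of denoising steps. A purely illustrative toy comparison (the step counts are made-up numbers, not Mercury's actual sampler settings):

```python
# Toy step-count comparison. Autoregressive decoding pays one forward
# pass per generated token; diffusion-style decoding refines all token
# positions in parallel at each denoising step, so its sequential cost
# depends on the step count, not the sequence length.

def autoregressive_steps(num_tokens: int) -> int:
    # One forward pass per token.
    return num_tokens

def diffusion_steps(num_tokens: int, refinement_steps: int = 16) -> int:
    # Hypothetical fixed number of denoising steps, independent of length.
    return refinement_steps

tokens = 256
print(autoregressive_steps(tokens))  # 256 sequential passes
print(diffusion_steps(tokens))       # 16 sequential passes
```

The real sampler is far more sophisticated, but the shape of the argument is the same: sequential cost decoupled from sequence length.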
Tunable Reasoning
Low to High Effort
Set the reasoning level from instant to high to optimize latency in voice agents.
128K Context
Native Tool Use
Supports schema-aligned JSON output and tool integration as a drop-in LLM replacement.
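"Schema-aligned JSON" means output that conforms to a structure the caller specifies. A minimal client-side validation sketch (the reply string and schema below are invented examples, not real model output or a documented API feature):

```python
import json

# Hypothetical raw reply from the model when prompted for schema-aligned JSON.
raw_reply = '{"name": "pandas", "category": "data analysis", "stars": 5}'

# The structure the caller expects: field name -> required Python type.
expected_schema = {"name": str, "category": str, "stars": int}

data = json.loads(raw_reply)
valid = all(
    isinstance(data.get(field), expected_type)
    for field, expected_type in expected_schema.items()
)
print(valid)  # True when every required field is present with the right type
```

A check like this is a cheap guardrail before handing model output to downstream code or tools.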
Examples
See what Inception: Mercury can create
Copy any prompt below and try it yourself in the playground.
Code Review
“Review this Python function for bugs and optimize for speed:

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)”
JSON Schema
“Generate a schema-aligned JSON response listing top 5 Python libraries for data analysis with descriptions.”
Agent Workflow
“Plan a retrieval-augmented generation workflow using vector search and tool calls for querying customer data.”
Reasoning Chain
“High reasoning: Solve this logic puzzle step-by-step: Three houses in a row; owners A, B, and C drink water, milk, and tea, and own a cat, a dog, and a bird.”
For Developers
A few lines of code.
Inference. Three lines.
ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.
- Serverless: scales to zero, scales to millions
- Pay per token, no minimums
- Python and JavaScript SDKs, plus REST API
```python
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",    # your prompt text
        "model_id": "",  # model identifier
    },
)
print(response.json())
```
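The response shape is not documented on this page. Assuming an OpenAI-style chat-completions payload (an assumption to verify against the ModelsLab API reference), a defensive extraction helper might look like:

```python
def extract_text(payload: dict) -> str:
    # Assumed OpenAI-style shape: {"choices": [{"message": {"content": ...}}]}.
    # This shape is a guess; check the actual API documentation.
    try:
        return payload["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        # Surface the raw payload for debugging instead of crashing.
        return f"unexpected response: {payload!r}"

# Hypothetical payload for demonstration only.
sample = {"choices": [{"message": {"content": "Hello from Mercury"}}]}
print(extract_text(sample))  # Hello from Mercury
```

Guarding the happy path this way keeps error payloads (rate limits, bad keys) from raising opaque exceptions deep in your app.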
Ready to create?
Start generating with Inception: Mercury on ModelsLab.