Available now on ModelsLab · AI Model

Deepseek V3.2 Exp
Sparse Attention Unlocked

Master Long Contexts

DeepSeek Sparse Attention

Efficient Long-Context Processing

DeepSeek Sparse Attention (DSA) pairs a lightning indexer with fine-grained token selection, cutting inference costs by up to 50% on contexts of up to 128K tokens.
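
To see why token selection saves compute on long inputs, here is a rough back-of-the-envelope sketch. It counts the query-key pairs the main attention stage has to process under causal dense attention versus a scheme where each query keeps at most `top_k` earlier tokens. The function names and the `top_k` value of 2048 are illustrative assumptions, not figures from this page, and the count ignores the indexer's own scoring pass, so it does not translate directly into the cost figure quoted above.

```python
def dense_attention_pairs(context_len: int) -> int:
    """Query-key pairs under causal dense attention: token i attends to i tokens."""
    return context_len * (context_len + 1) // 2


def sparse_attention_pairs(context_len: int, top_k: int) -> int:
    """Query-key pairs when each query attends to at most top_k earlier tokens."""
    if context_len <= top_k:
        # Short inputs: nothing is pruned, so this matches dense attention.
        return dense_attention_pairs(context_len)
    # The first top_k tokens attend densely; every later token attends to top_k.
    return dense_attention_pairs(top_k) + (context_len - top_k) * top_k


if __name__ == "__main__":
    context_len = 128_000  # assumed reading of "128K tokens"
    top_k = 2048           # hypothetical selection budget, for illustration only
    dense = dense_attention_pairs(context_len)
    sparse = sparse_attention_pairs(context_len, top_k)
    print(f"dense pairs:  {dense:,}")
    print(f"sparse pairs: {sparse:,}")
    print(f"reduction in main attention work: {dense / sparse:.1f}x")
```

The dense count grows with the square of the context length, while the sparse count grows linearly once the input is longer than `top_k`. That is why the savings matter most on long contexts.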

Benchmark Parity

Matches V3.1-Terminus

Performs on par with V3.1-Terminus on benchmarks across domains while using less compute, thanks to sparse attention.

API Ready

Instant vLLM Deployment

Run the Deepseek V3.2 Exp API on NVIDIA H100, H200, or B200 hardware, with vLLM support from day zero.

Examples

See what Deepseek V3.2 Exp can create

Copy any prompt below and try it yourself in the playground.

Code Review

“Analyze this 50K token Python codebase for bugs, suggest optimizations, and explain refactoring steps with examples.”

Document Summary

“Summarize key insights from this 100K token technical report on AI architectures, highlighting innovations and benchmarks.”

Agent Planning

“Plan a multi-step research workflow using 80K token context: search web, synthesize data, generate report with citations.”

Math Proof

“Prove this theorem step-by-step using 128K context of related papers, verify reasoning, and check for errors.”
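
The prompts above assume inputs of 50K to 128K tokens, so it helps to check that your material fits before sending it. The sketch below is a rough pre-flight check. The four-characters-per-token ratio is a common rule of thumb for English text, not the model's real tokenizer. The 128,000-token limit and the output reserve are assumptions you should adjust, and both function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(
    prompt: str,
    document: str,
    context_limit: int = 128_000,      # assumed reading of "128K tokens"
    reserve_for_output: int = 4_000,   # hypothetical budget for the reply
) -> bool:
    """Return True if the prompt plus document likely fit in the context window."""
    used = estimate_tokens(prompt) + estimate_tokens(document)
    return used + reserve_for_output <= context_limit


if __name__ == "__main__":
    instructions = "Summarize key insights from this technical report."
    report = "lorem ipsum " * 30_000  # stand-in for a long document
    print("estimated tokens:", estimate_tokens(instructions) + estimate_tokens(report))
    print("fits in context:", fits_context(instructions, report))
```

For anything close to the limit, count tokens with the model's actual tokenizer instead of relying on the character heuristic.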

For Developers

A few lines of code.
Long context. One call.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

Serverless: scales to zero, scales to millions
Pay per token, no minimums
Python and JavaScript SDKs, plus REST API

API Documentation

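import requests

# Send a chat completion request to the ModelsLab LLM endpoint.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",  # your ModelsLab API key
        "prompt": "",           # the text you want the model to respond to
        "model_id": "",         # the model's ID, as listed on ModelsLab
    },
    timeout=120,  # avoid hanging indefinitely on long-context requests
)
response.raise_for_status()  # raise on HTTP errors instead of printing an error body
print(response.json())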
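
If you call the endpoint from more than one place, it can help to wrap the request in a small function that rejects empty fields before anything goes over the network. The sketch below uses only the Python standard library (`urllib.request` rather than `requests`) so it runs without extra dependencies. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response is returned as parsed JSON without assuming any particular schema.

```python
import json
import urllib.request

API_URL = "https://modelslab.com/api/v7/llm/chat/completions"


def build_chat_request(api_key: str, prompt: str, model_id: str) -> urllib.request.Request:
    """Validate the inputs and build a POST request carrying the JSON payload."""
    for name, value in (("api_key", api_key), ("prompt", prompt), ("model_id", model_id)):
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    # Field names mirror the example payload: key, prompt, model_id.
    payload = {"key": api_key, "prompt": prompt, "model_id": model_id}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON body (schema not assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `send_chat_request(build_chat_request(api_key, prompt, model_id))`, with the three values supplied from your own configuration.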

FAQ

Common questions about Deepseek V3.2 Exp

What is Deepseek V3.2 Exp?

An experimental LLM built on V3.1-Terminus that introduces DeepSeek Sparse Attention (DSA) for long-context efficiency. It matches its predecessor's benchmark results and supports contexts of up to 128K tokens.

How does Deepseek V3.2 Exp API work?

You access it through standard LLM endpoints, served with vLLM on NVIDIA hardware. Fine-grained sparse attention cuts inference costs by up to 50%.

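What is DeepSeek Sparse Attention?

A two-stage mechanism: a lightning indexer scores tokens, then a token selector keeps the most relevant ones for attention. This focuses compute on the relevant tokens in long inputs while maintaining output quality.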
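
The two-stage idea can be illustrated with a toy example in plain Python. This is a simplified sketch, not DeepSeek's implementation: the "indexer" here is just a dot product over small made-up index vectors, and all function names and parameters are assumptions chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(index_scores, k):
    """Return the positions of the k highest scores, in original order."""
    ranked = sorted(range(len(index_scores)), key=lambda i: index_scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, index_query, index_keys, k):
    """Toy two-stage attention: cheap scoring, then attention over k tokens."""
    # Stage 1: a cheap indexer scores every token against the query.
    index_scores = [dot(index_query, index_key) for index_key in index_keys]
    selected = select_top_k(index_scores, k)
    # Stage 2: ordinary scaled dot-product attention, but only over the selected tokens.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected)) for d in range(dim)]
    return output, selected
```

Setting `k` to the number of tokens makes the second stage equivalent to dense attention, which is a convenient way to sanity-check a sketch like this.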

Deepseek V3.2 Exp model specs?

A Mixture-of-Experts (MoE) model with 685B total parameters, of which 37B are active per token. Trained on 14.8T tokens. Supports FP8 and BF16 precision.
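
As a quick worked example of what these figures imply, the sketch below computes the share of parameters active per token and a rough weight-only memory footprint at each precision. It assumes 1 byte per parameter for FP8 and 2 bytes for BF16, treats every weight as stored in a single precision, and ignores KV cache and activations, so the results are an estimate rather than a deployment requirement.

```python
TOTAL_PARAMS = 685e9   # total parameters, from the spec above
ACTIVE_PARAMS = 37e9   # parameters active per token, from the spec above

# Storage size per parameter for each supported precision.
BYTES_PER_PARAM = {"FP8": 1, "BF16": 2}


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_params / total_params


def weight_memory_gb(params: float, precision: str) -> float:
    """Weight-only memory in decimal gigabytes at the given precision."""
    return params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active per token: {share:.1%}")
    for precision in BYTES_PER_PARAM:
        print(f"{precision} weights: {weight_memory_gb(TOTAL_PARAMS, precision):,.0f} GB")
```

The small active share is what keeps per-token compute low, while the total parameter count still determines how much memory the weights occupy.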

Supported hardware for Deepseek V3.2 Exp API?

NVIDIA H100, H200, H20, B200, and GB200, supported in vLLM from day zero. Kubernetes scaling is available with llm-d.

Differences from Deepseek V3.2?

V3.2-Exp is the experimental precursor that tests DSA. V3.2 adds RLVR (reinforcement learning with verifiable rewards) and agent training. Both are efficient for reasoning.

Ready to create?

Start generating with Deepseek V3.2 Exp on ModelsLab.
