Available now on ModelsLab · AI Model

Deepseek V3.2 Exp
Sparse Attention Unlocked

Master Long Contexts

DeepSeek Sparse Attention

Efficient Long-Context Processing

DeepSeek Sparse Attention (DSA) uses a lightning indexer and fine-grained token selection to cut inference costs by up to 50% on contexts of up to 128K tokens.

Benchmark Parity

Matches V3.1-Terminus

Delivers benchmark performance on par with V3.1-Terminus across domains, with less compute thanks to sparse attention.

API Ready

Instant vLLM Deployment

Run the Deepseek V3.2 Exp API on H100/H200/B200 hardware from day zero.

Examples

See what Deepseek V3.2 Exp can create

Copy any prompt below and try it yourself in the playground.

Code Review

Analyze this 50K token Python codebase for bugs, suggest optimizations, and explain refactoring steps with examples.

Document Summary

Summarize key insights from this 100K token technical report on AI architectures, highlighting innovations and benchmarks.

Agent Planning

Plan a multi-step research workflow using 80K token context: search web, synthesize data, generate report with citations.

Math Proof

Prove this theorem step-by-step using 128K context of related papers, verify reasoning, and check for errors.

For Developers

A few lines of code.
Long context. One call.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
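
import requests

# Fill in your API key, prompt, and model ID before running.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",    # your prompt text
        "model_id": "",  # model identifier for Deepseek V3.2 Exp
    },
    timeout=60,  # avoid hanging indefinitely on a slow connection
)
response.raise_for_status()  # raise on HTTP errors instead of printing an error body
print(response.json())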
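
For long inputs, it can help to check a rough size budget before sending the request. The following is a minimal sketch, assuming the same endpoint and field names as the snippet above. The helper names, the 4-characters-per-token estimate, and reading "128K" as 128,000 tokens are illustrative assumptions, not part of the ModelsLab API.

```python
API_URL = "https://modelslab.com/api/v7/llm/chat/completions"  # same endpoint as above
MAX_CONTEXT_TOKENS = 128_000  # "128K" from this page, taken as 128,000 (assumption)
CHARS_PER_TOKEN = 4           # rough heuristic (assumption); real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (hypothetical helper)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def build_payload(api_key: str, model_id: str, prompt: str) -> dict:
    """Build the request body, rejecting prompts that likely exceed the context."""
    tokens = estimate_tokens(prompt)
    if tokens > MAX_CONTEXT_TOKENS:
        raise ValueError(
            f"prompt is about {tokens} tokens, over the {MAX_CONTEXT_TOKENS} budget"
        )
    return {"key": api_key, "prompt": prompt, "model_id": model_id}
```

The returned dictionary can be passed as the json argument of requests.post(API_URL, ...). Because the estimate is only a heuristic, treat a passing check as a hint rather than a guarantee that the prompt fits.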

FAQ

Common questions about Deepseek V3.2 Exp

An experimental LLM built on V3.1-Terminus that introduces DeepSeek Sparse Attention (DSA) for long-context efficiency. It matches V3.1-Terminus on benchmarks and supports a 128K-token context.

Access it through standard LLM endpoints, served with vLLM on NVIDIA hardware. It enables fine-grained sparse attention and cuts inference costs by up to 50%.

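A two-stage mechanism: a lightning indexer scores the tokens in the context, and a token selector keeps the most relevant ones. Attention then runs only over those tokens, which keeps long inputs cheap while maintaining output quality.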
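
The two-stage idea can be illustrated with a toy sketch in plain Python. This is not DeepSeek's implementation: the function names are hypothetical, and plain dot products stand in for the lightning indexer's scoring function, which this page does not describe.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_top_k(index_query, index_keys, k):
    """Stage 1 (indexer stand-in): score every token cheaply, keep the k best."""
    scores = [dot(index_query, key) for key in index_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # restore original token order


def sparse_attention(query, keys, values, selected):
    """Stage 2: ordinary softmax attention, restricted to the selected tokens."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w / total * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]


# Toy usage: four tokens in context, attention restricted to the best two.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
query = [1.0, 0.0]
selected = select_top_k(query, keys, k=2)
output = sparse_attention(query, keys, values, selected)
```

With a fixed k, the attention step touches k tokens per query instead of the whole context, which is where the savings on long inputs come from. The indexer still scores every token, so it has to be much cheaper than full attention for the trade to pay off.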

685B total parameters, with 37B active per token (MoE). Trained on 14.8T tokens. Supports FP8 and BF16 precision.
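
As a back-of-the-envelope reading of these figures, the sketch below computes the share of parameters active per token and the weight-only memory footprint at each precision. It assumes 1 byte per parameter for FP8 and 2 bytes for BF16, and it ignores KV cache, activations, and runtime overhead, so real deployments need more memory than this.

```python
TOTAL_PARAMS = 685e9   # total parameters, from this page
ACTIVE_PARAMS = 37e9   # parameters active per token (MoE), from this page

BYTES_PER_PARAM = {"FP8": 1, "BF16": 2}  # storage width of each format


def active_fraction(total, active):
    """Share of the model's parameters used for any single token."""
    return active / total


def weight_memory_gb(params, precision):
    """Weight-only footprint in GB (10^9 bytes); excludes KV cache and activations."""
    return params * BYTES_PER_PARAM[precision] / 1e9


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
fp8_gb = weight_memory_gb(TOTAL_PARAMS, "FP8")
bf16_gb = weight_memory_gb(TOTAL_PARAMS, "BF16")
```

Under these assumptions, about 5.4% of the parameters are active per token, and the weights alone take roughly 685 GB in FP8 or 1,370 GB in BF16. All experts must still be resident in memory, so the MoE design mainly reduces compute per token, not the storage needed.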

NVIDIA H100/H200/H20 and B200/GB200 are supported via vLLM from day zero. Kubernetes scaling with llm-d is available.

V3.2-Exp is the experimental precursor that tests DSA. V3.2 adds RLVR and agent training. Both are efficient for reasoning.

Ready to create?

Start generating with Deepseek V3.2 Exp on ModelsLab.