StepFun: Step 3.5 Flash
Flash Reasoning 196B MoE
Reason Deep. Run Fast.
MoE Efficiency
11B Active Params
Sparse MoE routing activates only 11B of the 196B parameters per token, delivering frontier-level reasoning at the inference cost of an 11B dense model.
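The idea behind sparse activation is top-k expert routing: a router scores all experts, but only the top-k actually run for each token. A minimal sketch below; the expert count, top-k value, and dimensions are illustrative, not Step 3.5 Flash's actual configuration.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Toy sparse MoE layer: route a token to its top-k experts.

    x: (d,) token hidden state; gate_w: (d, n_experts) router weights;
    experts: list of (d, d) matrices, one tiny "expert" each.
    Only k of the n_experts matrices touch the token, so compute
    scales with k, not with the total parameter count.
    """
    logits = x @ gate_w                        # router score per expert
    top = np.argsort(logits)[-k:]              # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)       # only 2 of 16 experts run
print(y.shape)
```

The FLOPs saving is the whole point: the token's forward cost is k expert matmuls regardless of how many experts (and parameters) the model holds in total.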
Blazing Speed
100-300 Tok/s
3-way Multi-Token Prediction delivers 100-300 tok/s, peaking at 350 tok/s for coding.
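Multi-token prediction speeds decoding by proposing several draft tokens per step and keeping the prefix the main model agrees with. The toy acceptance loop below illustrates the mechanic only; the `propose3`/`verify1` callables are hypothetical stand-ins, not StepFun's implementation.

```python
def decode_with_mtp(propose3, verify1, prompt, max_len=12):
    """Toy 3-way multi-token-prediction decode loop.

    propose3(seq) -> 3 draft tokens from the prediction heads.
    verify1(seq)  -> the single token the main model would emit next.
    Each iteration accepts the longest draft prefix the main model
    agrees with, so up to 3 tokens land per verification pass.
    """
    seq = list(prompt)
    while len(seq) < max_len:
        for tok in propose3(seq):
            if verify1(seq) == tok:        # main model agrees: accept draft token
                seq.append(tok)
            else:
                seq.append(verify1(seq))   # disagreement: fall back to its token
                break
            if len(seq) >= max_len:
                break
    return seq[:max_len]

# Toy "models" over integer tokens: the main model counts upward, and
# the draft heads guess the next three integers (always right here).
verify1 = lambda seq: seq[-1] + 1
propose3 = lambda seq: [seq[-1] + 1, seq[-1] + 2, seq[-1] + 3]
out = decode_with_mtp(propose3, verify1, [0])
print(out)
```

When the draft heads are usually right, as in this toy case, each verification pass emits up to three tokens instead of one, which is where the throughput gain comes from.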
Long Context
256K Window
Hybrid Sliding Window Attention handles 256K context with low compute overhead.
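Sliding-window attention bounds each query to a fixed window of recent tokens, so per-token attention cost stays constant as context grows. A minimal causal-window mask sketch, with an illustrative window size:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Causal sliding-window attention mask: token i attends only to
    tokens j with i - window < j <= i, making attention cost per
    token O(window) rather than O(seq_len)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

m = sliding_window_mask(6, window=3)
print(m.astype(int))  # each row has at most 3 ones: the local window
```

In hybrid schemes, layers with this local mask are interleaved with full-attention layers, keeping long-range information flow while most layers pay only the windowed cost.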
Examples
See what StepFun: Step 3.5 Flash can create
Copy any prompt below and try it yourself in the playground.
Math Proof
“Solve this AIME-level math problem step-by-step: Prove that for integers n > 1, the sum of divisors function σ(n) satisfies certain bounds. Use chain-of-thought reasoning and verify with code execution if needed.”
Code Agent
“Write a Python function to parse a large codebase, identify bugs in async handlers, and suggest fixes. Output the refactored code with explanations.”
Logic Chain
“Analyze this complex logic puzzle involving 10 agents with constraints. Deduce the solution through multi-step reasoning, listing assumptions and eliminations.”
Data Summary
“Summarize key insights from a 200K token dataset on AI benchmarks, highlighting trends in MoE vs dense models, with quantitative comparisons.”
For Developers
A few lines of code.
Agentic inference in a few lines.
ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.
- Serverless: scales to zero, scales to millions
- Pay per token, no minimums
- Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",
        "model_id": "",
    },
)
print(response.json())
Ready to create?
Start generating with StepFun: Step 3.5 Flash on ModelsLab.