Happy Horse 1.0 is now on ModelsLab

Try Now
Skip to main content
Available now on ModelsLab · Language Model

Qwen: Qwen3.5 397B A17BFrontier MoE Vision LLM

Activate 397B Intelligence Efficiently

Sparse MoE

17B Active Params

397B total parameters activate 17B per pass via 512-expert MoE routing.

Native Multimodal

Vision-Language Fusion

Handles text, images, videos with early fusion training across 201 languages.

Ultra-Efficient

8.6x Faster Decoding

Gated Delta Networks deliver 8.6x-19x speed over Qwen3-Max at 262k context.

Examples

See what Qwen: Qwen3.5 397B A17B can create

Copy any prompt below and try it yourself in the playground.

Code Review

Review this Python function for efficiency and suggest optimizations: def fibonacci(n): if n <= 1: return n else: return fibonacci(n-1) + fibonacci(n-2)

Tech Summary

Summarize key specs of Qwen3.5-397B-A17B architecture, including parameter count, context length, and multimodal capabilities.

Agent Plan

Plan steps to deploy a vLLM server for Qwen: Qwen3.5 397B A17B model with tensor-parallel-size 8 and 262k max length.

Reasoning Chain

Solve: A train leaves at 60 mph, another at 70 mph from stations 200 miles apart. When do they meet? Use step-by-step reasoning.

For Developers

A few lines of code.
Inference. Three lines.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests
response = requests.post(
"https://modelslab.com/api/v7/llm/chat/completions",
json={
"key": "YOUR_API_KEY",
"prompt": "",
"model_id": ""
}
)
print(response.json())

FAQ

Common questions about Qwen: Qwen3.5 397B A17B

Read the docs

Open-weight vision-language LLM with 397B total params, 17B active via sparse MoE. Supports 262k native context, extensible to 1M. Competes with GPT-5.2 on reasoning, coding, agents.

Call via standard LLM endpoints with multimodal inputs. Use vLLM: vllm serve Qwen/Qwen3.5-397B-A17B --tensor-parallel-size 8. Enables tool use and 1M context in hosted versions.

Hybrid Gated DeltaNet + MoE activates 17B params per pass. Achieves 8.6x faster decoding than Qwen3-Max. Lower compute than 1T peers while ranking #3 open-weights.

Yes, native image/video input via early fusion. First Qwen open model unifying text and vision. Outperforms prior Qwen3-VL on visual benchmarks.

Matches frontier like Claude 4.5, Gemini-3 Pro on intelligence index. Open-weights under Apache 2.0 with 201 languages. Smaller active params than Kimi K2.5 or GLM-5.

262k tokens native, up to 1M extensible via YaRN. Supports reasoning/non-reasoning modes in one model.

Ready to create?

Start generating with Qwen: Qwen3.5 397B A17B on ModelsLab.