Available now on ModelsLab · Language Model

Qwen: Qwen3 VL 235B A22B Instruct

Vision Meets Reasoning

Process Images, Generate Text

Multimodal Input

Images and Video

Handles text, images, and video for VQA, OCR, and document parsing, with a 262K-token context window.

Agent Capabilities

GUI and Tool Use

Operates interfaces, aligns video timelines, supports multi-image dialogues.

Visual Coding

Sketches to Code

Converts mockups to Draw.io, HTML/CSS/JS; aids UI debugging workflows.

Examples

See what Qwen: Qwen3 VL 235B A22B Instruct can create

Copy any prompt below and try it yourself in the playground.

Chart Analysis

Analyze this sales chart image. Extract key trends, totals, and comparisons across quarters. Provide data in table format.

Document OCR

Extract all text from this multilingual invoice image. Identify fields like date, amount, vendor. Output as JSON.
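A prompt like this asks for JSON, but a model reply may wrap the object in a code fence or add surrounding prose. Here is a minimal sketch of pulling the object out of a reply, assuming the reply contains exactly one JSON object. The helper name `extract_json` and the sample reply are hypothetical and not part of the ModelsLab API.

```python
import json


def extract_json(reply: str) -> dict:
    """Return the JSON object embedded in a model reply.

    Hypothetical helper: assumes the reply holds a single JSON object,
    possibly surrounded by prose or a Markdown code fence.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    # Parse everything from the first "{" to the last "}".
    return json.loads(reply[start:end + 1])


# Illustrative reply text, not real model output.
sample_reply = (
    "Here is the result:\n"
    "```json\n"
    '{"date": "2024-01-31", "amount": "120.50", "vendor": "Acme"}\n'
    "```"
)
fields = extract_json(sample_reply)
print(fields["vendor"])
```

Slicing between the first and last brace is deliberately simple. It fails if the reply contains several separate objects, in which case a stricter parser would be needed.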

Spatial Grounding

Describe object positions in this room photo. Ground locations in 2D coordinates. Note occlusions and viewpoints.

Video Timeline

From this video frame sequence, locate the event at 1:23. Describe the actions and align the text to seconds.
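When a video is sent as sampled frames, a timestamp such as 1:23 has to be mapped to a frame position on the client side. The sketch below assumes frames are sampled at a constant rate starting at 0:00. The function names and the 2 frames-per-second rate are illustrative assumptions, not values from the platform.

```python
def timestamp_to_seconds(ts: str) -> int:
    """Convert "M:SS" or "H:MM:SS" to whole seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def frame_index(ts: str, fps: float) -> int:
    """Index of the sampled frame closest to the timestamp.

    Assumes a constant sampling rate (fps) with frame 0 at 0:00.
    """
    return round(timestamp_to_seconds(ts) * fps)


print(timestamp_to_seconds("1:23"))   # 83
print(frame_index("1:23", fps=2.0))   # 166
```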

For Developers

Multimodal inference in a few lines of code.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
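
import requests

# Send a chat completion request to the ModelsLab API.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",  # your ModelsLab API key
        "prompt": "",           # the text prompt to send
        "model_id": "",         # the ID of the model to call
    },
)
print(response.json())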
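
The request above leaves `prompt` and `model_id` empty. Here is a small sketch of a guard that refuses to build a request body with blank fields. The helper `build_payload` is hypothetical, the field names are copied from the snippet above, and the prompt and model ID values are placeholders only.

```python
def build_payload(api_key: str, prompt: str, model_id: str) -> dict:
    """Assemble the JSON body for the request, rejecting blank fields.

    Hypothetical helper; the keys mirror the sample request on this page.
    """
    fields = {"key": api_key, "prompt": prompt, "model_id": model_id}
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError(f"empty field(s): {', '.join(missing)}")
    return fields


# Placeholder values for illustration; substitute real ones before sending.
payload = build_payload("YOUR_API_KEY", "Describe this chart.", "your-model-id")
print(sorted(payload))
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, ideally with a `timeout` so a stalled connection does not block indefinitely.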

FAQ

Common questions about Qwen: Qwen3 VL 235B A22B Instruct

Read the docs

Open-weight multimodal LLM unifying text generation with image/video understanding. Supports VQA, OCR, spatial reasoning. Matches Qwen3-235B text performance.

Native 262K tokens, expandable to 1M. Processes long documents or hours-long videos with precise recall.

Outputs 55.5 tokens per second. Above average speed for similar open-weight models.
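
As a rough worked example of what that rate means in practice, generation time scales linearly with output length. The sketch below uses only the 55.5 tokens-per-second figure quoted above and ignores time to first token and network latency, so treat the result as a lower bound rather than a measurement. The function name is hypothetical.

```python
TOKENS_PER_SECOND = 55.5  # output speed quoted on this page


def estimated_seconds(output_tokens: int,
                      tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Estimate pure generation time for a reply of the given length."""
    return output_tokens / tokens_per_second


print(round(estimated_seconds(1000), 1))  # 18.0
```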

Visual agent for GUIs, coding from sketches, 2D/3D grounding, video timestamp alignment. Handles multilingual OCR and charts.

Available through APIs such as ModelsLab, OpenRouter, and Vercel. Open weights are published on Hugging Face for self-hosting.

Accepts text and images, including multi-image input and video frames. Outputs text for reasoning and generation.

Ready to create?

Start generating with Qwen: Qwen3 VL 235B A22B Instruct on ModelsLab.