Qwen3-VL-235B-A22B-Instruct-FP8
Vision Meets Reasoning
Process Images. Reason Deeply.
Visual Agent
Navigate GUIs Autonomously
Recognizes GUI elements, understands their functions, and invokes tools to complete tasks.
Spatial Reasoning
Ground 2D and 3D
Judges object positions, viewpoints, and occlusions with precise spatial perception.
Video Analysis
Handle Long Videos
Supports a 262K-token context window for hours-long videos with second-level indexing.
Examples
See what Qwen3-VL-235B-A22B-Instruct-FP8 can create
Copy any prompt below and try it yourself in the playground.
GUI Task
“Analyze this screenshot of a web app. Identify the login button, describe its position relative to the header, and suggest how to click it using coordinates.”
Spatial Query
“Examine this architectural blueprint image. Determine the relative positions of rooms, detect any occlusions, and provide 3D grounding estimates.”
Video Summary
“Process this 5-minute product demo video. Index key events by second, describe spatial changes in objects, and generate a timeline summary.”
Document OCR
“Extract all text from this scanned technical diagram. Align text with visual elements, reason about diagram logic, and output structured JSON.”
For Developers
A few lines of code.
Vision inference. One call.
ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.
- Serverless: scales to zero, scales to millions
- Pay per token, no minimums
- Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",  # your ModelsLab API key
        "prompt": "",           # your prompt
        "model_id": "",         # the model to run
    },
)
print(response.json())
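For a vision model, a request will typically carry an image alongside the prompt. Below is a minimal sketch of building such a payload with a base64-encoded image. The `init_image` field name and the model slug are assumptions for illustration only; check the ModelsLab API reference for the exact parameter and model identifier.

```python
import base64
import json


def build_vision_payload(api_key: str, model_id: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for a chat-completion request with an attached image.

    NOTE: "init_image" is an assumed field name for illustration;
    consult the ModelsLab API docs for the exact parameter.
    """
    return {
        "key": api_key,
        "model_id": model_id,
        "prompt": prompt,
        # Images are commonly sent as base64 strings over JSON APIs,
        # since raw bytes are not valid JSON.
        "init_image": base64.b64encode(image_bytes).decode("ascii"),
    }


payload = build_vision_payload(
    "YOUR_API_KEY",
    "qwen3-vl-235b-a22b-instruct-fp8",  # assumed slug; verify in the model catalog
    "Extract all text from this scanned diagram as structured JSON.",
    b"\x89PNG...",  # raw image bytes, e.g. open("diagram.png", "rb").read()
)
print(json.dumps(payload)[:60])
```

The payload can then be sent with `requests.post(..., json=payload)` exactly as in the snippet above; base64 keeps the body pure JSON so no multipart handling is needed.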
Ready to create?
Start generating with Qwen3-VL-235B-A22B-Instruct-FP8 on ModelsLab.