Available now on ModelsLab · Language Model

Llama 4 Maverick Instruct (17Bx128E)

Multimodal MoE Power

Run Maverick Efficiently

MoE Architecture

17B Active / 400B Total

Activates 17B parameters from 400B total across 128 experts for text and image tasks; see the routing sketch after these cards.

Native Multimodal

Text + Image Fusion

Processes multilingual text and images through early fusion, enabling joint reasoning over language and vision.

Single H100 Host Fit

FP8 Quantized Weights

FP8 weights fit on a single H100 host while preserving quality for fast inference.
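To make the 17B-active / 400B-total figure concrete, here is a toy top-1 routing sketch in Python with NumPy. It shows the mechanism only: a router scores all 128 experts per token and only the winner's weights run. It is not Maverick's actual router, and the sizes here are tiny placeholders.

import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 128  # matches Maverick's expert count
HIDDEN = 8         # toy hidden size; the real model is far larger

# One router matrix plus one toy weight matrix per expert.
router = rng.standard_normal((HIDDEN, NUM_EXPERTS))
experts = rng.standard_normal((NUM_EXPERTS, HIDDEN, HIDDEN))

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Top-1 gating: only the best-scoring expert's weights activate."""
    scores = token @ router          # one routing score per expert
    best = int(np.argmax(scores))    # pick a single expert for this token
    return experts[best] @ token     # the other 127 experts stay idle

token = rng.standard_normal(HIDDEN)
print("routed to expert:", int(np.argmax(token @ router)))
print("layer output:", np.round(moe_layer(token)[:3], 3))

At full scale, this per-token gating is why only about 17B of the 400B parameters run on any given forward pass.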

Examples

See what Llama 4 Maverick Instruct (17Bx128E) can create

Copy any prompt below and try it yourself in the playground.

Chart Analysis

Analyze this sales chart image. Extract key trends, compare quarters, and suggest optimizations. Output in JSON with metrics.

Code Debug

Review this Python function for errors. The code processes image data from a multimodal dataset. Fix bugs and optimize for MoE efficiency.

Doc Reasoning

Read this technical document image on MoE architectures. Summarize Llama 4 Maverick specs, including parameter counts and context length.

Multilingual Query

Translate and reason about this French diagram on AI inference. Explain H100 deployment in English, list pros and cons.

For Developers

A few lines of code.
One API call runs Maverick Instruct.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
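A minimal request with Python's requests library; the API key and model ID are placeholders to fill in from your dashboard: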
import requests

# Send one prompt to the Maverick chat completions endpoint.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",  # your ModelsLab API key
        "prompt": "Summarize Llama 4 Maverick specs, including parameter counts and context length.",
        "model_id": "",  # Maverick model ID from your dashboard (see the docs)
    },
)
response.raise_for_status()  # surface HTTP errors early
print(response.json())
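To reuse the same call for each example prompt above, a small helper works well. This is a sketch built on the request shape shown here; ask_maverick is just an illustrative name, and the exact response schema is documented in the API reference.

import requests

API_URL = "https://modelslab.com/api/v7/llm/chat/completions"

def ask_maverick(prompt: str, api_key: str, model_id: str) -> dict:
    """Send one prompt to the endpoint and return the parsed JSON response."""
    resp = requests.post(
        API_URL,
        json={"key": api_key, "prompt": prompt, "model_id": model_id},
        timeout=120,  # long-context requests can take a while
    )
    resp.raise_for_status()
    return resp.json()

# Try the Multilingual Query example from above.
print(ask_maverick(
    "Translate and reason about this French diagram on AI inference. "
    "Explain H100 deployment in English, list pros and cons.",
    api_key="YOUR_API_KEY",
    model_id="",  # Maverick model ID from your dashboard
))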

FAQ

Common questions about Llama 4 Maverick Instruct (17Bx128E)

Read the docs

What is Llama 4 Maverick Instruct (17Bx128E)?

Llama 4 Maverick Instruct (17Bx128E) is a multimodal MoE LLM with 17B active parameters out of 400B total across 128 experts. It handles text, images, code, and multilingual tasks with a context window of up to 1M tokens, and was trained on ~22T tokens with an August 2024 knowledge cutoff.

What does the Llama 4 Maverick Instruct (17Bx128E) API support?

The API accepts text and image inputs and returns text and code. It supports function calling, structured output, and batch predictions. Use FP8 weights for single-host H100 deployment.
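Structured output in practice: the Chart Analysis example above asks the model to "Output in JSON with metrics". Here is a sketch of validating such a reply, assuming you have already pulled the completion text out of the response (the exact field name is in the API reference):

import json

def parse_metrics(completion_text: str) -> dict:
    """Validate the JSON block a structured-output prompt asks the model to emit."""
    text = completion_text.strip()
    # Models sometimes wrap JSON in markdown fences; strip them if present.
    if text.startswith("```"):
        text = text.split("```")[1]
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the raw text so callers can inspect what went wrong.
        return {"raw": completion_text}

metrics = parse_metrics('{"q1_growth": 0.12, "q2_growth": 0.08}')
print(metrics["q1_growth"])  # 0.12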

Is Llama 4 Maverick truly multimodal?

Yes. It uses early fusion for native text-image understanding and excels at image reasoning, captioning, ChartQA (90.0), and DocVQA (94.4). It supports 12+ languages.

How does it compare to Llama 3.3 70B?

It offers Llama 3.3 70B-level quality at lower cost thanks to MoE efficiency, fits in FP8 on a single H100 host, and supports a 1M-token context. It is strong in coding, knowledge, and MMLU Pro (59.6).

What is the context length?

It supports up to 1M input tokens, with 8K output tokens in some configurations, and is optimized for long-context multimodal reasoning and instruction following.
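Before sending a very long document, a rough size check helps you stay inside the 1M-token window. The 4-characters-per-token ratio below is a common heuristic for English text, not Maverick's exact tokenizer:

MAX_INPUT_TOKENS = 1_000_000  # advertised context window

def rough_token_count(text: str) -> int:
    """Crude estimate: ~4 characters per token for English prose."""
    return len(text) // 4

doc = "Quarterly revenue grew 12% on multimodal workloads. " * 5_000
estimate = rough_token_count(doc)
if estimate > MAX_INPUT_TOKENS:
    print(f"~{estimate:,} tokens: split the document first")
else:
    print(f"~{estimate:,} tokens: fits in one request")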

How do I deploy it?

It is available via the ModelsLab LLM endpoint and can be deployed with BF16 or FP8 weights for H100 inference. Commercial use is permitted.

Ready to create?

Start generating with Llama 4 Maverick Instruct (17Bx128E) on ModelsLab.