OpenAI released GPT-5.4 on March 5, 2026. The press release calls it "our most capable and efficient frontier model for professional work," and the benchmarks back that up — record scores on OSWorld-Verified, WebArena Verified, and 83% on the GDPval knowledge-work test. The API version has a 1-million-token context window, which is the largest OpenAI has shipped. There's also a new Tool Search system that looks up tool definitions on demand instead of stuffing all of them into the prompt, which cuts cost significantly in multi-tool setups.
But here's the thing most developer posts are missing: GPT-5.4 doesn't generate images. Or video. Or audio. It never has, and it still doesn't. If your application needs any of that, you're combining GPT-5.4 with something else — and ModelsLab's API is the option most developers haven't thought through yet.
This post is a direct comparison for developers building AI applications: what GPT-5.4 gives you, what it doesn't, and where ModelsLab fills the gap.
What GPT-5.4 actually does
The three things OpenAI is leaning on for this release:
- Computer use. GPT-5.4 is OpenAI's first general-use model with native computer-use capabilities. It can autonomously navigate applications, fill forms, and execute multi-step workflows without you writing tool definitions for each action.
- Long-horizon reasoning. The 1M token context window isn't just for reading large documents. It's built for tasks like "analyze this entire codebase and produce a refactoring plan" or "review this 400-page legal filing." The new Tool Search system also helps here — tools are loaded as needed rather than upfront, so complex agent setups don't blow the context budget before the task starts.
- Fewer hallucinations. OpenAI reports measurably lower error rates compared to GPT-5.2, with improvements concentrated in factual recall and professional-domain tasks. The 83% score on GDPval — a knowledge-work benchmark covering financial analysis and legal reasoning — is the most independently verifiable signal. That matters for professional output where hallucination cost is high.
API pricing: $2.50 per million input tokens and $15 per million output tokens for the standard model; the Pro version costs more. By OpenAI's standards this is reasonable: GPT-5.4 is faster and cheaper than its predecessor at similar capability levels.
What GPT-5.4 doesn't do
GPT-5.4 generates text. That's it. No images, no video, no audio — not even via the API. If you're building an application that needs to create visual content, you need a separate image generation API. If you need video synthesis or voice cloning, same story.
This isn't a knock on GPT-5.4. It's a scoping decision — OpenAI is going deep on reasoning and agentic work, not media generation. But developers who gloss over this end up discovering it mid-build when their agent can describe an image in 500 words but can't actually create one.
What ModelsLab gives you that GPT-5.4 doesn't
ModelsLab is a media generation API platform: more than 100 AI models accessible through a single API key. The breakdown:
- Image generation: FLUX, Stable Diffusion XL, SD 1.5, Juggernaut XL, and 80+ other models. Text-to-image, image-to-image, inpainting, outpainting, ControlNet.
- Video generation: Wan 2.1, Kling, AnimateDiff, SVD. Generate video from text or from images.
- Audio and voice: Text-to-speech, voice cloning, music generation. Real-time TTS with configurable voice models.
- Image editing: Face swap, background removal, super-resolution upscaling, style transfer.
All of this is exposed over a REST API. You pass a prompt and get back a URL or base64-encoded output. Authentication is via the POST body: you include your API key directly in the request body, not in an Authorization header.
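As a minimal sketch of what a single call looks like (the `text2img` endpoint and field names match the full example later in this post; the parameter values are placeholders):

```python
import requests

def build_text2img_payload(api_key: str, prompt: str,
                           width: int = 1024, height: int = 1024) -> dict:
    # The key goes in the JSON body -- there is no Authorization header.
    return {
        "key": api_key,
        "prompt": prompt,
        "width": str(width),    # ModelsLab takes dimensions as strings
        "height": str(height),
        "samples": "1",
    }

def text2img(api_key: str, prompt: str) -> dict:
    resp = requests.post(
        "https://modelslab.com/api/v6/realtime/text2img",
        json=build_text2img_payload(api_key, prompt),
        timeout=60,
    )
    return resp.json()  # contains an output URL or base64 payload on success
```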
Combining them: a GPT-5.4 agent that generates images
The interesting use case right now isn't picking one or the other — it's using GPT-5.4's reasoning to drive ModelsLab's media generation. GPT-5.4 decides what to generate; ModelsLab generates it.
Here's a minimal Python example. A GPT-5.4 agent that takes a user request, writes an image prompt, and calls the ModelsLab API to generate the image:
import openai
import requests
import time

OPENAI_KEY = "your-openai-key"
ML_KEY = "your-modelslab-key"

client = openai.OpenAI(api_key=OPENAI_KEY)


def generate_image_prompt(user_request: str) -> str:
    """Use GPT-5.4 to turn a vague request into a detailed image prompt."""
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[
            {
                "role": "system",
                "content": "You are a prompt engineer for image generation models. "
                           "Convert the user's request into a detailed, specific image prompt "
                           "optimized for FLUX or Stable Diffusion. Return only the prompt."
            },
            {"role": "user", "content": user_request}
        ]
    )
    return response.choices[0].message.content


def generate_image(prompt: str) -> str:
    """Call ModelsLab API to generate the image."""
    response = requests.post(
        "https://modelslab.com/api/v6/realtime/text2img",
        json={
            "key": ML_KEY,
            "prompt": prompt,
            "negative_prompt": "blurry, low quality, distorted",
            "width": "1024",
            "height": "1024",
            "samples": "1",
        }
    )
    data = response.json()
    if data.get("status") == "success":
        return data["output"][0]
    elif data.get("status") == "processing":
        # Async generation: poll the fetch URL until the result is ready
        return poll_result(data["fetch_result"], ML_KEY)
    else:
        raise ValueError(f"Generation failed: {data}")


def poll_result(fetch_url: str, api_key: str, max_retries: int = 10) -> str:
    """Poll for async generation results."""
    for _ in range(max_retries):
        time.sleep(3)
        r = requests.post(fetch_url, json={"key": api_key})
        data = r.json()
        if data.get("status") == "success":
            return data["output"][0]
    raise TimeoutError("Image generation timed out")


# Usage
user_request = "A futuristic developer workspace with multiple monitors, dark theme"
prompt = generate_image_prompt(user_request)
print(f"Generated prompt: {prompt}")
image_url = generate_image(prompt)
print(f"Image URL: {image_url}")
GPT-5.4's Tool Search feature makes this even cleaner. You can define the ModelsLab API as a tool and let the model call it directly without a separate orchestration layer:
tools = [
    {
        "type": "function",
        "function": {
            "name": "generate_image",
            "description": "Generate an image from a text prompt using ModelsLab API",
            "parameters": {
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "Detailed image generation prompt"
                    },
                    "width": {
                        "type": "string",
                        "enum": ["512", "768", "1024"],
                        "description": "Image width in pixels"
                    },
                    "height": {
                        "type": "string",
                        "enum": ["512", "768", "1024"],
                        "description": "Image height in pixels"
                    }
                },
                "required": ["prompt"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Create a hero image for my API documentation"}],
    tools=tools,
    tool_choice="auto"
)
With Tool Search, GPT-5.4 only loads the tool definition when it needs it. In a large agent with many tools defined, this keeps the request cost down.
Pricing comparison
GPT-5.4 API pricing: $2.50 per million input tokens, $15 per million output tokens. A 1,000-word document is roughly 1,300 input tokens ($0.003) plus a typical 650-token analysis response ($0.01 in output tokens) — full request cost: about $0.013. Output tokens are where the cost actually sits. For text-heavy agentic workloads, plan on the $0.01–0.02 per request range, not fractions of a cent.
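The arithmetic above folds into a small helper, using the $2.50/M input and $15/M output rates quoted here:

```python
INPUT_USD_PER_M = 2.50    # standard GPT-5.4 input rate
OUTPUT_USD_PER_M = 15.00  # standard GPT-5.4 output rate

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single chat completion."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# The worked example above: a ~1,300-token document plus a ~650-token response
print(round(request_cost(1300, 650), 4))  # 0.013
```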
ModelsLab image generation: pricing depends on the model and resolution. See modelslab.com/pricing for current per-generation rates across image, video, and audio endpoints.
The important thing: these are separate costs for separate capabilities. You're not choosing between GPT-5.4 and ModelsLab — you're stacking them for different parts of the pipeline.
When to use each
Use GPT-5.4 when you need: multi-step reasoning over large inputs, code generation and review, document analysis, agentic workflows that navigate software, professional output with reduced hallucination risk.
Use ModelsLab API when you need: image generation in any style or model, video synthesis from text or images, voice cloning or TTS, image editing operations, or media output at scale.
Use both when: your agent needs to reason about what to create before creating it. A content generation agent that writes copy and generates the accompanying images. A product demo tool that describes a feature and illustrates it. A customer support bot that explains how to use a product and generates custom screenshots.
Getting started
ModelsLab API documentation: docs.modelslab.com. The quickstart covers authentication (POST body key, not header), the most common endpoints, and how to handle async generation — which you'll hit on longer generations.
If you're building with GPT-5.4 and need media generation in the same pipeline, the REST API is the fastest path. No SDK is required, though community-maintained Python wrappers exist if you prefer one.
The ModelsLab API takes API keys issued per account, with usage metered by generation. Pay-as-you-go — no subscription required. Get your API key at modelslab.com.