Available now on ModelsLab · Video Generation

Wan 2.5 Text-to-Video API — Cinematic Open-Source Video

Alibaba's Wan 2.5 video model via REST API. Open weights, pay per second, no GPU.

Why developers use Wan 2.5

Wan 2.5

Alibaba's open-source video model

Wan 2.5 is Alibaba DAMO Academy's text-to-video release, with strong cinematic quality and open weights. It is the same model as 'Wan2.5-T2V' on Hugging Face, exposed via a single REST endpoint.

Open-source weights

Auditable, no vendor lock-in

Wan 2.5 weights are publicly released. You can audit the model, fine-tune locally for R&D, then ship production via the API without rewriting your code.

Cinematic output

High-fidelity motion at 1080p

Native 1080p generation with strong physics realism, dynamic camera, and high prompt adherence on natural-language descriptions of cinematic shots.

Multiple aspect ratios

16:9, 9:16, and 1:1 supported

Generate landscape for YouTube and broadcast, portrait for short-form social, or square for feed ads — all from the same endpoint by setting width and height.
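A minimal sketch of how aspect-ratio presets might be wired up. The width and height field names come from the description above; confirm the exact schema against the ModelsLab API reference before shipping.

```python
# Aspect-ratio presets for the text-to-video endpoint. Field names
# (width/height) are taken from this page; verify against the API docs.
PRESETS = {
    "landscape": {"width": 1920, "height": 1080},  # 16:9: YouTube, broadcast
    "portrait": {"width": 1080, "height": 1920},   # 9:16: TikTok, Reels, Shorts
    "square": {"width": 1080, "height": 1080},     # 1:1: feed ads
}

def build_request(prompt: str, aspect: str, api_key: str) -> dict:
    """Assemble a request body for the chosen aspect-ratio preset."""
    return {"key": api_key, "prompt": prompt, **PRESETS[aspect]}

# requests.post("https://modelslab.com/api/v7/video-fusion/text-to-video",
#               json=build_request("Aerial drone shot over a city", "portrait", "YOUR_API_KEY"))
```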

No cold starts

Dedicated GPU infrastructure

Wan 2.5 runs on a warm GPU pool. The first request is the same speed as the thousandth — no cold-start penalty in your latency SLO.

Webhook delivery

Async generation, no long-polling

Submit a generation request with a webhook URL; receive the final MP4 directly to your event handler. Perfect for serverless backends and event-driven architectures.
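A sketch of an async submission. The `webhook` field name is an assumption, not a confirmed parameter; check the API reference for the exact name before relying on it.

```python
# Async-generation sketch: include a callback URL so ModelsLab can POST the
# finished MP4 to your handler instead of you long-polling.
# NOTE: the "webhook" field name is an assumption; verify against the docs.
def build_async_request(prompt: str, webhook: str, api_key: str) -> dict:
    return {
        "key": api_key,
        "prompt": prompt,
        "webhook": webhook,  # your HTTPS endpoint for the completion event
    }

body = build_async_request(
    "Calm ocean waves at sunrise", "https://example.com/hooks/video-done", "YOUR_API_KEY"
)
# requests.post("https://modelslab.com/api/v7/video-fusion/text-to-video", json=body)
```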

Pricing

Pay per second of output

No subscription required. Pay only for the seconds of video you generate. Volume discounts at 1000+ minutes per month.

Compatible with the broader API

One key, every modality

Wan 2.5 sits alongside Kling, Seedance, Veo, Runway, and other video models on the ModelsLab text-to-video endpoint. Switch model_id; everything else stays the same.
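In practice, switching models amounts to changing one field. A sketch, using model_id values as they appear on this page:

```python
ENDPOINT = "https://modelslab.com/api/v7/video-fusion/text-to-video"

def build_request(model_id: str, prompt: str, api_key: str) -> dict:
    # Only model_id changes between Wan, Kling, Seedance, Veo, and Runway.
    return {"key": api_key, "model_id": model_id, "prompt": prompt}

wan = build_request("wan-2.5", "Aerial drone shot over a city at dusk", "YOUR_API_KEY")
kling = build_request("kling-3.0", "Aerial drone shot over a city at dusk", "YOUR_API_KEY")
# requests.post(ENDPOINT, json=wan)  # same endpoint, same code path for both
```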

Examples

Wan 2.5 text-to-video examples

Copy any prompt below and try it yourself in the playground.

City Flyover

Aerial drone shot over futuristic city skyline at dusk, neon lights reflecting on glass towers, smooth camera pan right, ambient city hum and distant traffic sounds, 1080p, 10 seconds.

Ocean Waves

Calm ocean waves crashing on rocky shore at sunrise, golden light on water, slow zoom in on foam, gentle wave sounds and seagulls calling, 720p, 5 seconds.

Mountain Timelapse

Timelapse of clouds rolling over snow-capped mountains, shifting sunlight, steady upward pan, wind whispers and eagle cries, 1080p, 10 seconds.

Urban Street

Busy urban street with vintage cars driving past brick buildings, tracking shot forward, jazz music and footsteps echoing, 720p, 5 seconds.

For Developers

A few lines of code.
Cinematic video in one POST

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per second, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/video-fusion/text-to-video",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, streaking the glass with meandering streaks. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them.'",
        "duration": "5",
        "init_audio": "https://assets.modelslab.ai/generations/2f7dfdcb-2295-4c6f-966d-4e673baee8e3.mp3",
        "resolution": "720",
    },
)
print(response.json())
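Once the response arrives, you will typically want to pull the video URL out of it. The "status" and "output" field names below are an assumption about the response schema; check the API reference for the actual shape.

```python
# Illustrative response handling. The "status"/"output" field names are an
# assumption about the response schema, not a documented contract.
def extract_video_url(payload: dict):
    if payload.get("status") == "success":
        outputs = payload.get("output") or []
        return outputs[0] if outputs else None
    return None  # still queued/processing, or an error payload

# url = extract_video_url(response.json())
```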

FAQ

Common questions about the Wan 2.5 Text-to-Video API

Read the docs

What is the Wan 2.5 Text-to-Video API?

The Wan 2.5 API is a REST endpoint that runs Alibaba DAMO Academy's open-source Wan 2.5 video generation model on cloud GPUs. POST a text prompt; receive an MP4 URL with cinematic-quality video.

How does Wan 2.5 compare to Wan 2.7?

Wan 2.7 is the newer release with improved fidelity, better prompt adherence, and faster inference. Wan 2.5 is still widely used because the open-source weights are mature and the model is well-understood. ModelsLab exposes both — pick by setting model_id='wan-2.5' or model_id='wan-2.7'.

Is Wan 2.5 really open source?

Yes. Wan 2.5 weights are publicly released by Alibaba DAMO Academy. You can audit, fine-tune, or self-host the model. The ModelsLab API gives you the same model with no GPU setup or model-weight management.

Which aspect ratios are supported?

16:9 landscape (1920×1080), 9:16 portrait (1080×1920) for TikTok/Reels/Shorts, and 1:1 square (1080×1080). Pass width and height in the request to select.

How long can generated clips be?

Wan 2.5 generates clips from 3 to 10 seconds depending on resolution. Pass num_frames and fps to control duration. For longer cinematic pieces, chain clips with consistent character and scene references.
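If duration is driven by num_frames and fps as described in this answer, the relationship is simple arithmetic (parameter names taken from the answer above; verify against the current schema):

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Output duration implied by a frame count and frame rate."""
    return num_frames / fps

# e.g. 240 frames at 24 fps yields a 10-second clip
```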

How fast is generation?

A 5-second 1080p clip typically generates in 60–120 seconds. The API runs on dedicated GPU infrastructure with no cold starts, so latency is consistent across requests.

How much does it cost?

Pricing starts at $0.04 per second of output for Wan 2.5 — slightly cheaper than the newer Wan 2.7. No subscription required; pay only for what you generate.
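A back-of-envelope cost helper at the quoted $0.04 per output second (the rate above; volume discounts are not modeled):

```python
def cost_usd(output_seconds: float, rate_per_second: float = 0.04) -> float:
    """Estimated cost at the quoted $0.04 per output second."""
    return round(output_seconds * rate_per_second, 2)

# A 5-second clip costs cost_usd(5) -> 0.2; ten minutes of output, cost_usd(600) -> 24.0
```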

Can I use other video models with the same code?

Yes. Wan 2.5 is one of several text-to-video models available on the ModelsLab text-to-video endpoint. Switch model_id to 'kling-3.0', 'seedance-2.0', 'veo-3.1', or 'wan-2.7' without changing anything else in your code.

Does Wan 2.5 support image-to-video?

Yes. Pass an init_image URL alongside your prompt and Wan 2.5 will animate the source image. The output preserves subject identity from the reference image.
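A sketch of an image-to-video body: init_image carries a public URL of the source frame, as described in this answer, and the rest mirrors the text-to-video request. The example URL is a placeholder.

```python
# Image-to-video sketch. The init_image field is described in the FAQ above;
# the example image URL is a placeholder, not a real asset.
def build_i2v_request(prompt: str, init_image: str, api_key: str) -> dict:
    return {
        "key": api_key,
        "prompt": prompt,
        "init_image": init_image,  # public URL of the source frame to animate
    }

body = build_i2v_request(
    "The fox slowly raises its head and speaks",
    "https://example.com/fox-office.png",
    "YOUR_API_KEY",
)
```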

How is data privacy handled?

Prompts and generated videos are processed in compliant regions, and outputs are auto-deleted from the CDN after 7 days by default. A signed DPA is available for enterprise customers.

Ready to create?

Start generating with the Wan 2.5 Text-to-Video API on ModelsLab.