Available now on ModelsLab · Video Generation

Wan2.6 Text To Video

Cinematic video. Fifteen seconds.

Generate. Sync. Storytell.

Multi-Shot Generation

Intelligent Scene Planning

Automatically splits complex prompts into distinct shots while maintaining character consistency and visual continuity.

Native Audio Sync

Phoneme-Level Lip Sync

Generates facial micro-expressions and lip movements aligned perfectly with input audio or text-to-speech scripts.

Extended Duration

Up to Fifteen Seconds

Create fuller narratives in a single generation with expanded temporal and spatial capacity at 1080P resolution.

Examples

See what Wan2.6 Text To Video can create

Copy any prompt below and try it yourself in the playground.

Urban Timelapse

Cinematic timelapse of a modern city skyline at sunset. Shot 1: Wide establishing shot of downtown towers with golden hour light. Shot 2: Close-up tracking through busy street with neon signs reflecting on wet pavement. Shot 3: Aerial view of traffic flowing through intersection. Smooth camera movements, film grain, professional color grading.

Product Showcase

Luxury watch product reveal. Shot 1: Macro close-up of watch face with light reflecting off crystal. Shot 2: Slow 360-degree rotation on white studio background. Shot 3: Wrist shot with watch in motion against minimalist interior. Crisp detail, studio lighting, shallow depth of field.

Nature Documentary

Mountain landscape sequence. Shot 1: Wide aerial view of snow-capped peaks at dawn. Shot 2: Push-in through misty valley with pine forest. Shot 3: Close-up of flowing alpine stream with rocks. Cinematic color grading, natural lighting, smooth camera transitions.

Talking Head

Professional speaking to camera. Person in business attire sits at desk, looks directly at camera and says: 'Welcome to our presentation.' Soft studio lighting, neutral background, natural facial expressions, clear audio sync.
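The multi-shot prompts above share one pattern: a one-line subject, numbered "Shot N:" descriptions, then a closing style note. A minimal sketch of a helper that assembles prompts in that pattern follows. The function name `build_multishot_prompt` and its parameters are hypothetical and not part of any ModelsLab SDK; the pattern itself is inferred from the examples, not from a documented prompt format.

```python
from typing import Sequence


def build_multishot_prompt(subject: str, shots: Sequence[str], style: str = "") -> str:
    """Compose 'Subject. Shot 1: ... Shot 2: ... Style.' as a single prompt string.

    Hypothetical helper: it mirrors the layout of the example prompts only.
    """
    if not shots:
        raise ValueError("at least one shot description is required")

    def sentence(text: str) -> str:
        # Trim whitespace and make sure the fragment ends with punctuation.
        text = text.strip()
        return text if text.endswith((".", "!", "?")) else text + "."

    parts = [sentence(subject)]
    for number, shot in enumerate(shots, start=1):
        parts.append(f"Shot {number}: {sentence(shot)}")
    if style.strip():
        parts.append(sentence(style))
    return " ".join(parts)


prompt = build_multishot_prompt(
    "Mountain landscape sequence",
    [
        "Wide aerial view of snow-capped peaks at dawn",
        "Push-in through misty valley with pine forest",
        "Close-up of flowing alpine stream with rocks",
    ],
    style="Cinematic color grading, natural lighting, smooth camera transitions",
)
print(prompt)
```

With these inputs the helper reproduces the Nature Documentary prompt word for word, so the result can be pasted into the playground like any other example.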

For Developers

A few lines of code.
Cinematic video. Three lines.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per second, no minimums
  • Python and JavaScript SDKs, plus REST API
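import requests

# Send the generation request; replace YOUR_API_KEY with your ModelsLab key.
response = requests.post(
    "https://modelslab.com/api/v7/video-fusion/text-to-video",
    json={
        "key": "YOUR_API_KEY",
        # Text prompt describing the scene and the spoken line.
        "prompt": "A man talking to the camera from the Great Wall of China, saying: 'Welcome to my vlog. The views from this place are breathtaking and amazing. You should come here too.'",
        # URL of the input audio track.
        "init_audio": "https://assets.modelslab.ai/generations/74c4f2e6-2fa6-4d8f-a0e3-09ff1a94d9e1.mp3",
    },
    timeout=60,  # do not wait indefinitely if the server does not respond
)
response.raise_for_status()  # raise on HTTP error status before parsing
print(response.json())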
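For reuse across several generations, the request above can be split into a payload builder and a sender. The sketch below is a hedged illustration, not an official client. It uses only the three fields shown in the example (`key`, `prompt`, `init_audio`) and the same endpoint URL. The helper names, the `MODELSLAB_API_KEY` environment variable, and the choice to treat `init_audio` as optional are assumptions. It relies on the standard library's `urllib` so that it runs without extra installs, and it makes no assumption about the shape of the JSON response.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://modelslab.com/api/v7/video-fusion/text-to-video"


def build_payload(api_key: str, prompt: str, init_audio: Optional[str] = None) -> dict:
    """Build the JSON body from the fields shown in the example request."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    payload = {"key": api_key, "prompt": prompt}
    if init_audio:
        # Assumption: the audio URL can be left out for silent generations.
        payload["init_audio"] = init_audio
    return payload


def build_request(payload: dict) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON body.

    urlopen raises urllib.error.HTTPError for HTTP error statuses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# MODELSLAB_API_KEY is a hypothetical variable name; any secret store works.
api_key = os.environ.get("MODELSLAB_API_KEY", "YOUR_API_KEY")
payload = build_payload(api_key, "Slow push-in on a lighthouse at dusk.")
request = build_request(payload)
print(request.get_method(), request.full_url)
# print(send(request))  # uncomment to call the API with a real key
```

Keeping the builder separate from the sender lets you inspect or log the payload before spending credits, and reading the key from the environment keeps it out of source control.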

FAQ

Common questions about Wan2.6 Text To Video

Read the docs

What is Wan 2.6?

Wan 2.6 is Alibaba's multimodal video generation model. It transforms text prompts into narrative-ready videos of up to 15 seconds at 1080P resolution. It uses intelligent shot scheduling to organize multi-shot sequences automatically while maintaining character consistency and visual continuity throughout the generation.

Does Wan 2.6 support lip sync?

Yes. Wan 2.6 offers native phoneme-level lip synchronization, generating facial micro-expressions and lip movements that align with input audio or text-to-speech scripts. Audio and video are generated together, eliminating the need for external dubbing software.

Does Wan 2.6 support image-to-video generation?

Yes. Wan 2.6 supports image-to-video generation with strong identity retention, allowing you to animate static character or product photos. You can also use reference videos to guide the look and maintain consistency across multiple generations.

How long can generated videos be?

Wan 2.6 generates videos of up to 15 seconds at 1080P resolution, enabling fuller storytelling in a single generation. This extended duration supports multi-shot narratives with distinct scenes and camera transitions.

What is new in Wan 2.6 compared to Wan 2.5?

Compared to Wan 2.5, Wan 2.6 introduces native audio-visual synchronization, multi-shot storytelling with scene continuity, and a generation length extended to 15 seconds. It also features improved prompt understanding and better handling of complex instructions.

Can I control camera movement and style?

Yes. Wan 2.6 responds well to specific camera directions, style instructions, and scene composition guidance. You can describe techniques like 'tracking shot through fog' or 'slow push-in on subject', and the model will interpret and execute them.

Ready to create?

Start generating with Wan2.6 Text To Video on ModelsLab.