Available now on ModelsLab · Video Generation

Wan2.6 Image To Video

Image to cinematic video

Generate. Sync. Scale. Instantly.

Native Audio Sync

Phoneme-Level Lip Sync

Generate facial micro-expressions and lip movements perfectly aligned with input audio or text-to-speech.

Ultra-High Resolution

Up to 4K Output

Render videos at 1080p, 2K, or native 4K without upscaling for broadcast-ready quality.

Extended Storytelling

15-Second Multi-Shot

Generate coherent narrative sequences with automatic shot transitions and consistent character identity.

Examples

See what Wan2.6 Image To Video can create

Copy any prompt below and try it yourself in the playground.

Product Reveal

A sleek silver smartphone rotating slowly on a minimalist white surface, soft studio lighting highlighting the edges, camera pulls back to reveal the full device in a modern tech showroom

Urban Architecture

Modern glass skyscraper facade at golden hour, camera glides upward along the building, warm sunlight reflecting off windows, clouds drifting in background

Nature Timelapse

Mountain landscape at sunrise, mist rolling through valleys, camera pans left revealing alpine peaks, birds flying across frame, natural lighting transitions from cool to warm tones

Abstract Motion

Geometric shapes morphing and rotating in 3D space, deep blue and cyan color palette, smooth camera movement through digital landscape, particles flowing with motion

For Developers

A few lines of code.
Image to video. Seconds flat.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per second, no minimums
  • Python and JavaScript SDKs, plus REST API

import requests

# Submit an image-to-video request; replace YOUR_API_KEY with your own key.
response = requests.post(
    "https://modelslab.com/api/v7/video-fusion/image-to-video",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "The person from the reference image is a travel vlogger standing on the Great Wall of China, speaking directly to the camera in a natural vlog style. Multishot cinematic sequence starting with a medium close-up selfie shot, the vlogger holding the camera, relaxed expression, light wind, then a smooth pan revealing the Great Wall stretching across the mountains with tourists clearly visible in the background, followed by an over-the-shoulder shot of the vlogger pointing toward the scenic views. The vlogger says clearly and naturally for about 5 seconds: “Right now, I’m standing on the Great Wall of China… and the view here is absolutely unreal.” Add realistic outdoor ambience with soft wind sounds, distant crowd murmurs, footsteps on stone, and clean vlog-style voice audio. Ultra-realistic visuals, perfect face consistency with the reference image, sharp background details, natural daylight, cinematic color grading, stable camera motion, authentic travel vlog mood, immersive and inspiring atmosphere, no distortions, no extra people, duration approximately 5 seconds.",
        "duration": "5",
        "init_audio": "https://assets.modelslab.ai/generations/ba1837f2-a8a1-49ac-ac0f-2809818867c0.mp3",
        "init_image": "https://assets.modelslab.ai/generations/a2dd96c6-b148-4bdc-aefc-453157d5fd0c.png",
        "resolution": "720p",
    },
    timeout=60,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json())
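
If you send many requests, it can help to assemble the request body in one place. The sketch below reuses the field names from the request above; the helper name `build_payload`, the 1 to 15 second range (inferred from the page's 15-second claim), and the optional treatment of `init_audio` are assumptions for illustration, not documented API behavior.

```python
from typing import Optional

# Assumption: upper bound taken from the "15-second" claim on this page.
MAX_DURATION_SECONDS = 15


def build_payload(api_key: str, prompt: str, init_image: str,
                  duration: int = 5, resolution: str = "720p",
                  init_audio: Optional[str] = None) -> dict:
    """Assemble a request body using the field names from the sample request."""
    if not 1 <= duration <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds, got {duration}"
        )
    payload = {
        "key": api_key,
        "prompt": prompt,
        "duration": str(duration),  # the sample request sends duration as a string
        "init_image": init_image,
        "resolution": resolution,
    }
    # Assumption: audio is optional, so the field is only included when provided.
    if init_audio is not None:
        payload["init_audio"] = init_audio
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, exactly as in the request above.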

FAQ

Common questions about Wan2.6 Image To Video

Read the docs

Wan 2.6 Flash generates videos in 5-15 seconds, while the full model takes 45-90 seconds. The Flash variant is distilled for reduced latency while maintaining core quality and image-to-video capabilities.

Wan 2.6 features strong identity retention, so you can upload a static character or product photo and animate it without faces morphing or clothing details changing. Face stability is rated 8.5/10.

Wan 2.6 offers native phoneme-level lip synchronization, generating facial micro-expressions and lip movements that align with input audio or text-to-speech scripts, with no external dubbing software required.

Wan 2.6 generates clips of up to 15 seconds at 1080p, 2K, or native 4K resolution. The Pro variant supports multi-shot narrative generation with consistent characters and lighting across shots.

Supported image formats are JPG, JPEG, PNG, and WebP, with a maximum file size of 5 MB and dimensions between 300 and 6000 px. Reference videos (MP4) of up to 50 MB and 5 seconds are also supported for character consistency.
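
Checking an image against the limits above before uploading can save a failed request. The following is a minimal sketch: the function name `check_image` is hypothetical, "5 MB" is assumed to mean 5 × 1024 × 1024 bytes, and the 300 to 6000 px range is assumed to apply to each side independently. The caller supplies the file size and pixel dimensions, so no imaging library is needed.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" read as 5 MiB
MIN_SIDE_PX, MAX_SIDE_PX = 300, 6000  # assumption: applies to width and height alike


def check_image(filename: str, size_bytes: int, width: int, height: int) -> list:
    """Return a list of problems; an empty list means the image passes every check."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            problems.append(f"{name} out of range: {side}px")
    return problems
```

For example, `check_image("hero.png", 1_000_000, 1280, 720)` returns an empty list, while an oversized GIF with a 200 px width would be reported with three separate problems.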

Wan 2.6 scores 8.5/10 for face stability and 8/10 for prompt adherence, outperforming competitors. It maintains visual coherence throughout full 15-second sequences where earlier models break down after 5-7 seconds.

Ready to create?

Start generating with Wan2.6 Image To Video on ModelsLab.