Available now on ModelsLab · Video Generation

WAN 2.7 Text To Video

Cinematic video. Text prompt.

Generate. Control. Edit. Refine.

Multi-Reference Control

Five References, One Frame

Lock up to 5 simultaneous image or video references for precise multi-subject composition and character consistency.

Native Audio Sync

Audio-Driven Generation

Sync videos to your audio or auto-generate matching background music for lip-sync and music-reactive content.

Instruction-Based Editing

Edit Without Regeneration

Modify existing videos with text instructions: swap backgrounds, adjust lighting, recolor elements instantly.

Examples

See what WAN 2.7 Text To Video can create

Copy any prompt below and try it yourself in the playground.

Urban Timelapse

Cinematic timelapse of a modern city skyline at sunset, clouds moving across the sky, warm golden light reflecting off glass buildings, smooth camera pan from left to right, 4K quality

Product Showcase

Sleek smartphone rotating on a minimalist white surface, soft studio lighting, shallow depth of field, premium materials catching light, smooth 360-degree rotation, professional product photography style

Nature Landscape

Aerial drone shot of misty mountain peaks at dawn, golden sunlight breaking through clouds, lush green valleys below, cinematic color grading, smooth camera movement, serene atmosphere

Abstract Motion

Flowing liquid metal morphing through geometric shapes, vibrant neon colors transitioning smoothly, volumetric lighting, high contrast, futuristic aesthetic, seamless loop-ready motion

For Developers

A few lines of code.
Cinematic video. Three lines.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per second, no minimums
  • Python and JavaScript SDKs, plus REST API

import requests

# Text-to-video request. "init_audio" is the audio track the video is synced to.
response = requests.post(
    "https://modelslab.com/api/v7/video-fusion/text-to-video",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "A man talking to the camera from the Great Wall of China, saying: Welcome to my vlog. The views from this place are breathtaking and amazing, you should come here too.",
        "duration": "5",
        "init_audio": "https://assets.modelslab.ai/generations/74c4f2e6-2fa6-4d8f-a0e3-09ff1a94d9e1.mp3",
        "resolution": "720P",
    },
)
response.raise_for_status()  # stop here on an HTTP error status
print(response.json())
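
The sample request builds its JSON body inline. A minimal sketch of checking the inputs first is shown here, assuming the limits stated in the FAQ on this page (durations up to 15 seconds, 720p or 1080p output). `build_payload` is a hypothetical helper, not part of any ModelsLab SDK. The `"1080P"` spelling and the 1-second minimum are assumptions; only `"720P"` and a duration of `"5"` appear in the sample.

```python
from typing import Optional

# Assumed limits, taken from the FAQ on this page, not from the API reference.
MAX_DURATION_SECONDS = 15
MIN_DURATION_SECONDS = 1                   # assumption: the page states no minimum
ALLOWED_RESOLUTIONS = {"720P", "1080P"}    # "1080P" spelling is an assumption


def build_payload(api_key: str, prompt: str, duration: int = 5,
                  resolution: str = "720P",
                  init_audio: Optional[str] = None) -> dict:
    """Hypothetical helper: validate inputs and return the JSON request body."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not MIN_DURATION_SECONDS <= duration <= MAX_DURATION_SECONDS:
        raise ValueError(f"duration must be 1-{MAX_DURATION_SECONDS} seconds")
    resolution = resolution.upper()
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")

    payload = {
        "key": api_key,
        "prompt": prompt,
        "duration": str(duration),  # the sample request sends duration as a string
        "resolution": resolution,
    }
    if init_audio is not None:
        payload["init_audio"] = init_audio  # URL of the audio to sync to
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, in the same way as the inline body in the sample request.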

FAQ

Common questions about WAN 2.7 Text To Video

Read the docs

WAN 2.7 is an advanced AI video generation model that creates high-quality 1080p videos from text prompts, supporting up to 15-second durations with native audio synchronization. It excels at text-to-video, image-to-video, and multi-reference workflows with character consistency across complex scenes.

WAN 2.7 accepts up to 5 simultaneous image or video references, plus a 9-grid system for multi-angle control. This enables precise character consistency, pose accuracy, and composition control across multiple shots without regeneration.

Yes. Instruction-based editing lets you modify existing videos with text commands like 'change the background to cyberpunk' or 'add a red jacket.' The model adjusts only the specified elements while preserving the rest of the video structure.

You can provide your own audio file (WAV or MP3, 3-30 seconds, up to 15 MB) for lip-sync and music-driven visuals, or let the model auto-generate matching background music. Audio drives video generation for synchronized, reactive content.
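
The audio limits above can be checked locally before uploading a file. The following is a rough sketch, not ModelsLab code: `audio_problems` is a hypothetical helper, the caller is assumed to already know the file's size and duration, and "15 MB" is interpreted as 15 × 1024 × 1024 bytes, which the page does not specify.

```python
import os

# Limits as stated in the FAQ answer above.
ALLOWED_EXTENSIONS = {".wav", ".mp3"}
MIN_SECONDS = 3
MAX_SECONDS = 30
MAX_BYTES = 15 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def audio_problems(filename: str, size_bytes: int, duration_seconds: float) -> list:
    """Hypothetical helper: return a list of reasons the audio would be rejected."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {extension!r}; expected WAV or MP3")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 15 MB")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("duration is outside the 3-30 second range")
    return problems
```

An empty list means the file fits the stated limits. For example, `audio_problems("voice.mp3", 2_000_000, 12.5)` returns `[]`.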

WAN 2.7 outputs 1080p or 720p video with native audio in all popular aspect ratios (16:9, 9:16, 1:1). Videos support durations up to 15 seconds with flexible frame control and multi-reference composition.

Yes. WAN 2.7 includes built-in prompt rewriting that automatically expands your text description with cinematic details, visual cues, and stylistic elements. You can view the enhanced prompt and disable this feature if you prefer exact control.

Ready to create?

Start generating with WAN 2.7 Text To Video on ModelsLab.