# Wan2.6 Image To Video

> Generate cinematic 1080p, 24 fps videos from a single image, with multi-shot storytelling, native lip-sync, clips up to 15 seconds long, and consumer-GPU support.

## Overview

- **Model ID**: `wan2.6-i2v`
- **Category**: video
- **Provider**: alibaba_cloud
- **Status**: model_ready
- **Screenshot**: `https://assets.modelslab.com/generations/8b9f8404-8c36-42d9-a4a8-cf285e82e55d.webp`

## API Information

This model can be used via our HTTP API. See the API documentation and usage examples below.

### Endpoint

- **URL**: `https://modelslab.com/api/v7/video-fusion/image-to-video`
- **Method**: POST

### Parameters

- **`key`**: Your API key, sent in the JSON request body as shown in the usage examples.

- **`init_image`** (required): The source image to animate into a video, guided by the prompt. The usage examples below pass it as an image URL.
  - Type: file

- **`init_audio`** (required): Audio track that the generated video attempts to follow, for example in lip movements and rhythm. Supported formats: WAV, MP3. If the audio is longer than the selected `duration` (for example 5 or 10 seconds), only the first `duration` seconds are used and the rest is discarded. If the audio is shorter than the video, the portion of the video beyond the end of the audio is silent. The usage examples below pass it as an audio URL.
  - Type: file

- **`prompt`** (required): Text describing the motion and actions the image should perform in the video.
  - Type: textarea

- **`model_id`** (optional): Selects which model handles the request when multiple models are available. The usage examples below use `wan2.6-i2v`.
  - Type: text

- **`duration`** (required): The duration of the generated video in seconds.
  - Type: select (options: 5, 10, 15)

- **`resolution`** (required): The resolution of the generated video.
  - Type: select (options: 720p, 1080p)

## Usage Examples

### cURL

```bash
curl --request POST \
  --url https://modelslab.com/api/v7/video-fusion/image-to-video \
  --header "Content-Type: application/json" \
  --data '{
    "key": "YOUR_API_KEY",
    "model_id": "wan2.6-i2v",
    "init_image": "https://assets.modelslab.com/generations/a2dd96c6-b148-4bdc-aefc-453157d5fd0c.png",
    "init_audio": "https://assets.modelslab.com/generations/ba1837f2-a8a1-49ac-ac0f-2809818867c0.mp3",
    "prompt": "The person from the reference image is a travel vlogger standing on the Great Wall of China, speaking directly to the camera in a natural vlog style. Multishot cinematic sequence starting with a medium close-up selfie shot, the vlogger holding the camera, relaxed expression, light wind, then a smooth pan revealing the Great Wall stretching across the mountains with tourists clearly visible in the background, followed by an over-the-shoulder shot of the vlogger pointing toward the scenic views. The vlogger says clearly and naturally for about 5 seconds: “Right now, I’m standing on the Great Wall of China… and the view here is absolutely unreal.” Add realistic outdoor ambience with soft wind sounds, distant crowd murmurs, footsteps on stone, and clean vlog-style voice audio. Ultra-realistic visuals, perfect face consistency with the reference image, sharp background details, natural daylight, cinematic color grading, stable camera motion, authentic travel vlog mood, immersive and inspiring atmosphere, no distortions, no extra people, duration approximately 5 seconds.",
    "duration": "5",
    "resolution": "720p"
  }'
```

### Python

```python
import requests

response = requests.post(
    "https://modelslab.com/api/v7/video-fusion/image-to-video",
    headers={
        "Content-Type": "application/json"
    },
    json={
        "key": "YOUR_API_KEY",
        "model_id": "wan2.6-i2v",
        "init_image": "https://assets.modelslab.com/generations/a2dd96c6-b148-4bdc-aefc-453157d5fd0c.png",
        "init_audio": "https://assets.modelslab.com/generations/ba1837f2-a8a1-49ac-ac0f-2809818867c0.mp3",
        "prompt": "The person from the reference image is a travel vlogger standing on the Great Wall of China, speaking directly to the camera in a natural vlog style. Multishot cinematic sequence starting with a medium close-up selfie shot, the vlogger holding the camera, relaxed expression, light wind, then a smooth pan revealing the Great Wall stretching across the mountains with tourists clearly visible in the background, followed by an over-the-shoulder shot of the vlogger pointing toward the scenic views. The vlogger says clearly and naturally for about 5 seconds: “Right now, I’m standing on the Great Wall of China… and the view here is absolutely unreal.” Add realistic outdoor ambience with soft wind sounds, distant crowd murmurs, footsteps on stone, and clean vlog-style voice audio. Ultra-realistic visuals, perfect face consistency with the reference image, sharp background details, natural daylight, cinematic color grading, stable camera motion, authentic travel vlog mood, immersive and inspiring atmosphere, no distortions, no extra people, duration approximately 5 seconds.",
        "duration": "5",
        "resolution": "720p"
    }
)

# Raise an exception on HTTP 4xx/5xx responses before parsing the body
response.raise_for_status()
print(response.json())
```
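
The request above hard-codes its options. As a minimal sketch, a small helper can check `duration` and `resolution` against the values listed under Parameters before anything is sent. `build_payload` and the two constant names are assumptions made for illustration, not part of any ModelsLab SDK; the field names mirror the request body used in the example above.

```python
# Allowed values copied from the Parameters section of this page.
ALLOWED_DURATIONS = ("5", "10", "15")
ALLOWED_RESOLUTIONS = ("720p", "1080p")


def build_payload(api_key, init_image, init_audio, prompt,
                  duration="5", resolution="720p", model_id="wan2.6-i2v"):
    """Return a request body dict, raising ValueError on invalid options.

    Hypothetical helper for illustration; it performs no network access.
    """
    duration = str(duration)  # the usage examples send duration as a string
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "key": api_key,
        "model_id": model_id,
        "init_image": init_image,
        "init_audio": init_audio,
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
    }
```

The returned dict can be passed as the `json=` argument of `requests.post` in place of the inline body, so an unsupported value such as `duration=7` fails locally instead of after a round trip.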

### JavaScript

```javascript
fetch("https://modelslab.com/api/v7/video-fusion/image-to-video", {
  method: "POST",
  headers: {
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "key": "YOUR_API_KEY",
    "model_id": "wan2.6-i2v",
    "init_image": "https://assets.modelslab.com/generations/a2dd96c6-b148-4bdc-aefc-453157d5fd0c.png",
    "init_audio": "https://assets.modelslab.com/generations/ba1837f2-a8a1-49ac-ac0f-2809818867c0.mp3",
    "prompt": "The person from the reference image is a travel vlogger standing on the Great Wall of China, speaking directly to the camera in a natural vlog style. Multishot cinematic sequence starting with a medium close-up selfie shot, the vlogger holding the camera, relaxed expression, light wind, then a smooth pan revealing the Great Wall stretching across the mountains with tourists clearly visible in the background, followed by an over-the-shoulder shot of the vlogger pointing toward the scenic views. The vlogger says clearly and naturally for about 5 seconds: “Right now, I’m standing on the Great Wall of China… and the view here is absolutely unreal.” Add realistic outdoor ambience with soft wind sounds, distant crowd murmurs, footsteps on stone, and clean vlog-style voice audio. Ultra-realistic visuals, perfect face consistency with the reference image, sharp background details, natural daylight, cinematic color grading, stable camera motion, authentic travel vlog mood, immersive and inspiring atmosphere, no distortions, no extra people, duration approximately 5 seconds.",
    "duration": "5",
    "resolution": "720p"
  })
})
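.then(response => {
  // Reject on HTTP errors so they reach the catch handler below
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
})
.then(data => console.log(data))
.catch(error => console.error("Request failed:", error));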
```

## Links

- [Model Playground](https://modelslab.com/models/wan-2.6-image-to-video/wan2.6-i2v)
- [API Documentation](https://docs.modelslab.com)
- [ModelsLab Platform](https://modelslab.com)