Available now on ModelsLab · Video Generation

Omnihuman-1.5
Avatars Speak Your Words


Build Expressive Videos Fast

Audio Sync

Semantic Expression Matching

Characters match audio rhythm, prosody, and semantics with natural gestures.

Full Control

Unrestricted Motion Camera

Text prompts guide camera moves, actions, and multi-character scenes.

One Call

Portrait to Video

The Omnihuman-1.5 API turns a single image plus audio into a 1080p avatar video.

Examples

See what Omnihuman-1.5 can create

Copy any prompt below and try it yourself in the playground.

Cityscape Talk

“Professional in urban office, discussing quarterly results with confident gestures, dynamic camera pan from medium shot to close-up, natural lighting”

Tech Demo

“Engineer at whiteboard explaining AI architecture, enthusiastic expressions, hand waves syncing to audio, steady tracking shot”

Product Pitch

“Designer presenting sleek gadget prototype, excited tone with product close-ups, smooth camera zoom, modern studio background”

Nature Guide

“Explorer in forest trail narrating wildlife facts, calm gestures matching audio, wide establishing shot to medium, golden hour light”

For Developers

A few lines of code.
Video avatar. One endpoint.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

Serverless: scales to zero, scales to millions
Pay per second, no minimums
Python and JavaScript SDKs, plus REST API


import requests

# Submit one portrait image and one audio track; the prompt steers camera and action.
response = requests.post(
    "https://modelslab.com/api/v7/video-fusion/image-to-video",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "The camera zoomed in. The woman spoke to the camera, and after finishing, she quickly turned around and ran backward.",
        "init_audio": "https://assets.modelslab.ai/generations/7e1221ae-c5a9-4b1a-96cb-3448cc73c6e3.m4a",
        "init_image": "https://assets.modelslab.ai/generations/8931fb55-905f-4ae8-8924-1b4e583ff789.png",
    },
    timeout=60,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json())
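
The FAQ notes that the video URL is delivered asynchronously and that polling is supported. A minimal polling sketch could look like the one below. The response fields (`status`, `output`, `message`) and the status values are assumptions made for illustration and are not confirmed by this page; `wait_for_video` and its `fetch` callable are hypothetical helpers, not part of any ModelsLab SDK. Check the API documentation for the actual response shape.

```python
import time
from typing import Any, Callable, Dict, List


def wait_for_video(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call `fetch` repeatedly until the job succeeds, then return its output URLs.

    ASSUMED response shape (hypothetical, not taken from the docs):
        {"status": "processing" | "success" | "error" | "failed",
         "output": ["https://..."], "message": "..."}
    """
    for _ in range(max_attempts):
        body = fetch()
        status = body.get("status")
        if status == "success":
            # Assumed: finished jobs list their video URLs under "output".
            return list(body.get("output", []))
        if status in ("error", "failed"):
            raise RuntimeError(f"generation failed: {body.get('message', body)}")
        # Any other status is treated as "still processing": wait and retry.
        sleep(interval_s)
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

Here `fetch` would wrap whatever status request the API documentation describes, for example a small function that calls `requests.post(...)` with your key and returns `.json()`. Passing the request in as a callable keeps the retry logic independent of the exact endpoint and makes it easy to test without network access.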

FAQ

Common questions about Omnihuman-1.5


What is the Omnihuman-1.5 model?

Omnihuman-1.5 generates video from one image, an audio track, and an optional text prompt. It creates expressive animations with semantic audio sync and supports humans, animals, and multi-character scenes.

How does the Omnihuman-1.5 API work?

Send an image URL, an audio URL, and a prompt to the Omnihuman-1.5 endpoint (the sample request passes these as init_image, init_audio, and prompt). Generation is asynchronous: the video URL becomes available once processing finishes. Use 720p for speed or 1080p for quality.

What are Omnihuman-1.5 alternatives?

Omnihuman-1.5 goes beyond basic lip-sync tools with full-body motion and audio semantics, and it leads in expressive control. Check ModelsLab for similar video APIs.

What audio lengths does Omnihuman-1.5 support?

Up to 60 seconds at 720p or 30 seconds at 1080p. Audio drives lip sync, expressions, and gestures. Multiple languages are supported, including English, Chinese, and Spanish.

Can Omnihuman-1.5 handle multi-character videos?

Yes. Specify speakers and background reactions in the prompt. The model generates coherent interactions with shared attention fusion, which makes it well suited to dialogue scenes.

Is the Omnihuman-1.5 API production ready?

It supports webhooks, polling, and batch variations. Use TTL for content management. It scales for customer avatars or content tools.

Ready to create?

Start generating with Omnihuman-1.5 on ModelsLab.