Omnihuman-1.5
Avatars Speak Your Words
Build Expressive Videos Fast
Audio Sync
Semantic Expression Matching
Characters match audio rhythm, prosody, and semantics with natural gestures.
Full Control
Unrestricted Motion and Camera
Text prompts guide camera moves, actions, and multi-character scenes.
One Call
Portrait to Video
The Omnihuman-1.5 API turns a single image plus audio into a 1080p avatar video.
Examples
See what Omnihuman-1.5 can create
Copy any prompt below and try it yourself in the playground.
Cityscape Talk
“Professional in urban office, discussing quarterly results with confident gestures, dynamic camera pan from medium shot to close-up, natural lighting”
Tech Demo
“Engineer at whiteboard explaining AI architecture, enthusiastic expressions, hand waves syncing to audio, steady tracking shot”
Product Pitch
“Designer presenting sleek gadget prototype, excited tone with product close-ups, smooth camera zoom, modern studio background”
Nature Guide
“Explorer in forest trail narrating wildlife facts, calm gestures matching audio, wide establishing shot to medium, golden hour light”
For Developers
A few lines of code.
Video avatar. One endpoint.
ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.
- Serverless: scales to zero, scales to millions
- Pay per second, no minimums
- Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/video-fusion/image-to-video",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "The camera zoomed in. The woman spoke to the camera, and after finishing, she quickly turned around and ran backward.",
        "init_audio": "https://assets.modelslab.ai/generations/7e1221ae-c5a9-4b1a-96cb-3448cc73c6e3.m4a",
        "init_image": "https://assets.modelslab.ai/generations/8931fb55-905f-4ae8-8924-1b4e583ff789.png",
    },
)
print(response.json())
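For anything beyond a quick test, it may help to wrap the call in a small helper that validates inputs and keeps the API key out of the source. The sketch below is one possible shape, not an official ModelsLab client. The function names `build_payload` and `generate_avatar_video`, the `MODELSLAB_API_KEY` environment variable, and the 60-second timeout are assumptions made for illustration. It uses Python's standard library instead of `requests` so it has no dependencies. The endpoint and the four JSON fields come from the snippet above. The response format is not described on this page, so the helper returns the parsed JSON unchanged.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse

# Endpoint taken from the snippet above.
API_URL = "https://modelslab.com/api/v7/video-fusion/image-to-video"


def build_payload(api_key, prompt, image_url, audio_url):
    """Assemble the JSON body from the four fields shown in the snippet above."""
    for name, url in (("image_url", image_url), ("audio_url", audio_url)):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{name} must be an http(s) URL, got {url!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "key": api_key,
        "prompt": prompt,
        "init_audio": audio_url,
        "init_image": image_url,
    }


def generate_avatar_video(prompt, image_url, audio_url, timeout=60):
    """POST the payload and return the parsed JSON response as-is."""
    # Hypothetical variable name; any secret store would do.
    api_key = os.environ["MODELSLAB_API_KEY"]
    body = json.dumps(build_payload(api_key, prompt, image_url, audio_url))
    request = urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the network call means the input checks can be exercised without spending credits. Any of the example prompts above can be passed as `prompt`.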