Available now on ModelsLab · Video Generation

Omnihuman
Omnihuman Animates Humans

Try Omnihuman · API Documentation

Build Videos From Inputs

Image + Audio

Lip-Synced Realism

Pairs a single human image with audio for precise lip-sync and emotion-matched motion.

Any Aspect Ratio

Portrait to Full-Body

Handles portrait, half-body, and full-body images in varied aspect ratios with consistent quality.

Multimodal Control

Audio Drives Motion

Uses audio signals for natural expressions, body language, and scene dynamics.

Examples

See what Omnihuman can create

Copy any prompt below and try it yourself in the playground.

Cityscape Talk

“Professional man in suit stands in bustling city street at dusk, speaking energetically about urban innovation, realistic lighting, dynamic camera pan, high detail textures.”

Product Demo

“Engineer holds sleek gadget in modern lab, explains features with precise gestures, bright overhead lights, subtle background tech displays, sharp focus.”

Nature Guide

“Hiker in forest clearing describes trail map, natural arm movements synced to audio, dappled sunlight through trees, realistic fabric textures.”

Abstract Art

“Stylized figure in geometric studio dances to rhythm, fluid morphing forms, vibrant color shifts, continuous motion with soft glows.”

For Developers

A few lines of code.
Video from image, audio.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

Serverless: scales to zero, scales to millions
Pay per second, no minimums
Python and JavaScript SDKs, plus REST API
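
With per-second pricing, a rough budget is just duration multiplied by rate. The sketch below assumes billing follows the length of the output video; `estimate_cost` and `PLACEHOLDER_RATE` are illustrative names, and the rate shown is a placeholder, not a ModelsLab price.

```python
from decimal import Decimal


def estimate_cost(output_seconds: float, rate_per_second: str) -> Decimal:
    """Return output_seconds * rate_per_second, rounded to two decimal places."""
    if output_seconds < 0:
        raise ValueError("output_seconds must be non-negative")
    # Decimal avoids binary floating-point drift in currency arithmetic.
    cost = Decimal(str(output_seconds)) * Decimal(rate_per_second)
    return cost.quantize(Decimal("0.01"))


# Placeholder rate for illustration only; look up the real rate before budgeting.
PLACEHOLDER_RATE = "0.10"

print(estimate_cost(12.5, PLACEHOLDER_RATE))
```

Passing the rate as a string keeps it exact when converted to `Decimal`.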

API Documentation

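import requests

# Submit one human image and one audio clip; both are passed as public URLs.
response = requests.post(
    "https://modelslab.com/api/v7/video-fusion/image-to-video",
    json={
        "key": "YOUR_API_KEY",  # replace with your ModelsLab API key
        "init_audio": "https://assets.modelslab.ai/generations/efc19902-2b68-4dac-aa8a-b84960651790",
        "init_image": "https://assets.modelslab.ai/generations/18011592-127a-4d6e-adf7-c66d1ce7693c",
    },
    timeout=60,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json())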
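
For reuse across many image and audio pairs, the call above can be wrapped in a small helper that checks inputs before anything is sent. This is a minimal sketch, not an official client: `build_request` and `send` are hypothetical names, it uses only the Python standard library instead of `requests`, and it assumes the endpoint accepts exactly the three fields shown in the example above.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Endpoint copied from the example above.
ENDPOINT = "https://modelslab.com/api/v7/video-fusion/image-to-video"


def build_request(api_key: str, image_url: str, audio_url: str) -> urllib.request.Request:
    """Validate the inputs and return a POST request object (nothing is sent here)."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    for name, url in (("image_url", image_url), ("audio_url", audio_url)):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{name} must be an absolute http(s) URL, got {url!r}")

    # Field names match the example above: key, init_image, init_audio.
    payload = {"key": api_key, "init_image": image_url, "init_audio": audio_url}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body; HTTP 4xx/5xx raise urllib.error.HTTPError."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping construction separate from sending means malformed URLs fail locally, before a network call is made, and the request can be inspected in tests without contacting the API.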

FAQ

Common questions about Omnihuman

Read the docs

What is the Omnihuman API?

The Omnihuman API generates videos from image and audio inputs. It supports lip-sync and audio-driven motion. The endpoint is available on ModelsLab.

How does the Omnihuman model work?

It combines a single image with audio to produce realistic human animation. It handles any aspect ratio and body type, and outputs high-quality, synced videos.

Best Omnihuman alternative?

Omnihuman excels at audio-conditioned realism compared with text-only models. Use it when you need precise lip-sync. The API integrates via simple HTTP calls.

What inputs does Omnihuman need?

It requires one human image and one audio file. Optional prompts refine the output. It also supports video motion signals.

Omnihuman video length limits?

It generates clips longer than one minute with dynamic motion. It scales to multi-character scenes, and quality holds in long sequences.

Commercial use with Omnihuman?

Commercial use is permitted for generated videos. Costs scale per second of output. It is ideal for avatars and social content.

Ready to create?

Start generating with Omnihuman on ModelsLab.
