Available now on ModelsLab · Video Generation

CogVideoX: Text Sparks Video

Generate Videos Fast

Fast Endpoint

Text to Video

The CogVideoX model turns text prompts into 10-second videos at 16 fps and up to 1360x768 resolution.

High Fidelity

Expert Transformer

CogVideoX uses a diffusion transformer to generate video with coherent motion and strong text alignment.

API Ready

Simple Integration

The CogVideoX API supports fast text-to-video generation with the diffusers pipeline in a single call.

Examples

See what CogVideoX can create

Copy any prompt below and try it yourself in the playground.

City Timelapse

A bustling city skyline at dusk, lights flickering on as traffic flows smoothly, cinematic wide shot, high detail, smooth camera pan.

Ocean Waves

Waves crashing on rocky shore under golden sunset, foam spraying, realistic water physics, slow motion, 16 fps, serene atmosphere.

Forest Path

Sunlight filtering through dense forest canopy, leaves rustling in wind, dirt path winding ahead, natural lighting, steady tracking shot.

Abstract Lights

Pulsing neon lights in geometric patterns, syncing to invisible rhythm, dark background, high contrast, fluid motion, futuristic vibe.

For Developers

A few lines of code.
Video from text. One call.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per second, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

# Send a text-to-video request to the ModelsLab endpoint
response = requests.post(
    "https://modelslab.com/api/v6/video/text2video",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "A suited astronaut, with the red dust of Mars clinging to their boots, reaches out to shake hands with an alien being, their skin a shimmering blue, under the pink-tinged sky of the fourth planet. In the background, a sleek silver rocket, a beacon of human ingenuity, stands tall, its engines powered down, as the two representatives of different worlds exchange a historic greeting amidst the desolate beauty of the Martian landscape.",
        "output_type": "mp4",
    },
)
print(response.json())
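The call above returns JSON. Here is a minimal sketch of handling it, assuming a common ModelsLab-style response shape where `status` is `"success"` with an `output` list of file URLs, or `"processing"` with a `fetch_result` URL to poll (these field names are assumptions; check the API docs for the exact schema):

```python
def extract_video(resp: dict):
    """Return a (state, url_or_message) pair from a ModelsLab-style
    JSON response. The 'status', 'output', 'fetch_result', and
    'message' fields are assumed, not confirmed by the docs here."""
    status = resp.get("status")
    if status == "success":
        # Finished renders are assumed to arrive as a list of file URLs.
        return "done", resp.get("output", [None])[0]
    if status == "processing":
        # Longer renders are assumed to return a URL to poll later.
        return "pending", resp.get("fetch_result")
    return "error", resp.get("message")

# Example with a mocked successful response:
print(extract_video({"status": "success", "output": ["https://example.com/video.mp4"]}))
```

In practice you would loop on the `"pending"` case with a short delay until the video URL appears.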

FAQ

Common questions about CogVideoX

Read the docs

What is CogVideoX?

CogVideoX is an open-source text-to-video diffusion model from THUDM. It generates 10-second videos at up to 1360x768 resolution and 16 fps. A fast endpoint is available via the API.

How does CogVideoX work?

It uses a 3D causal VAE for spatiotemporal compression and an expert transformer for text-video alignment, turning text prompts into coherent clips with strong motion and fidelity.

What does the CogVideoX API offer?

The CogVideoX API provides a text-to-video endpoint for fast generation and integrates with the diffusers pipeline, making it ideal for apps that need quick video output.

What is the difference between CogVideoX-2B and CogVideoX-5B?

CogVideoX-2B prioritizes speed and runs on about 12 GB of VRAM; CogVideoX-5B prioritizes quality and needs about 20 GB. Both generate 6-10 second clips, and the fast version uses an optimized pipeline.
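As a rough illustration of that trade-off, here is a hypothetical helper that picks a variant from available GPU memory. The thresholds mirror the figures above; the function and its name are illustrative, not part of any SDK:

```python
def pick_variant(vram_gb: float) -> str:
    """Choose a CogVideoX variant for the given GPU memory (GB).

    Thresholds follow the figures above: the 2B model fits in
    about 12 GB, the 5B model wants about 20 GB. Illustrative only.
    """
    if vram_gb >= 20:
        return "CogVideoX-5B"  # quality-first
    if vram_gb >= 12:
        return "CogVideoX-2B"  # speed-first
    raise ValueError("Not enough VRAM for either variant")

print(pick_variant(24))  # a 24 GB card can run the 5B model
```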

Why choose CogVideoX?

The 2B variant ships open weights under Apache 2.0, and the model offers better text alignment and longer clip durations than prior open models. The CogVideoX API simplifies access.

What are the output specs?

Standard output is 5-10 seconds at 8-16 fps, with resolutions up to 1360x768. The fast endpoint focuses on text-to-video speed.
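Those spec ranges translate directly into frame counts; a quick sanity check:

```python
def frame_count(seconds: float, fps: int) -> int:
    """Total frames for a clip of the given duration and frame rate."""
    return round(seconds * fps)

# Bounds of the standard output range quoted above:
print(frame_count(5, 8))    # shortest clip at the lowest rate -> 40
print(frame_count(10, 16))  # longest clip at the highest rate -> 160
```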

Ready to create?

Start generating with CogVideoX on ModelsLab.