Available now on ModelsLab · Video Generation

CogVideoX: Text Sparks Video

Generate Videos Fast

Fast Endpoint

Text to Video

The CogVideoX model turns text prompts into 10-second videos at 16 fps and up to 1360x768 resolution.

High Fidelity

Expert Transformer

CogVideoX uses a diffusion transformer to generate video with coherent motion and strong text alignment.

API Ready

Simple Integration

The CogVideoX API supports fast text-to-video generation with the diffusers pipeline in a single call.

Examples

See what CogVideoX can create

Copy any prompt below and try it yourself in the playground.

City Timelapse

A bustling city skyline at dusk, lights flickering on as traffic flows smoothly, cinematic wide shot, high detail, smooth camera pan.

Ocean Waves

Waves crashing on rocky shore under golden sunset, foam spraying, realistic water physics, slow motion, 16 fps, serene atmosphere.

Forest Path

Sunlight filtering through dense forest canopy, leaves rustling in wind, dirt path winding ahead, natural lighting, steady tracking shot.

Abstract Lights

Pulsing neon lights in geometric patterns, syncing to invisible rhythm, dark background, high contrast, fluid motion, futuristic vibe.

For Developers

A few lines of code.
Video from text. One call.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per second, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

# Send a text-to-video request to the ModelsLab endpoint
response = requests.post(
    "https://modelslab.com/api/v6/video/text2video",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "A suited astronaut, with the red dust of Mars clinging to their boots, reaches out to shake hands with an alien being, their skin a shimmering blue, under the pink-tinged sky of the fourth planet. In the background, a sleek silver rocket, a beacon of human ingenuity, stands tall, its engines powered down, as the two representatives of different worlds exchange a historic greeting amidst the desolate beauty of the Martian landscape.",
        "output_type": "mp4",
    },
)
print(response.json())
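The call above returns JSON. Here is a minimal sketch of handling it, assuming a common ModelsLab-style response shape where `status` is `"success"` with an `output` list of file URLs, or `"processing"` with a `fetch_result` URL to poll (these field names are assumptions; check the API docs for the exact schema):

```python
def extract_video(resp: dict):
    """Return a (state, url_or_message) pair from a ModelsLab-style
    JSON response. The 'status', 'output', 'fetch_result', and
    'message' fields are assumed, not confirmed by the docs here."""
    status = resp.get("status")
    if status == "success":
        # Finished renders are assumed to arrive as a list of file URLs.
        return "done", resp.get("output", [None])[0]
    if status == "processing":
        # Longer renders are assumed to return a URL to poll later.
        return "pending", resp.get("fetch_result")
    return "error", resp.get("message")

# Example with a mocked successful response:
print(extract_video({"status": "success", "output": ["https://example.com/video.mp4"]}))
```

In practice you would loop on the `"pending"` case with a short delay until the video URL appears.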

FAQ

Common questions about CogVideoX

Read the docs

What is CogVideoX?

CogVideoX is an open-source text-to-video diffusion model from THUDM. It generates 10-second videos at up to 1360x768 resolution and 16 fps. A fast endpoint is available via the API.

How does CogVideoX work?

It uses a 3D causal VAE for spatiotemporal compression and an expert transformer for text-video alignment, turning text prompts into coherent clips with strong motion and fidelity.

What does the CogVideoX API offer?

The CogVideoX API provides a text-to-video endpoint for fast generation and integrates with the diffusers pipeline, making it ideal for apps that need quick video output.

What is the difference between CogVideoX-2B and CogVideoX-5B?

CogVideoX-2B prioritizes speed and runs on about 12 GB of VRAM; CogVideoX-5B prioritizes quality and needs about 20 GB. Both generate 6-10 second clips, and the fast version uses an optimized pipeline.
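As a rough illustration of that trade-off, here is a hypothetical helper that picks a variant from available GPU memory. The thresholds mirror the figures above; the function and its name are illustrative, not part of any SDK:

```python
def pick_variant(vram_gb: float) -> str:
    """Choose a CogVideoX variant for the given GPU memory (GB).

    Thresholds follow the figures above: the 2B model fits in
    about 12 GB, the 5B model wants about 20 GB. Illustrative only.
    """
    if vram_gb >= 20:
        return "CogVideoX-5B"  # quality-first
    if vram_gb >= 12:
        return "CogVideoX-2B"  # speed-first
    raise ValueError("Not enough VRAM for either variant")

print(pick_variant(24))  # a 24 GB card can run the 5B model
```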

Why choose CogVideoX?

The 2B variant ships open weights under Apache 2.0, and the model offers better text alignment and longer clip durations than prior open models. The CogVideoX API simplifies access.

What are the output specs?

Standard output is 5-10 seconds at 8-16 fps, with resolutions up to 1360x768. The fast endpoint focuses on text-to-video speed.
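Those spec ranges translate directly into frame counts; a quick sanity check:

```python
def frame_count(seconds: float, fps: int) -> int:
    """Total frames for a clip of the given duration and frame rate."""
    return round(seconds * fps)

# Bounds of the standard output range quoted above:
print(frame_count(5, 8))    # shortest clip at the lowest rate -> 40
print(frame_count(10, 16))  # longest clip at the highest rate -> 160
```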

Ready to create?

Start generating with CogVideoX on ModelsLab.