Available now on ModelsLab · Video Generation

Kling 3.0 Text-to-Video API: Cinematic Video Generation

Kuaishou's text-to-video model via REST API. Multi-shot, native audio, free credits.

Why developers ship Kling 3.0 text-to-video

Kling 3.0

Kuaishou's flagship video model

Kling 3.0 is the reference for cinematic motion realism and physics accuracy in AI text-to-video. Strong prompt adherence across long clips, ideal for ads, short-form social, and storyboard previs.

Multi-shot

Six coherent shots in one clip

Chain up to six shots into a narrative sequence with automatic transitions. The model maintains visual continuity, lighting, and tone across cuts so you ship a complete story in one API call.

Native audio

Synced dialogue, music, and SFX

Generate synchronized audio, voiceover, and effects in multiple languages. Lip-sync matches the visuals so dialogue scenes do not need a separate TTS pipeline.

Consistent characters

Lock subjects across shots

Pass reference images to keep the same character, product, or set across every shot in a multi-shot clip. This cuts the post-production work of fixing identity drift.

Resolution

1080p output, 24/30 fps

Native 1080p generation at 24 or 30 fps. 16:9 landscape, 9:16 portrait for short-form social, and 1:1 square for feed ads — all from the same endpoint.

Async delivery

Webhook callback included

Submit your text-to-video request, get the final MP4 URL delivered to your webhook when ready. No long-polling, no WebSocket — standard HTTP and JSON.
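
A minimal sketch of a webhook receiver, using only the Python standard library. The callback payload shape is an assumption made for illustration: the `status` and `output` field names, the `"success"` value, and port 8000 are hypothetical, so check the ModelsLab docs for the actual schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def extract_video_url(payload: object) -> Optional[str]:
    """Return the MP4 URL from a callback payload, or None if it has none.

    ASSUMPTION: "status" and "output" are hypothetical field names.
    """
    if not isinstance(payload, dict) or payload.get("status") != "success":
        return None
    outputs = payload.get("output") or []
    return outputs[0] if outputs else None


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        url = extract_video_url(payload)
        if url:
            print("Video ready:", url)
        # Acknowledge the callback with an empty 200 response.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block and listen for callbacks on the given (hypothetical) port."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Call `serve()` to start listening. Keeping the parsing in `extract_video_url` makes the handler easy to unit test without opening a socket.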

Pricing

Pay per second of output

Kling 3.0 generation is billed per second of output, with no subscription and no monthly minimum. Volume discounts apply at 1,000+ minutes per month.

OpenAI-compatible workflow

One platform, every modality

Pair Kling 3.0 text-to-video with the image, audio, and LLM APIs in the same dashboard. Build end-to-end content pipelines without juggling vendor accounts.

Examples

Kling 3.0 text-to-video examples

Copy any prompt below and try it yourself in the playground.

Cinematic city chase

A cinematic 8-second clip: a sleek black sports car races through a rain-soaked city at night, neon reflections on wet pavement, low-angle tracking shot, motion blur on background lights, dramatic film score, 1080p 24fps

Product reveal

A 6-second product reveal: a slow dolly-in on a minimal ceramic coffee mug on a marble counter, soft daylight from a window, shallow depth of field, particles drifting in the light beam, cinematic 35mm look, smooth motion

Animated dialogue scene

Anime-style scene with two characters in conversation, soft pastel colors, slow dolly shot, dramatic side lighting, lip-synced dialogue, cinematic camera movement, film quality 1080p

Epic landscape flyover

Sweeping aerial drone shot over a mountain range at golden hour, dramatic clouds, sunbeams cutting through valleys, ultra-wide angle, smooth gimbal motion, cinematic color grading, 4K

For Developers

A few lines of code.
Cinematic video in one POST request

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per second, no minimums
  • Python and JavaScript SDKs, plus REST API

import requests

# Submit a text-to-video job. Replace YOUR_API_KEY with your ModelsLab key.
response = requests.post(
    "https://modelslab.com/api/v7/video-fusion/text-to-video",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "A cinematic ultra-realistic shot of a young man standing on a mountain cliff at sunrise, wind blowing through his hair, dramatic clouds moving fast, golden sun rays, epic wide angle drone shot, shallow depth of field, ultra-detailed textures, HDR lighting, smooth cinematic motion, 4K, movie-style color grading, emotional atmosphere, slow motion, breathtaking landscape, professional filmmaking, stunning visuals",
        "duration": "5",  # clip length in seconds
        "aspect_ratio": "1:1",  # square output
    },
    timeout=60,  # limit on the HTTP request itself, not on generation time
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
print(response.json())

FAQ

Common questions about the Kling 3.0 text-to-video API

Read the docs

The Kling 3.0 text-to-video API is a REST endpoint that runs Kuaishou's flagship video model on cloud GPUs. POST a text prompt; receive an MP4 URL with cinematic-quality video. No Kuaishou account required — sign up on ModelsLab and you can call the endpoint immediately.

Kling 3.0 generates clips from 3 to 15 seconds. Multi-shot sequences fit within those limits. For longer cinematic pieces, chain multiple clips with consistent character and scene references.
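
One way to plan a longer piece is to split the target runtime into clips that each stay within the 3 to 15 second range. The sketch below is only an illustration: `plan_clips` is a hypothetical helper, not part of any SDK, and it assumes whole-second durations.

```python
import math

MIN_CLIP_SECONDS = 3   # shortest clip stated above
MAX_CLIP_SECONDS = 15  # longest clip stated above


def plan_clips(total_seconds: int) -> list:
    """Split a target runtime into near-equal clip lengths within the limits."""
    if total_seconds < MIN_CLIP_SECONDS:
        raise ValueError("total runtime is shorter than the minimum clip length")
    # Use the fewest clips that keep every clip at or under the maximum.
    count = math.ceil(total_seconds / MAX_CLIP_SECONDS)
    base, extra = divmod(total_seconds, count)
    # Spread the remainder one second at a time over the first clips.
    return [base + 1 if i < extra else base for i in range(count)]
```

Splitting evenly avoids a short leftover clip: a 31-second piece becomes three clips of 11, 10, and 10 seconds instead of 15, 15, and an invalid 1-second remainder.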

Yes. The model produces synchronized audio with the video — dialogue, ambient sound, music. Lip-sync matches on-screen speech, so character dialogue scenes do not need a separate text-to-speech pipeline.

Yes. Pass reference images alongside your prompt and the model preserves character identity, clothing, and props across shots. This is essential for narrative content, product demos, and brand-safe ad creative.

16:9 landscape (1920×1080), 9:16 portrait (1080×1920) for TikTok/Reels/Shorts, and 1:1 square (1080×1080) for feed ads. Pass width and height in the request body.
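
A small sketch of mapping the three supported aspect ratios to the pixel dimensions listed above. The `size_fields` helper is hypothetical, and the exact request field names should be confirmed in the docs; `width` and `height` are used here because the answer above names them.

```python
# Dimensions taken from the supported formats listed above.
DIMENSIONS = {
    "16:9": (1920, 1080),  # landscape
    "9:16": (1080, 1920),  # portrait for short-form social
    "1:1": (1080, 1080),   # square for feed ads
}


def size_fields(aspect_ratio: str) -> dict:
    """Return width/height request fields for a supported aspect ratio."""
    if aspect_ratio not in DIMENSIONS:
        supported = ", ".join(DIMENSIONS)
        raise ValueError(f"unsupported aspect ratio {aspect_ratio!r}; use one of: {supported}")
    width, height = DIMENSIONS[aspect_ratio]
    return {"width": width, "height": height}
```

The returned dictionary can be merged into a request body, for example `{"prompt": "...", **size_fields("9:16")}`.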

A 5-second 1080p clip typically generates in 60–90 seconds end-to-end. Multi-shot clips take proportionally longer. The API runs on a dedicated GPU pool with no cold starts.

Pricing starts at $0.05 per second of output. A 5-second clip is $0.25. There is no subscription required; pay only for what you generate. Volume pricing is available for high-throughput workloads.
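
The arithmetic above can be sketched as a quick budget estimate. This assumes the $0.05 starting rate applies to every second of output; volume pricing would lower the figure, and `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# Starting rate quoted above, in USD per second of output.
STARTING_RATE_USD_PER_SECOND = Decimal("0.05")


def estimate_cost(seconds: int, rate: Decimal = STARTING_RATE_USD_PER_SECOND) -> Decimal:
    """Estimate the cost in USD of generating `seconds` of video."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Decimal keeps currency math exact, unlike binary floats.
    return Decimal(seconds) * rate
```

For example, `estimate_cost(5)` returns `Decimal("0.25")`, which matches the 5-second clip price quoted above.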

Kling 3.0 text-to-video is the right pick for high-fidelity hero shots driven by prompts. Kling 2.6 motion control adds explicit camera (pan/tilt/zoom) and per-frame subject trajectory parameters when you need precise control over movement. Both are accessible via ModelsLab's Kling endpoint.

Yes. That is the typical use case. The API is designed for embedding in user-facing products. Pay-per-second pricing keeps margins predictable, and webhook delivery keeps your backend simple.

Yes. Prompts and generated videos are processed on infrastructure inside compliant regions. Outputs are auto-deleted from the CDN after 7 days by default. A signed DPA and dedicated VPC deployments are available for enterprise customers.

Ready to create?

Start generating with the Kling 3.0 text-to-video API on ModelsLab.