AI APIs for Developers
AI Model APIs
Discover and integrate with powerful AI model APIs for your applications
xAI
Grok Imagine – Image to Video lets you instantly turn your ideas into stunning 1–15 second AI-generated videos. Simply describe your scene, and generate smooth, high-quality videos in 480p and 720p resolution — perfect for social media, ads, and storytelling.
xAI
Grok Imagine – Text to Video lets you instantly turn your ideas into stunning 1–15 second AI-generated videos. Simply describe your scene, and generate smooth, high-quality videos in 480p and 720p resolution — perfect for social media, ads, and storytelling.
ModelsLab
Create and customize any AI-generated voice you can imagine using a simple text prompt - choose the tone, style, accent, emotion, age, or personality, and instantly turn your words into natural-sounding speech.
ModelsLab
Fast-train your custom models with optimized pipelines, supporting various image formats and requiring as little as 16 GB of VRAM for efficient fine-tuning.
ModelsLab
The Qwen Text-to-Speech endpoint generates audio from text using a provided audio URL, producing output that mimics the uploaded voice.
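For developers, a call to an endpoint like this typically boils down to a single JSON POST. A minimal sketch in Python; the endpoint path and field names (key, text, init_audio) are illustrative assumptions, not documented values:

import requests

# Hypothetical request shape: text to speak plus a reference audio URL
# whose voice the output should mimic. All field names are assumptions.
payload = {
    "key": "YOUR_API_KEY",
    "text": "Welcome to the product tour.",
    "init_audio": "https://example.com/reference-voice.mp3",
}
resp = requests.post(
    "https://modelslab.com/api/v6/voice/qwen_tts",  # assumed path
    json=payload,
    timeout=120,
)
print(resp.json())  # typically a link to the generated audio file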
xAI
Grok Imagine – Image Edit lets you modify existing images using simple text instructions—add, remove, or change elements while keeping the original image style and details intact.
xAI
Generate high-quality 1024x1024 images in 2.3 seconds with efficient 2.1GB GPU memory use, natural language editing, superior character consistency, and real-time style transfers.
ModelsLab
A distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations).
ModelsLab
Qwen-Image-Edit-2511 is a powerful, versatile AI tool for sophisticated, prompt-based image editing with strong consistency, identity preservation, and mixed-mode control across subjects and scenes.
OpenAI
Sora 2 Pro is an advanced text-to-video AI model that turns simple prompts into high-quality, cinematic videos with realistic motion, consistent characters, and strong scene coherence—built for creators, filmmakers, and production teams.
Alibaba
wan2.6-i2v-flash is an image-to-video generation model in the WAN 2.6 series. It takes a single input image (plus optional text prompt and audio) and generates a short video clip with motion and optionally synchronized sound.
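Image-to-video endpoints like this usually take a JSON body with the image plus optional prompt and audio. A minimal sketch; the gateway URL and field names below are illustrative assumptions:

import requests

# Hypothetical request: one input image, an optional text prompt, and an
# optional audio track. Endpoint and field names are assumptions.
payload = {
    "model": "wan2.6-i2v-flash",
    "image_url": "https://example.com/still.jpg",
    "prompt": "slow dolly-in, leaves drifting in the wind",  # optional
    "audio_url": "https://example.com/ambience.mp3",         # optional
}
resp = requests.post("https://api.example.com/v1/video/generations",
                     json=payload, timeout=300)
print(resp.json())  # e.g. a job id or a URL to the rendered clip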
ModelsLab
The Z-Image Turbo model transforms an existing image into a new version using a text prompt, rather than generating a picture from scratch. You upload a source image and then describe how you want it changed.
KlingAI
Kling Motion Control is an advanced AI-powered motion transfer system that analyzes movement from a reference video and applies it to a static image, creating realistic image-to-video animations with precise body, gesture, and expression control.
LTX
LTX-2 Pro Image-to-Video is a powerful AI model that turns a single still image into a dynamic video clip using a text prompt to guide motion, camera moves, and atmosphere
LTX
LTX-2 Pro Text-to-Video is an advanced AI model that converts text descriptions into high-quality short videos. It can generate cinematic visuals with synchronized audio, such as sound effects and ambience.
Bytedance
Cinematic text-to-video generator with native audio (dialogue+foley+music), up to 1080p/12s output, millisecond lip-sync, MP4 (H.264) at 48 kHz, fast inference for ads and short films.
Bytedance
Seedance 1.5 Pro creates AI videos using a first frame, last frame, and a prompt to animate smooth transitions.
Black Forest Labs
FLUX-2-Max is a premium text-to-image model within the FLUX family, built to deliver exceptional image quality with high realism, fine detail, and strong adherence to user prompts.
Black Forest Labs
FLUX.2 [max] is the flagship and most capable generative AI model from Black Forest Labs, designed for professional-grade image generation and editing. It represents the pinnacle of the FLUX.2 model family, offering unmatched visual fidelity and creative control.
Alibaba
Wan 2.6 supports multiple visual styles, dynamic transitions, and flexible aspect ratios, making it ideal for marketing, social media, storytelling, and creative content generation.
Alibaba
Wan 2.6 is an advanced multimodal AI video generation model that lets you turn static inputs like images (or text) into high-quality dynamic videos. It integrates text, images, video, and audio into a single system.
Black Forest Labs
Generate high-resolution images up to 4MP with rapid 10-second output, ideal for professional printing and fine art creation.
Black Forest Labs
Advanced text-to-image generator with 12B parameters, offering 6x faster generation and superior image quality, ideal for professional design and marketing applications.
Bytedance
Next-generation image creation and editing model delivering ultra-fast 4K resolution outputs, multi-image reference support, natural language editing, and versatile style transfer for creative workflows.
Bytedance
Seedream 4.5 has matured from a “basic tool” into a “reliable production tool”. It delivers a significantly lower failure rate in challenging scenarios such as small faces and fine text, shifting the user experience from “hoping for luck” to “consistently reliable results”.
ModelsLab
This model allows you to supply an input image along with a text prompt that describes the modifications you want, and it will generate an updated version that reflects your requested changes.
ModelsLab
Flux 2 Dev is a high-performance, developer-focused text-to-image generative model designed for experimentation, customization, and advanced creative workflows.
KlingAI
Kling V1.6 is an advanced generative video model designed to transform multiple input images into coherent, high-quality animated sequences.
Black Forest Labs
Flux 2 Pro Image Editing is a high-performance AI tool that allows you to enhance, modify, and transform images with exceptional accuracy. It delivers seamless object removal, realistic background changes, detailed retouching, and professional-quality results.
Black Forest Labs
Flux 2 Pro is an advanced text-to-image generative model designed for high-precision visual synthesis and professional-grade imaging workflows.
ModelsLab
Interior Mixer is a model that combines images of different interior objects and design elements into one unified, realistic image.
ModelsLab
Remove unwanted objects seamlessly from images with high-resolution inpainting up to 1024x1024 pixels, using automatic mask detection for precise edits.
ModelsLab
The Qwen Image-to-Image model is designed for image editing and transformation. It allows users to modify existing images through text prompts, such as changing objects, adjusting backgrounds, or altering styles.
Ultra-fast image editing with natural language prompts, preserving character consistency and scene details, supporting pixel-perfect edits and complex transformations in seconds.
MiniMax
The MiniMax Hailuo-0.2 (Start/End Frame) Image-to-Video variant enables creators to animate still images into dynamic video clips with defined beginning and end visuals.
MiniMax
MiniMax Hailuo-0.2 Image-to-Video offers a practical and efficient way to animate still images into short videos.
MiniMax
MiniMax Hailuo-2.3 Fast Image-to-Video offers a streamlined, cost-effective and rapid way to animate still images into short video sequences.
MiniMax
MiniMax Hailuo 2.3 Image-to-Video gives creators a powerful way to transform still images into high-quality dynamic video clips with control over motion, camera and style.
MiniMax
MiniMax Hailuo 2.3 is a powerful next-gen text-to-video model aimed at creators who want to turn prompts into short, high-quality video clips with decent resolution and strong motion/physics fidelity.
KlingAI
Kling V2.1 Image-to-Video (Start/End Frame) is a generative AI video model that takes a static image (and optionally a prompt) as input and produces a short video in which the image is animated with motion, pans, zooms, and so on.
KlingAI
Kling V2 Master brings cinematic storytelling to your fingertips. It’s more than animation — it’s AI-assisted cinematography, turning your static visuals into emotionally engaging motion sequences.
KlingAI
Kling V2.1 Master isn’t just an animation model; it’s a motion director for your imagination. Every frame reflects professional film grammar, fluid motion, and emotionally resonant depth.
KlingAI
Kling V2.5 Turbo is the latest evolution of Kling’s powerful video-generation model — a cutting-edge image-to-video AI model designed to turn static visuals into breathtaking, dynamic motion clips in seconds.
KlingAI
Kling V2.1 Image-to-Video is a premium video-generation AI model that takes a static image (your input) plus a descriptive prompt covering motion, camera, style, and more, and turns them into a short animated clip.
KlingAI
Kling V2 Master Text-to-Video is a state-of-the-art text-to-video engine aimed at creators who want high-quality, cinematic video driven by text prompts.
KlingAI
Kling V2.1 Master is the premium-tier version of the text-to-video model from KlingAI, designed to turn richly described text prompts into high-quality cinematic video clips.
KlingAI
Kling 2.5 Turbo is a state-of-the-art short-clip video generation model that lets creators go from prompt to a high-quality 5–10 second cinematic video with good motion and style consistency.
ModelsLab
Open-source 20B-parameter text-to-image model with advanced multimodal diffusion transformer architecture, excelling in high-fidelity text rendering and precise image editing.
ModelsLab
This model is used to remove any watermarks present in videos, producing clean, watermark-free outputs.
ModelsLab
Transformer-based image editing model with 20B parameters supports pixel-level and semantic edits, bilingual text modification, style transfer, and multi-image editing up to 1024×1024 resolution.
ModelsLab
This endpoint enables you to generate descriptive captions for images. By submitting an image to the endpoint, it analyzes the visual content and returns a concise, human-like caption that summarizes what’s depicted in the image.
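A minimal sketch of such a captioning call, assuming a simple JSON contract (image URL in, caption out); the endpoint path and field names are illustrative assumptions:

import requests

resp = requests.post(
    "https://modelslab.com/api/v6/image_editing/caption",  # assumed path
    json={
        "key": "YOUR_API_KEY",
        "init_image": "https://example.com/photo.jpg",  # assumed field name
    },
    timeout=60,
)
print(resp.json())  # expected to include a short, human-like caption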
Bytedance
Seedance 1.0 Pro Fast Text-to-Video is a cutting-edge AI model by ByteDance designed to generate high-quality, cinematic video content from text descriptions. This accelerated version of the Seedance 1.0 Pro model emphasizes speed and efficiency.
Bytedance
Seedance 1.0 Pro Fast is an accelerated, high-quality AI model from ByteDance that transforms a still image and a text prompt into a cinematic video. It offers faster generation and lower costs than the standard Seedance 1.0 Pro.
Bytedance
Transforms a single image and audio into expressive, full-body human videos with semantic gesture understanding, multi-character support, and dynamic camera control.
Convert static images into dynamic videos with 4K resolution, realistic motion, and cinematic effects, ideal for creators seeking high-quality video content.
Google
Veo 3.1 (Image-to-Video) instantly transforms a single image and text prompt into smooth, cinematic video with realistic motion and sound.
Google
Veo 3.1 is a powerful AI model that transforms written prompts into dynamic, cinematic videos with realistic motion, scenes, and sound. It supports 720p/1080p at 24 FPS in both 16:9 and 9:16 formats. Despite being in preview, it can be used for real commercial work.
Google
Veo 3.1 Fast is Google DeepMind’s quick text-to-video AI that turns prompts into short, realistic videos with synced audio. It’s built for speed, cinematic motion, and clarity, using advanced multimodal diffusion to generate 720p/1080p clips in seconds.
Sonauto
Audio Inpaint intelligently reconstructs missing or corrupted portions of an audio clip. Whether you need to remove unwanted noises, repair damaged recordings, or fill silent gaps, the model analyzes the surrounding context to generate smooth, natural-sounding audio.
Sonauto
This endpoint allows clients to extend an existing song or vocal audio track by generating additional material.
Bytedance
Seedance 1.0 Pro Lite creates AI videos using a first frame, last frame, and a prompt to animate smooth transitions.
OpenAI
Generate ultra-realistic cinematic videos from simple text prompts with smooth camera motion and lifelike physics.
Alibaba
Generate up to 10-second cinematic 1080p videos from images with synchronized audio, natural motion, multilingual support, and precise camera control for professional-quality content.
Alibaba
Wan 2.5 is a text-to-video model that generates smooth 5–10s videos in 480p–1080p with smart prompt rewriting and watermarking. In Wan 2.5, you can also add auto-generated or custom audio for perfect sync.
Bytedance
Seedream 4.0 combines text-to-image generation and image editing into one powerful multimodal model. It delivers pixel-perfect precision with natural language control, making it ideal for creators who want speed, quality, and flexibility in image generation.
ModelsLab
Efficiently train custom Stable Diffusion models with flexible batch sizes, gradient checkpointing, and memory-optimized attention requiring 12-24 GB VRAM for high-quality 512×512 to 1024×1024 image outputs.
Runway ML
Runway Gen-4 Image Turbo is an advanced image model that generates and edits visuals from 1–2 input images, with powerful tools for upscaling, adding fine details, and maintaining face consistency.
Bytedance
OmniHuman takes a single human image and audio, generating a realistic video with natural lip-sync and expressions.
Sonauto
Generate full songs from text, lyrics, or melodies with a latent diffusion-powered AI music model offering up to 4:45 min tracks, voice control, and seamless editing.
Eleven Labs
Eleven Music is cleared for nearly all commercial uses, from film and television to podcasts and social media videos, and from advertisements to gaming.
Google
Imagen 4.0 Ultra (Preview 06-06) is Google’s highest-fidelity text-to-image model, producing ultra-realistic visuals with precise prompt adherence, person generation, and multiple aspect ratios—ideal for detailed, high-quality imagery.
Google
Imagen 4.0 Fast (Preview 06-06) is the speed-optimized version designed for rapid, low-latency image generation. It delivers realistic, high-quality visuals as a preview-only feature, ideal for prototyping and quick iterations.
Runway ML
Edit videos with advanced object manipulation, camera angles, and lighting control using text prompts and optional reference images.
Google
veo-3.0-fast-generate-preview is Google’s speed-optimized AI video generation mode that quickly produces 1080p preview videos. It delivers realistic motion, dynamic scenes, and native audio, making it ideal for testing concepts before full-quality renders.
Google
Veo 3 Fast by Google is a high-speed AI video generation model that transforms text or image prompts into stunning 1080p videos with native audio. Optimized for quick turnaround, it’s ideal for creators needing rapid, high-quality content production.
ModelsLab
Generate high-quality 720p videos at 24fps from images with advanced motion control and seamless transitions, ideal for animations and cinematic outputs.
Inworld
The Text-to-Audio endpoint enables you to generate audio by providing a text input along with a valid audio URL or a pre-created voice using a voice_id. The output is an audio file that mimics the sound of the provided audio URL or the selected voice.
ModelsLab
Create stunning cinematic videos from text or images in minutes with a powerful model. Enjoy advanced motion control, 24fps output, and smooth, artifact-free visuals—perfect for filmmakers, creators, and marketers.
Eleven Labs
Generate up to 30 seconds of professional, royalty-free sound effects from text prompts with customizable duration, looping, and multiple MP3 output formats at 44.1 kHz.
Eleven Labs
Transform one voice into another using advanced speech-to-speech technology. Perfect for dubbing, content creation, and voice customization without altering the original message.
ModelsLab
This endpoint allows you to change the environment scenario to check how a house will look in different settings.
ModelsLab
This endpoint transforms damaged or unattractive exteriors into beautifully restored, visually appealing versions using AI.
ModelsLab
Generate a rendered image of a floor plan for a room based on the provided input as well as the desired interior style.
ModelsLab
Transform your space instantly with an advanced AI-powered room decorator—upload any room photo, restyle in 50+ design aesthetics, preview realistic 3D renders, and virtually stage with lifelike furniture—no special hardware required, fully cloud-based.
ModelsLab
This endpoint transforms exterior house sketches into realistic photographs based on your prompt, and it works for interior sketches as well.
Google
Veo 3 by Google is a cutting-edge AI video generation model that creates cinematic, high-quality videos from text or image prompts. With support for dynamic camera movements, detailed storytelling, and resolutions up to 1080p, it’s perfect for creators and filmmakers.
Eleven Labs
The Text-to-Audio endpoint enables you to generate audio by providing a text input along with a valid audio URL or a pre-created voice using a voice_id. The output is an audio file that mimics the sound of the provided audio URL or the selected voice.
Bytedance
Transform your ideas into stunning videos with Seedance AI video generation. Powered by ByteDance's advanced Seedance 1.0 Pro model, generate high-quality videos from an image with cinematic camera movements and multi-shot storytelling capability.
Bytedance
Transform your ideas into stunning videos with Seedance AI video generation. Powered by ByteDance's advanced Seedance 1.0 Pro model, generate high-quality videos from text prompts with cinematic camera movements and multi-shot storytelling capability.
ModelsLab
Transform interiors with ultra-realistic images, up to 2048x2048 resolution, and detailed text integration, ideal for designing and visualizing spaces with precision.
Bytedance
Generate high-resolution, ultra-detailed images up to 4K (4096×4096) from text in seconds, with advanced text rendering, multi-reference editing, batch output, and flexible styles—ideal for designers, marketers, and digital artists.
Sync.so
Achieve flawless lip sync and facial detail in live-action, 3D, and AI videos up to 4K. This advanced model uses diffusion-based super-resolution for realistic results—perfect for dubbing, dialogue replacement, and re-animation.
ModelsLab
FLUX Kontext DEV is an in-context image generation API that lets you create, edit, and transform images using text with high consistency.
Black Forest Labs
FLUX.1 Kontext [pro] is a model designed for advanced image editing. Unlike other models, you don’t need to create complex workflows to achieve this - FLUX.1 Kontext [pro] handles it from a written prompt alone.
ModelsLab
The endpoint enables automatic voice translation of videos from one language to another. It accepts a video file link and various parameters to control the dubbing process.
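A minimal sketch of a dubbing request, assuming a JSON contract with a video link plus source and target language codes; the path and field names are assumptions, not documented values:

import requests

payload = {
    "key": "YOUR_API_KEY",
    "init_video": "https://example.com/clip.mp4",  # assumed field name
    "source_language": "en",
    "target_language": "es",
}
resp = requests.post(
    "https://modelslab.com/api/v6/voice/dubbing",  # assumed path
    json=payload,
    timeout=600,  # dubbing a full video can take a while
)
print(resp.json())  # typically a job id or a URL to the dubbed video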
ModelsLab
The SFX endpoint allows you to generate sound effects (SFX) from text prompts. It takes user input in the form of a text prompt to conditionally generate audio effects.
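A minimal sketch of an SFX request, assuming text prompt in and an audio file URL out; the endpoint path and field names are illustrative assumptions:

import requests

resp = requests.post(
    "https://modelslab.com/api/v6/voice/sfx",  # assumed path
    json={
        "key": "YOUR_API_KEY",
        "prompt": "heavy wooden door creaking open",
        "duration": 5,  # seconds; assumed field name
    },
    timeout=120,
)
print(resp.json())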
ModelsLab
Speech-to-Text transforms audio into written transcription, allowing spoken language to be converted into text for various applications.
ModelsLab
Generate original, genre-specific song lyrics instantly using advanced NLP and machine learning—customize by theme, mood, or language, perfect for musicians and content creators seeking fresh, copyright-free lyrics.
ModelsLab
Generate high-quality songs in 50+ languages by providing lyrics and reference audio using the ACE-Step v1.5 model, which accurately matches voice, melody, tone, and emotion for professional results.
ModelsLab
Advanced audio isolation technology removes background noise, delivering clear vocals for professional audio applications.
ModelsLab
The Music Generation API allows you to generate music based on textual prompts and optional conditioning melodies.
ModelsLab
The Voice Cover endpoint allows you to transform a song or audio file into the voice of a celebrity, fictional character, singer, or politician by using the corresponding model ID for that character.
ModelsLab
The Text-to-Audio endpoint generates audio from text using either a provided audio URL or a voice_id, producing output that mimics the selected voice.
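This text-plus-reference-voice contract recurs across several of the audio entries above. A minimal sketch, assuming a hypothetical endpoint path and field names (either a voice_id or a reference audio URL selects the voice):

import requests

payload = {
    "key": "YOUR_API_KEY",
    "text": "This is a cloned-voice narration test.",
    "voice_id": "my_saved_voice",
    # alternatively, per the description: "init_audio": "https://example.com/ref.mp3"
}
resp = requests.post(
    "https://modelslab.com/api/v6/voice/text_to_audio",  # assumed path
    json=payload,
    timeout=180,
)
print(resp.json())  # expected to contain a link to the generated audio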
Ultra-realistic 4K text-to-video generator with cinematic motion, style control, and up to 8-second clips, perfect for ads, social, and creative storytelling.
Generate ultra-realistic, high-resolution (1024×1024 px, upscalable) images from text with advanced lighting, detail, and style control—ideal for photorealistic art, design, and marketing visuals.
Runway ML
Fast, cost-effective video generation model delivering 10-second cinematic clips in 30 seconds with consistent characters, realistic motion, and multi-aspect ratio support.
Runway ML
Generate high-resolution images with precise stylistic control, up to 1080p, and versatile aspect ratios, ideal for creating consistent visuals in various styles.
ModelsLab
QR Code Generator transforms plain QR codes into visually appealing, image-based designs while keeping them fully scannable.
ModelsLab
Ghibli Art Style API transforms your images into dreamy, hand-drawn visuals inspired by Studio Ghibli’s iconic art style.
ModelsLab
Flux Text-to-Image is a multilingual AI that transforms text prompts into high-quality images in styles like photorealism, sketches, paintings, 3D renders, and abstract art.
ModelsLab
ControlNet lets you control image generation using inputs like edges (Canny), depth maps (Depth), human poses (OpenPose), straight lines (MLSD), sketches (Lineart), and even functional QR codes (QRCode) to guide and shape the final output with precision.
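A minimal sketch of a ControlNet-guided generation request, where a controlnet type selects which conditioning signal interprets the control image; the endpoint path and field names are illustrative assumptions:

import requests

payload = {
    "key": "YOUR_API_KEY",
    "prompt": "a cozy reading nook with warm evening light",
    "controlnet_type": "canny",  # assumed values: canny, depth, openpose, mlsd, lineart, qrcode
    "init_image": "https://example.com/edge-map.png",  # the control image
}
resp = requests.post(
    "https://modelslab.com/api/v6/images/controlnet",  # assumed path
    json=payload,
    timeout=300,
)
print(resp.json())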
ModelsLab
Image to Image API generates variations of an input image, turns sketches into realistic images, and can blend two images to create a new output.
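A minimal sketch of an image-to-image request: a source image plus a prompt steering the variation. The endpoint path and field names (including strength) are assumptions:

import requests

payload = {
    "key": "YOUR_API_KEY",
    "prompt": "turn this pencil sketch into a photorealistic portrait",
    "init_image": "https://example.com/sketch.png",
    "strength": 0.7,  # assumed: how far the output may depart from the source
}
resp = requests.post(
    "https://modelslab.com/api/v6/images/img2img",  # assumed path
    json=payload,
    timeout=300,
)
print(resp.json())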
Google
The Imagen 4 Text-to-Image API lets you create high-quality images in seconds, using a text prompt to guide the generation. Note: the maximum prompt length is 480 tokens.
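One way to call Imagen from Python is Google's google-genai SDK; the sketch below assumes that interface, and the exact model id is an assumption (check the current model list):

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_images(
    model="imagen-4.0-generate-001",           # assumed model id
    prompt="a watercolor lighthouse at dawn",  # keep under 480 tokens
    config=types.GenerateImagesConfig(number_of_images=1),
)
with open("lighthouse.png", "wb") as f:
    f.write(response.generated_images[0].image.image_bytes)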
ModelsLab
Change your voice to sound like someone else—same words, different speaker. Just upload your voice and a target voice.
ModelsLab
The Inpainting API modifies specific parts of an image based on prompts—just send the image, mask, and prompt to the endpoint in one request.
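A minimal sketch of the one-request inpainting contract described above (image, mask, and prompt together); the path and field names are assumptions:

import requests

payload = {
    "key": "YOUR_API_KEY",
    "init_image": "https://example.com/room.png",
    "mask_image": "https://example.com/mask.png",  # white = region to repaint
    "prompt": "replace the sofa with a leather armchair",
}
resp = requests.post(
    "https://modelslab.com/api/v6/image_editing/inpaint",  # assumed path
    json=payload,
    timeout=300,
)
print(resp.json())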
ModelsLab
Generate high-quality videos from images with support for up to 4K resolution and cinematic motion, ideal for social media and branding content.
ModelsLab
Transform 2D images into high-fidelity 3D models instantly using advanced AI—supports photogrammetry, depth mapping, and exports in GLB/OBJ formats for gaming, AR/VR, and design.
ModelsLab
Generate high-quality images in just 2–4 seconds. From realistic and 3D art to fantasy, our real-time model brings your ideas to life instantly.
ModelsLab
Transform any text description into detailed, editable 3D models instantly—no CAD skills needed—using advanced generative AI, semantic scene parsing, and multi-view consistent geometry for gaming, design, and rapid prototyping.
ModelsLab
Face Gen is an AI avatar generator that creates images based on your prompt while maintaining a consistent character using your face, in styles like realistic, anime, 3D, chibi, and comic.
ModelsLab
Seamlessly remove unwanted objects from photos with advanced AI-powered detection, content-aware fill, shadow reconstruction, and high-resolution support for flawless, natural edits.
ModelsLab
The AI Virtual Try-On API lets users digitally try upper wear, lower wear, and full outfits on their photos in seconds.
ModelsLab
Seamlessly expand images with intelligent edge-blending, supporting various aspect ratios and maintaining original detail.
ModelsLab
Transform text into ultra-realistic HD video instantly, with advanced generative AI, smooth animations, and multi-format export—ideal for marketing, social, and creative projects.
ModelsLab
The Text-to-Audio endpoint enables you to generate audio by providing a text input along with a valid audio URL or a pre-created voice using a voice_id. The output is an audio file that mimics the sound of the provided audio URL or the selected voice.
ModelsLab
Generate ultra-realistic headshots instantly with advanced image generation and facial optimization, supporting resolutions up to 1024x1024.
ModelsLab
Upscales images up to 8x resolution using AI-driven super resolution to enhance detail, remove blur, and preserve sharpness for printing and digital use. Supports PNG, JPEG, WebP, and HEIC formats with fast processing and batch capabilities.
ModelsLab
Background Remover is an API that automatically removes the background from any image, making it clean and ready for use in any context. Make a POST request to the https://modelslab.com/api/v6/image_editing/removebg_mask endpoint.
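The endpoint URL here comes straight from the entry above; only the payload field names are assumptions for illustration:

import requests

resp = requests.post(
    "https://modelslab.com/api/v6/image_editing/removebg_mask",
    json={
        "key": "YOUR_API_KEY",                            # assumed field name
        "image": "https://example.com/product-shot.jpg",  # assumed field name
    },
    timeout=120,
)
print(resp.json())  # expected to contain a link to the cut-out image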