Voice Cloning API Developer Guide

Complete developer guide for integrating voice cloning into your application. Clone voices from 10-second samples, generate multilingual speech, and build voice-powered features.

Voice Cloning API: The Complete Developer Guide

What is Voice Cloning API Integration?

A voice cloning API lets developers programmatically replicate a voice from a short audio sample and use it to generate speech from any text. The ModelsLab voice cloning API needs as little as 10 seconds of audio to create a reusable voice profile that produces natural, expressive speech in 50+ languages.

This developer guide walks through the complete integration process: uploading voice samples, creating voice profiles, generating speech, handling async processing, error handling, and production best practices.

Prerequisites and Setup

Before you start integrating the voice cloning API:

  • ModelsLab account with API key — Sign up free at modelslab.com, no credit card required
  • Audio sample — 10-30 seconds of clear speech, WAV or MP3 format, minimal background noise
  • HTTP client — Python requests, Node.js fetch, or any REST-capable language
  • Storage — Somewhere to store generated audio files (S3, GCS, or local filesystem)
  • Webhook endpoint (optional) — For async processing notifications in production
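
The sample-length requirement above can be checked locally before uploading. The following is a minimal sketch using only Python's standard `wave` module; it handles WAV files only (MP3 would need a third-party library), and the function names and the 10-30 second bounds as constants are illustrative choices, not part of the ModelsLab API.

```python
import wave

MIN_SECONDS = 10.0  # minimum sample length stated in this guide
MAX_SECONDS = 30.0  # upper end of the recommended range


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def is_valid_sample(path: str) -> bool:
    """Check that the sample falls within the recommended 10-30 second range."""
    duration = wav_duration_seconds(path)
    return MIN_SECONDS <= duration <= MAX_SECONDS
```

A check like this only covers duration; background noise and speech clarity still need a listen or a separate audio-quality step.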

API Architecture Overview

The ModelsLab voice cloning API follows a two-phase workflow:

  • Phase 1: Voice Profile Creation — Upload an audio sample to create a reusable voice profile. This returns a voice_id you will use for all future generation requests.
  • Phase 2: Speech Generation — Send text plus the voice_id to generate speech. The API returns audio URLs or base64 data. Supports sync and async modes.

Both phases use standard REST endpoints with JSON payloads. Authentication is via an API key in the request body.
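
Because the key travels in the JSON body, it is easy to end up hardcoding it. One way to avoid that is sketched below: a small helper that reads the key from an environment variable and merges it into the payload. The helper name `build_payload` and the variable name `MODELSLAB_API_KEY` are hypothetical and chosen for illustration only.

```python
import os


def build_payload(api_key=None, **fields):
    """Return a JSON-ready dict with the API key placed in the request body."""
    # MODELSLAB_API_KEY is a hypothetical variable name, not an official one.
    key = api_key or os.environ.get("MODELSLAB_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set MODELSLAB_API_KEY")
    return {"key": key, **fields}
```

The returned dict can be passed as the `json=` argument of `requests.post`, in the same way as the payloads in the examples that follow.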

Voice Cloning API Code Examples

From voice sample upload to speech generation — production-ready code.

Step 1: Upload voice sample and create profile (Python)

Python
import requests

# Upload a voice sample to create a cloned voice profile
url = "https://modelslab.com/api/v6/voice/create_voice"
payload = {
    "key": "YOUR_API_KEY",
    "voice_name": "customer-voice-001",
    "init_audio": "https://your-storage.com/voice-sample.wav",
    "language": "en"
}

response = requests.post(url, json=payload, timeout=30)
response.raise_for_status()  # Fail fast on HTTP-level errors
data = response.json()

# Save the voice_id for later use
voice_id = data["voice_id"]
print(f"Voice profile created: {voice_id}")

Step 2: Generate speech with cloned voice (Python)

Python
# Generate speech using the cloned voice
# (continues from step 1: requests is imported and voice_id is defined)
url = "https://modelslab.com/api/v6/voice/text_to_speech"
payload = {
    "key": "YOUR_API_KEY",
    "voice_id": voice_id,  # From step 1
    "text": "Welcome to our platform. We are glad to have you here.",
    "language": "en",
    "speed": 1.0,
    "pitch": 1.0
}

response = requests.post(url, json=payload, timeout=30)
response.raise_for_status()  # Fail fast on HTTP-level errors
data = response.json()

# The response lists generated audio URLs; take the first one
audio_url = data["output"][0]
print(f"Generated audio: {audio_url}")

Full integration with error handling (JavaScript)

JavaScript
async function cloneVoiceAndSpeak(sampleUrl, text) {
  // Step 1: Create voice profile
  const createRes = await fetch('https://modelslab.com/api/v6/voice/create_voice', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      key: 'YOUR_API_KEY',
      voice_name: `voice-${Date.now()}`,
      init_audio: sampleUrl,
      language: 'en'
    })
  });

  const createData = await createRes.json();
  if (createData.status === 'error') throw new Error(createData.message);

  // Step 2: Generate speech
  const speechRes = await fetch('https://modelslab.com/api/v6/voice/text_to_speech', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      key: 'YOUR_API_KEY',
      voice_id: createData.voice_id,
      text: text,
      language: 'en'
    })
  });

  const speechData = await speechRes.json();
  // Apply the same error check to the speech response
  if (speechData.status === 'error') throw new Error(speechData.message);
  return speechData.output[0]; // Audio URL
}

// Usage (top-level await requires an ES module or an enclosing async function)
const audioUrl = await cloneVoiceAndSpeak(
  'https://storage.example.com/sample.wav',
  'This is generated speech using a cloned voice.'
);
console.log(`Audio: ${audioUrl}`);

Multilingual voice generation

Python
# Generate the same cloned voice in multiple languages
# (continues from the steps above: requests is imported and voice_id is defined)
languages = ["en", "es", "fr", "de", "ja"]
texts = {
    "en": "Hello, welcome to our service.",
    "es": "Hola, bienvenido a nuestro servicio.",
    "fr": "Bonjour, bienvenue dans notre service.",
    "de": "Hallo, willkommen bei unserem Service.",
    "ja": "こんにちは、サービスへようこそ。"
}

for lang in languages:
    payload = {
        "key": "YOUR_API_KEY",
        "voice_id": voice_id,
        "text": texts[lang],
        "language": lang
    }
    response = requests.post("https://modelslab.com/api/v6/voice/text_to_speech", json=payload)
    data = response.json()
    print(f"{lang}: {data['output'][0]}")

Integration Workflow

Build voice cloning into your app in three steps.

Step 1: Create a Voice Profile

Upload a 10-30 second audio sample of clear speech. The API processes the sample and returns a voice_id — a reusable identifier for all future speech generation with that voice.
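
Since the voice_id is reusable, it makes sense to persist it rather than re-upload the sample. Below is a minimal sketch of a file-backed cache keyed by voice name. The `create_voice` argument is a hypothetical callable standing in for whatever code performs the profile-creation request and returns the voice_id; the JSON file name is also just an illustrative default.

```python
import json
from pathlib import Path
from typing import Callable


def get_or_create_voice_id(
    voice_name: str,
    create_voice: Callable[[str], str],
    cache_path: str = "voice_ids.json",
) -> str:
    """Return a cached voice_id, calling create_voice only on a cache miss."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    if voice_name not in cache:
        # Cache miss: create the profile once and remember its id on disk.
        cache[voice_name] = create_voice(voice_name)
        path.write_text(json.dumps(cache, indent=2))
    return cache[voice_name]
```

In a multi-process deployment a database or key-value store would be a safer home for this mapping than a local file, which has no locking.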

Step 2: Generate Speech

Send any text along with the voice_id to the text-to-speech endpoint. Receive generated audio as a URL or base64 data. Supports speed, pitch, and language controls.
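
When audio arrives as base64 rather than a URL, it has to be decoded before it can be saved or played. The sketch below shows one way to do that with the standard library. This guide does not show which response field carries the base64 string, so the function simply takes the string as an argument; tolerating a `data:` URI prefix is a defensive assumption, not documented behavior.

```python
import base64
from pathlib import Path


def save_base64_audio(b64_data: str, dest: str) -> int:
    """Decode base64 audio, write it to dest, and return the byte count."""
    # Tolerate an optional data-URI prefix such as "data:audio/wav;base64,".
    if b64_data.startswith("data:") and "," in b64_data:
        b64_data = b64_data.split(",", 1)[1]
    audio_bytes = base64.b64decode(b64_data)
    Path(dest).write_bytes(audio_bytes)
    return len(audio_bytes)
```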

Step 3: Production Integration

Use webhooks for async processing, cache voice profiles, implement error handling and retries, and add multilingual support. Scale to thousands of voice generations per day.
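
On the receiving side, a webhook endpoint mostly needs to parse the POST body and separate failures from finished audio. The sketch below does only that parsing step. The webhook payload format is not described in this guide, so the code assumes it mirrors the synchronous response fields used in the examples above (`status`, `message`, `output`); treat that as an assumption to verify against the official API documentation. The function name is hypothetical.

```python
import json


def parse_webhook_body(raw_body: bytes) -> dict:
    """Parse a webhook POST body into a small result dict.

    Assumes the body mirrors the synchronous response fields used in this
    guide ("status", "message", "output"); this is not confirmed here.
    """
    event = json.loads(raw_body.decode("utf-8"))
    if event.get("status") == "error":
        return {
            "ok": False,
            "error": event.get("message", "unknown error"),
            "audio_urls": [],
        }
    # Treat a missing or null "output" as an empty list of URLs.
    return {"ok": True, "error": None, "audio_urls": list(event.get("output") or [])}
```

The function is framework-agnostic: it can be called from a Flask view, a FastAPI route, or a plain `http.server` handler with the raw request body.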

Voice Cloning API Providers Compared

How ModelsLab voice cloning compares to ElevenLabs and other providers.

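| Feature                | ModelsLab     | ElevenLabs      | Play.ht    | Resemble AI |
| ---------------------- | ------------- | --------------- | ---------- | ----------- |
| Min Sample Length      | 10 seconds    | 30 seconds      | 30 seconds | 1 minute    |
| Languages Supported    | 50+           | 29              | 30+        | 24          |
| Starting Price         | Pay-as-you-go | $5/mo (starter) | $39/mo     | $24/mo      |
| Free Tier              | 100 calls/day | 10k chars/mo    | Trial only | Trial only  |
| Emotional Control      | Yes           | Yes             | Limited    | Yes         |
| Real-Time Streaming    | Yes           | Yes             | Yes        | Limited     |
| Image + Video APIs Too | Same key      | No              | No         | No          |
| Webhook Support        | Yes           | Yes             | No         | Yes         |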

Data as of April 2026. Based on publicly available documentation.

Production Best Practices

When deploying voice cloning in production applications:

  • Cache voice profiles — Create voice profiles once and reuse the voice_id. Do not re-upload samples for each generation.
  • Use webhooks for async — Long-form speech generation can take 5-10 seconds. Use webhooks instead of polling.
  • Handle errors gracefully — Implement retry logic with exponential backoff. Check for rate limit headers.
  • Validate audio samples — Ensure samples have minimal background noise and clear speech for best clone quality.
  • Store generated audio — Audio URLs expire after 24 hours. Download and store in your own storage (S3, GCS).
  • Monitor usage — Track API calls and generation quality. Use ModelsLab dashboard for usage analytics.
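
Because audio URLs expire, generated files should be copied into storage you control soon after generation. The following is a minimal sketch that streams a URL to a local directory using only the standard library; the function name and the fallback file name are illustrative. Uploading onward to S3 or GCS would be a separate step with the respective SDK.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def store_audio(audio_url: str, dest_dir: str) -> Path:
    """Download audio_url into dest_dir and return the local path."""
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Reuse the file name from the URL path, with a fallback if it is empty.
    filename = Path(urlparse(audio_url).path).name or "audio.wav"
    target = target_dir / filename
    with urllib.request.urlopen(audio_url, timeout=30) as response, open(target, "wb") as out:
        # Stream to disk in chunks instead of loading the file into memory.
        shutil.copyfileobj(response, out)
    return target
```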

Authentication and Rate Limits

The ModelsLab voice cloning API uses API key authentication passed in the request body. Rate limits depend on your plan: the free tier allows 100 calls/day, while paid plans scale to thousands of concurrent requests. When a limit is hit, the API returns standard HTTP status codes with Retry-After headers.
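
Retrying on rate limits can be sketched as exponential backoff that defers to the server's Retry-After value when one is present. The code below is transport-agnostic: `send` is a hypothetical callable that performs one request and returns a `(status_code, headers, json_body)` tuple, so the retry logic can be exercised without a network. Only integer-second Retry-After values are handled; an HTTP-date value falls back to the exponential delay.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]  # (status code, headers, parsed JSON body)


def backoff_delay(attempt: int, headers: Mapping[str, str], base_delay: float = 1.0) -> float:
    """Prefer the server's Retry-After value in seconds; otherwise back off exponentially."""
    # Note: a plain dict lookup is case-sensitive; requests' headers are case-insensitive.
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    return base_delay * (2 ** attempt)


def call_with_retries(
    send: Callable[[], Response],
    max_retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it succeeds, retrying on HTTP 429 and 5xx responses."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status < 400:
            return body
        retryable = status == 429 or status >= 500
        if not retryable or attempt == max_retries:
            raise RuntimeError(f"Request failed with HTTP {status}")
        sleep(backoff_delay(attempt, headers))
    # Only reached if max_retries is negative, so no request was attempted.
    raise RuntimeError("max_retries must be >= 0")
```

With `requests`, `send` could wrap a single `requests.post(...)` call and return `(r.status_code, r.headers, r.json())`. Adding random jitter to the delay is a common refinement when many clients retry at once.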

For enterprise workloads, dedicated instances provide guaranteed throughput and custom rate limits. Contact sales for SLA-backed voice cloning infrastructure.

ModelsLab Voice Cloning API Features

Key advantages that set us apart

Clone any voice from a 10-second sample
50+ languages supported for multilingual generation
Emotional control for expressive speech
Real-time streaming for low-latency applications
Webhook callbacks for async processing
Free tier with 100 API calls per day
Same API key for voice + image + video + LLM
Python and JavaScript code examples
Production-ready error handling and retry logic
GDPR-compliant with configurable data retention
Enterprise SLA with dedicated instances
Audio output as URL or base64

Our Popular Use Cases

What developers build with the voice cloning API:

Personalized Audio Content

Generate podcast intros, audiobook narrations, and personalized voice messages using cloned voices. Scale audio content creation.

Voice Cloning API Developer FAQ

How much audio do I need to clone a voice?

ModelsLab voice cloning requires a minimum of 10 seconds of clear speech. For best results, provide 20-30 seconds of natural conversation with minimal background noise. WAV and MP3 formats are supported.

How many languages does the voice cloning API support?

The ModelsLab voice cloning API supports 50+ languages for speech generation. The cloned voice maintains its characteristics across languages, allowing you to generate natural-sounding speech in English, Spanish, French, German, Japanese, and more from a single voice sample.

Can I use the voice cloning API in commercial applications?

Yes. The ModelsLab voice cloning API can be used in commercial applications. Ensure you have appropriate consent from the voice owner. ModelsLab provides usage rights for voices generated through the API for commercial use.

What format is the generated audio returned in?

The voice cloning API returns generated audio as publicly accessible URLs (WAV format) that expire after 24 hours. Download the files and keep them in your own storage for permanent access. Base64 output is also available for direct embedding.

How does ModelsLab compare to ElevenLabs?

Compared to ElevenLabs, ModelsLab requires shorter samples (10s vs 30s), supports more languages (50+ vs 29), and offers a more generous free tier (100 calls/day vs 10k chars/month). ModelsLab also provides image, video, and LLM APIs through the same key. ElevenLabs has a more mature voice library.

Does the API support real-time streaming?

Yes. The ModelsLab voice cloning API supports real-time streaming for low-latency speech generation. Short text segments can be generated and streamed in near-real-time, suitable for conversational AI and interactive voice applications.

How should I handle errors and rate limits?

Implement exponential backoff with 3-5 retries. Check for HTTP 429 (rate limit) and respect Retry-After headers. Use webhooks for async processing to avoid timeouts. Use the ModelsLab dashboard to monitor usage analytics and error rates.

Get Expert Support in Seconds

We're Here to Help.

Want to know more? You can email us anytime at support@modelslab.com
