How long does a voice sample need to be for cloning?

ModelsLab voice cloning requires a minimum of 10 seconds of clear speech. For best results, provide 20-30 seconds of natural conversation with minimal background noise. WAV and MP3 formats are supported.
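
A quick local check can catch samples that are too short before they are uploaded. The sketch below uses Python's standard `wave` module, so it only covers uncompressed WAV files; MP3 samples would need a third-party library. The helper names and the `MIN_SECONDS` constant are illustrative and not part of the ModelsLab API.

```python
import wave

MIN_SECONDS = 10.0  # minimum sample length stated in this guide


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, min_seconds=MIN_SECONDS):
    """Return True if the sample meets the minimum length."""
    return wav_duration_seconds(path) >= min_seconds
```

Calling `is_long_enough("voice-sample.wav")` before upload avoids spending an API call on a sample that is likely to be rejected or to clone poorly.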

How many languages does the voice cloning API support?

ModelsLab voice cloning API supports 50+ languages for speech generation. The cloned voice maintains its characteristics across languages, allowing you to generate natural-sounding speech in English, Spanish, French, German, Japanese, and more from a single voice sample.

Can I use cloned voices in commercial products?

Yes. ModelsLab voice cloning API can be used in commercial applications. Ensure you have appropriate consent from the voice owner. ModelsLab provides usage rights for voices generated through the API for commercial use.

What audio format does the API return?

The voice cloning API returns generated audio as publicly accessible URLs (WAV format) that expire after 24 hours. For permanent access, download the files and keep them in your own storage. Base64 output is also available for direct embedding.
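
Because the URLs expire, generated audio is usually copied somewhere durable right after generation. The following is a minimal sketch using only the Python standard library; `save_audio` and the destination path are hypothetical names, and a production setup would more likely write to S3 or GCS than to the local filesystem.

```python
import shutil
import urllib.request
from pathlib import Path


def save_audio(audio_url, dest_path):
    """Stream the audio at audio_url to dest_path and return the path."""
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large files are never held fully in memory.
    with urllib.request.urlopen(audio_url, timeout=30) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

For example, `save_audio(audio_url, "audio/welcome-en.wav")` could run immediately after a generation request returns its URL.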

How does ModelsLab voice cloning compare to ElevenLabs?

ModelsLab requires shorter samples (10s vs 30s), supports more languages (50+ vs 29), and offers a more generous free tier (100 calls/day vs 10k chars/month). ModelsLab also provides image, video, and LLM APIs through the same key. ElevenLabs has a more mature voice library.

Is the voice cloning API suitable for real-time applications?

Yes. ModelsLab voice cloning API supports real-time streaming for low-latency speech generation. Short text segments can be generated and streamed in near-real-time, suitable for conversational AI and interactive voice applications.
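
One way to prepare text for this kind of use is to cut it into short segments at sentence boundaries and request each segment separately. The sketch below covers only the splitting step; the streaming request itself is not shown because its parameters are not described here. The function name and the 200-character default are assumptions chosen for illustration.

```python
import re


def split_into_segments(text, max_chars=200):
    """Split text at sentence boundaries into segments of at most max_chars.

    A single sentence longer than max_chars is kept whole.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be sent as the `text` field of its own generation request, so playback of the first segment can begin while later ones are still being generated.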

How do I handle errors and retries in production?

Implement exponential backoff with 3-5 retries. Check for HTTP 429 (rate limit) and respect the Retry-After header. Use webhooks for async processing to avoid timeouts. Track usage analytics and error rates in the ModelsLab dashboard.
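
The retry advice above can be sketched as a small wrapper. It assumes the response object exposes `status_code` and `headers` the way a `requests` response does. Retrying on 5xx codes in addition to 429 is an assumption of this sketch, as are the delay values and the names `call_with_retries` and `backoff_delay`.

```python
import time

# 429 comes from this guide; retrying 5xx responses is an assumption.
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Seconds to wait before the next try; attempt is 0-based."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # header was an HTTP date, fall back to backoff
    return min(cap, base * 2 ** attempt)


def call_with_retries(send, max_retries=4, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out."""
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code not in RETRYABLE or attempt == max_retries:
            return response
        sleep(backoff_delay(attempt, response.headers.get("Retry-After")))
```

A typical call would be `call_with_retries(lambda: requests.post(url, json=payload, timeout=30))`. The caller still needs to inspect the final response, since it may carry an error status when all retries are exhausted.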

Voice Cloning API Developer Guide

Complete developer guide for integrating voice cloning into your application. Clone voices from 10-second samples, generate multilingual speech, and build voice-powered features.

Voice Cloning API: The Complete Developer Guide

What is Voice Cloning API Integration?

A voice cloning API lets developers programmatically replicate a voice from a short audio sample and use it to generate speech from any text. The ModelsLab voice cloning API needs as little as 10 seconds of audio to create a reusable voice profile that produces natural, expressive speech in 50+ languages.

This developer guide walks through the complete integration process: uploading voice samples, creating voice profiles, generating speech, handling async processing, error handling, and production best practices.

Prerequisites and Setup

Before you start integrating the voice cloning API:

ModelsLab account with API key — Sign up free at modelslab.com, no credit card required
Audio sample — 10-30 seconds of clear speech, WAV or MP3 format, minimal background noise
HTTP client — Python requests, Node.js fetch, or any REST-capable language
Storage — Somewhere to store generated audio files (S3, GCS, or local filesystem)
Webhook endpoint (optional) — For async processing notifications in production

API Architecture Overview

The ModelsLab voice cloning API follows a two-phase workflow:

Phase 1: Voice Profile Creation — Upload an audio sample to create a reusable voice profile. This returns a voice_id you will use for all future generation requests.
Phase 2: Speech Generation — Send text plus the voice_id to generate speech. The API returns audio URLs or base64 data. Supports sync and async modes.

Both phases use standard REST endpoints with JSON payloads. Authentication is via API key in the request body.

Voice Cloning API Code Examples

From voice sample upload to speech generation — production-ready code.

Step 1: Upload voice sample and create profile (Python)

Python

import requests

# Upload a voice sample to create a cloned voice profile
url = "https://modelslab.com/api/v6/voice/create_voice"
payload = {
    "key": "YOUR_API_KEY",
    "voice_name": "customer-voice-001",
    "init_audio": "https://your-storage.com/voice-sample.wav",  # URL of the voice sample
    "language": "en"
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()  # Raise on HTTP 4xx/5xx
data = response.json()

# The API can also report failures in the JSON body
if data.get("status") == "error":
    raise RuntimeError(data.get("message", "Voice profile creation failed"))

# Save the voice_id for later use
voice_id = data["voice_id"]
print(f"Voice profile created: {voice_id}")

Step 2: Generate speech with cloned voice (Python)

Python

# Generate speech using the cloned voice
# (continues from step 1: requests is imported and voice_id is set)
url = "https://modelslab.com/api/v6/voice/text_to_speech"
payload = {
    "key": "YOUR_API_KEY",
    "voice_id": voice_id,  # From step 1
    "text": "Welcome to our platform. We are glad to have you here.",
    "language": "en",
    "speed": 1.0,
    "pitch": 1.0
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()  # Raise on HTTP 4xx/5xx
data = response.json()

# The first entry in "output" is the URL of the generated audio
audio_url = data["output"][0]
print(f"Generated audio: {audio_url}")

Full integration with error handling (JavaScript)

JavaScript

async function cloneVoiceAndSpeak(sampleUrl, text) {
  // Step 1: Create voice profile
  const createRes = await fetch('https://modelslab.com/api/v6/voice/create_voice', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      key: 'YOUR_API_KEY',
      voice_name: `voice-${Date.now()}`,
      init_audio: sampleUrl,
      language: 'en'
    })
  });
  if (!createRes.ok) throw new Error(`create_voice failed: HTTP ${createRes.status}`);

  const createData = await createRes.json();
  if (createData.status === 'error') throw new Error(createData.message);

  // Step 2: Generate speech
  const speechRes = await fetch('https://modelslab.com/api/v6/voice/text_to_speech', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      key: 'YOUR_API_KEY',
      voice_id: createData.voice_id,
      text: text,
      language: 'en'
    })
  });
  if (!speechRes.ok) throw new Error(`text_to_speech failed: HTTP ${speechRes.status}`);

  const speechData = await speechRes.json();
  // Assumes the same error shape as the create_voice response
  if (speechData.status === 'error') throw new Error(speechData.message);
  return speechData.output[0]; // Audio URL
}

// Usage (top-level await requires an ES module)
const audioUrl = await cloneVoiceAndSpeak(
  'https://storage.example.com/sample.wav',
  'This is generated speech using a cloned voice.'
);
console.log(`Audio: ${audioUrl}`);

Multilingual voice generation

Python

import requests

# Generate the same cloned voice in multiple languages
# (voice_id comes from the profile created in step 1)
languages = ["en", "es", "fr", "de", "ja"]
texts = {
    "en": "Hello, welcome to our service.",
    "es": "Hola, bienvenido a nuestro servicio.",
    "fr": "Bonjour, bienvenue dans notre service.",
    "de": "Hallo, willkommen bei unserem Service.",
    "ja": "こんにちは、サービスへようこそ。"
}

for lang in languages:
    payload = {
        "key": "YOUR_API_KEY",
        "voice_id": voice_id,
        "text": texts[lang],
        "language": lang
    }
    response = requests.post("https://modelslab.com/api/v6/voice/text_to_speech", json=payload, timeout=60)
    response.raise_for_status()  # Raise on HTTP 4xx/5xx
    data = response.json()
    print(f"{lang}: {data['output'][0]}")

Integration Workflow

Build voice cloning into your app in three steps.

Step 1: Create a Voice Profile

Upload a 10-30 second audio sample of clear speech. The API processes the sample and returns a voice_id — a reusable identifier for all future speech generation with that voice.

Step 2: Generate Speech

Send any text along with the voice_id to the text-to-speech endpoint. Receive generated audio as a URL or base64 data. Supports speed, pitch, and language controls.

Step 3: Production Integration

Use webhooks for async processing, cache voice profiles, implement error handling and retries, and add multilingual support. Scale to thousands of voice generations per day.

Voice Cloning API Providers Compared

How ModelsLab voice cloning compares to ElevenLabs and other providers.

Feature	ModelsLab	ElevenLabs	Play.ht	Resemble AI
Min Sample Length	10 seconds	30 seconds	30 seconds	1 minute
Languages Supported	50+	29	30+	24
Starting Price	Pay-as-you-go	$5/mo (starter)	$39/mo	$24/mo
Free Tier	100 calls/day	10k chars/mo	Trial only	Trial only
Emotional Control	Yes	Yes	Limited	Yes
Real-Time Streaming	Yes	Yes	Yes	Limited
Image + Video APIs Too	Same key	No	No	No
Webhook Support	Yes	Yes	No	Yes

Data as of April 2026. Based on publicly available documentation.

Production Best Practices

When deploying voice cloning in production applications:

Cache voice profiles — Create voice profiles once and reuse the voice_id. Do not re-upload samples for each generation.
Use webhooks for async — Long-form speech generation can take 5-10 seconds. Use webhooks instead of polling.
Handle errors gracefully — Implement retry logic with exponential backoff. Check for rate limit headers.
Validate audio samples — Ensure samples have minimal background noise and clear speech for best clone quality.
Store generated audio — Audio URLs expire after 24 hours. Download and store in your own storage (S3, GCS).
Monitor usage — Track API calls and generation quality. Use ModelsLab dashboard for usage analytics.
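
The first practice in the list, caching voice profiles, can be sketched as a small JSON-backed lookup. The class name, the file name, and the `create` callback are hypothetical. A real deployment would more likely keep the mapping in a database, and this version is not safe for concurrent writers.

```python
import json
from pathlib import Path


class VoiceProfileCache:
    """Maps a voice name to its voice_id, persisted as a JSON file."""

    def __init__(self, path="voice_profiles.json"):
        self.path = Path(path)
        self.ids = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get_or_create(self, voice_name, create):
        """Return the cached voice_id, calling create() only on a cache miss."""
        if voice_name not in self.ids:
            self.ids[voice_name] = create()
            self.path.write_text(json.dumps(self.ids, indent=2))
        return self.ids[voice_name]
```

Here `create` would be a function that performs the profile-creation request and returns the resulting `voice_id`, so the sample is uploaded once per voice name rather than once per generation.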

Authentication and Rate Limits

The ModelsLab voice cloning API uses API key authentication, with the key passed in the request body. Rate limits depend on your plan: the free tier allows 100 calls/day, while paid plans scale to thousands of concurrent requests. When a rate limit is hit, the API returns a standard HTTP status code along with a Retry-After header.

For enterprise workloads, dedicated instances provide guaranteed throughput and custom rate limits. Contact sales for SLA-backed voice cloning infrastructure.