LiteLLM's unified API router now supports text-to-speech (TTS) providers, and ModelsLab's Voice API is a clean fit. If you're already routing LLM calls through LiteLLM, you can add voice synthesis to the same stack without a separate SDK.
This guide walks through the complete setup: provider configuration, API call structure, multi-voice routing, and how ModelsLab's voice models compare on latency and quality for production use.
Why Route TTS Through LiteLLM?
LiteLLM's value for TTS is the same as for LLMs: a single API call format that works across providers, plus fallback routing, cost tracking, and observability via Langfuse or other tools. If your app already uses LiteLLM for GPT-4o or Claude, adding voice synthesis through the same proxy means unified logging, rate limit handling, and billing visibility.
For teams running production AI apps, having TTS in the same observability stack as your LLM calls makes debugging faster. You can trace a full request chain from input text to voice output in one place.
Prerequisites
- LiteLLM v1.35.0+ (TTS support added in this version)
- ModelsLab API key — get one at modelslab.com/api
- Python 3.10+ or Node.js 18+
Step 1: Add ModelsLab as a TTS Provider in LiteLLM Config
In your litellm_config.yaml, add the ModelsLab TTS provider under the model_list:
model_list:
  - model_name: modelslab-tts
    litellm_params:
      model: modelslab/text_to_audio
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice
  - model_name: modelslab-voice-clone
    litellm_params:
      model: modelslab/voice_clone
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice
Set your environment variable:
export MODELSLAB_API_KEY="your_key_here"
Start the proxy:
litellm --config litellm_config.yaml --port 4000
Step 2: Make a TTS Request via LiteLLM
Once the proxy is running, call the OpenAI-compatible TTS endpoint:
from openai import OpenAI

client = OpenAI(
    api_key="your-litellm-virtual-key",
    base_url="http://localhost:4000"
)

response = client.audio.speech.create(
    model="modelslab-tts",
    voice="alloy",
    input="Hello, this is a test of ModelsLab TTS via LiteLLM.",
    response_format="mp3"
)

response.stream_to_file("output.mp3")
print("Audio saved to output.mp3")
The call goes through LiteLLM's proxy, gets routed to ModelsLab's Voice API, and returns an MP3 stream.
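Proxy calls can still fail transiently on rate limits or upstream timeouts. One option is a small retry wrapper with exponential backoff — a minimal sketch using only the standard library; `with_retries` and its parameters are illustrative names, not part of LiteLLM or the OpenAI SDK:

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Invoke a zero-argument callable, retrying with exponential backoff.

    `call` is any function that performs the TTS request, e.g.
    lambda: client.audio.speech.create(...). The last error is
    re-raised if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

You would wrap the speech call like `with_retries(lambda: client.audio.speech.create(...))`. In production you may prefer LiteLLM's own fallback routing instead, which retries across providers rather than against a single one.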
Step 3: Direct ModelsLab Voice API (Without Proxy)
If you prefer calling ModelsLab's Voice API directly:
import requests
import time

API_KEY = "your_modelslab_api_key"

def text_to_speech(text, voice_id="en-US-Neural2-A"):
    response = requests.post(
        "https://modelslab.com/api/v6/voice/text_to_audio",
        headers={"Content-Type": "application/json"},
        json={
            "key": API_KEY,
            "prompt": text,
            "language": "en",
            "speaker": voice_id,
            "output_format": "mp3",
            "speed": 1.0,
            "webhook": None,
            "track_id": None
        }
    )
    result = response.json()
    if result.get("status") == "processing":
        # Longer inputs are handled asynchronously; poll with the returned id
        return poll_for_audio(result.get("id"))
    elif result.get("status") == "success":
        return result.get("output", [None])[0]
    else:
        raise Exception(f"TTS failed: {result}")

def poll_for_audio(fetch_id, max_attempts=30):
    for attempt in range(max_attempts):
        time.sleep(3)
        resp = requests.post(
            "https://modelslab.com/api/v6/voice/fetch",
            headers={"Content-Type": "application/json"},
            json={"key": API_KEY, "request_id": str(fetch_id)}
        )
        result = resp.json()
        if result.get("status") == "success":
            return result.get("output", [None])[0]
    raise TimeoutError("Audio generation timed out")

audio_url = text_to_speech("Welcome to ModelsLab Voice API.")
print(f"Audio URL: {audio_url}")
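Synthesis latency grows with input length, so long scripts are often split into sentence-sized chunks and synthesized separately. A simple chunker sketch — the character limit is an arbitrary example, not an API constraint, and a single sentence longer than the limit is kept whole:

```python
import re

def chunk_text(text, max_chars=200):
    """Split text into chunks of roughly max_chars, breaking at
    sentence boundaries ('.', '!', '?') where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `text_to_speech` above and the resulting audio files concatenated in order.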
Multi-Voice Configuration in LiteLLM
LiteLLM lets you define multiple ModelsLab voice endpoints and route between them:
model_list:
  - model_name: modelslab-narration
    litellm_params:
      model: modelslab/text_to_audio
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice
  - model_name: modelslab-branded-voice
    litellm_params:
      model: modelslab/voice_clone
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice

router_settings:
  routing_strategy: cost-based-routing
  model_group_alias:
    tts-default: modelslab-narration
    tts-branded: modelslab-branded-voice
Your application code stays clean:
response = client.audio.speech.create(
    model="tts-default",
    input="This will use the narration voice.",
    voice="alloy"
)
Cost Tracking and Observability
LiteLLM logs TTS requests alongside LLM calls in its spend tracking dashboard:
curl "http://localhost:4000/spend/logs?model=modelslab-tts"
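The spend logs come back as JSON. A small helper to total spend per model from a list of log entries — this assumes each entry carries `model` and `spend` fields; the exact response schema may vary by LiteLLM version, so check your deployment's actual output:

```python
from collections import defaultdict

def spend_by_model(logs):
    """Sum the `spend` field across log entries, grouped by `model`."""
    totals = defaultdict(float)
    for entry in logs:
        totals[entry.get("model", "unknown")] += float(entry.get("spend", 0))
    return dict(totals)
```

This makes it easy to see TTS spend next to LLM spend in one report, which is the point of routing both through the same proxy.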
For Langfuse integration, add to your config:
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
Every TTS call gets traced with input text length, latency, cost, and output URL.
ModelsLab Voice Models Available
- text_to_audio — Standard TTS with 20+ voices across 12 languages
- voice_clone — Clone any voice from a 10-second audio sample
- text_to_audio_with_sound — TTS with optional background audio mixing
All three are accessible via the same /api/v6/voice base URL with different endpoint suffixes.
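Since all three models hang off the same base URL with the model name as the suffix, endpoint construction can be centralized in one place. A minimal sketch — `voice_endpoint` is an illustrative helper, not part of any SDK:

```python
VOICE_API_BASE = "https://modelslab.com/api/v6/voice"

# The three voice models listed above
VOICE_MODELS = {"text_to_audio", "voice_clone", "text_to_audio_with_sound"}

def voice_endpoint(model: str) -> str:
    """Return the full endpoint URL for a ModelsLab voice model,
    rejecting unknown model names early."""
    if model not in VOICE_MODELS:
        raise ValueError(f"unknown voice model: {model}")
    return f"{VOICE_API_BASE}/{model}"
```

Validating the model name up front turns a typo into an immediate error instead of a failed HTTP request.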
Performance and Pricing
ModelsLab's Voice API processes most requests in 3-8 seconds, with sub-2-second latency for short text inputs under 100 characters. Pricing is per-character, making it competitive for production workloads that need volume.
Compared to ElevenLabs ($0.30/1K characters) and OpenAI TTS ($0.015/1K characters), ModelsLab's voice pricing sits in the mid-range with the advantage of being part of a unified multi-modal API that also covers image and video generation.
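With per-character pricing, monthly cost is simple to estimate from character volume. A sketch using the per-1K-character rates quoted above — the ModelsLab figure here is a placeholder, since the exact rate depends on your plan:

```python
# USD per 1,000 characters
RATES_PER_1K_CHARS = {
    "elevenlabs": 0.30,   # rate quoted in the comparison above
    "openai-tts": 0.015,  # rate quoted in the comparison above
    "modelslab": 0.05,    # placeholder -- substitute your plan's actual rate
}

def monthly_tts_cost(chars_per_month: int, provider: str) -> float:
    """Estimate monthly TTS spend in USD for a given character volume."""
    return chars_per_month / 1000 * RATES_PER_1K_CHARS[provider]
```

For example, one million characters a month comes to $15 on OpenAI TTS and $300 on ElevenLabs at those rates, which is why per-character pricing matters at volume.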
Next Steps
Full API documentation: docs.modelslab.com/voice/text-to-audio
Get your ModelsLab API key: modelslab.com/api
For RealtimeTTS integration (streaming voice synthesis for real-time applications), see the RealtimeTTS GitHub repository — ModelsLab support was added in PR #365.
