LiteLLM Text-to-Speech: Using ModelsLab as Your TTS Provider 2026


LiteLLM's unified API router now supports text-to-speech (TTS) providers, and ModelsLab's Voice API is a clean fit. If you're already routing LLM calls through LiteLLM, you can add voice synthesis to the same stack without a separate SDK.

This guide walks through the complete setup: provider configuration, API call structure, multi-voice routing, and how ModelsLab's voice models compare on latency and quality for production use.

Why Route TTS Through LiteLLM?

LiteLLM's value for TTS is the same as for LLMs: a single API call format that works across providers, plus fallback routing, cost tracking, and observability via Langfuse or other tools. If your app already uses LiteLLM for GPT-4o or Claude, adding voice synthesis through the same proxy means unified logging, rate limit handling, and billing visibility.

For teams running production AI apps, having TTS in the same observability stack as your LLM calls makes debugging faster. You can trace a full request chain from input text to voice output in one place.

Prerequisites

Before you start, you'll need:

  • A ModelsLab API key (from modelslab.com/api)
  • LiteLLM installed with proxy support: pip install 'litellm[proxy]'
  • Python 3.8+ with the openai package for the client-side examples

Step 1: Add ModelsLab as a TTS Provider in LiteLLM Config

In your litellm_config.yaml, add the ModelsLab TTS provider under the model_list:

model_list:
  - model_name: modelslab-tts
    litellm_params:
      model: modelslab/text_to_audio
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice
  - model_name: modelslab-voice-clone
    litellm_params:
      model: modelslab/voice_clone
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice
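Before starting the proxy, it can help to sanity-check the YAML you just wrote. A minimal sketch using PyYAML (assuming, as the config above shows, that every entry carries a `model_name` and a `litellm_params` block):

```python
import yaml

# Same config as litellm_config.yaml above, inlined for the check
CONFIG = """
model_list:
  - model_name: modelslab-tts
    litellm_params:
      model: modelslab/text_to_audio
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice
  - model_name: modelslab-voice-clone
    litellm_params:
      model: modelslab/voice_clone
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice
"""

config = yaml.safe_load(CONFIG)
for entry in config["model_list"]:
    # Every deployment needs a routable name plus provider params
    assert "model_name" in entry and "litellm_params" in entry

print([e["model_name"] for e in config["model_list"]])
# → ['modelslab-tts', 'modelslab-voice-clone']
```

Catching a mis-indented entry here is cheaper than debugging a proxy that silently drops a deployment.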

Set your environment variable:

export MODELSLAB_API_KEY="your_key_here"

Start the proxy:

litellm --config litellm_config.yaml --port 4000

Step 2: Make a TTS Request via LiteLLM

Once the proxy is running, call the OpenAI-compatible TTS endpoint:

from openai import OpenAI

client = OpenAI(
    api_key="your-litellm-virtual-key",
    base_url="http://localhost:4000"
)

response = client.audio.speech.create(
    model="modelslab-tts",
    voice="alloy",
    input="Hello from ModelsLab, routed through LiteLLM."
)

response.stream_to_file("output.mp3")
print("Audio saved to output.mp3")

The call goes through LiteLLM's proxy, gets routed to ModelsLab's Voice API, and returns an MP3 stream.

Step 3: Direct ModelsLab Voice API (Without Proxy)

If you prefer calling ModelsLab's Voice API directly:

import requests
import time

API_KEY = "your_modelslab_api_key"
ENDPOINT = "https://modelslab.com/api/v6/voice/text_to_audio"

def text_to_speech(text):
    # Field names here follow ModelsLab's v6 voice API conventions;
    # check docs.modelslab.com/voice/text-to-audio for the exact schema.
    payload = {"key": API_KEY, "prompt": text}
    result = requests.post(ENDPOINT, json=payload).json()
    # Longer jobs return "processing" with a fetch URL to poll
    while result.get("status") == "processing":
        time.sleep(2)
        result = requests.post(result["fetch_result"], json={"key": API_KEY}).json()
    return result["output"][0]

audio_url = text_to_speech("Welcome to ModelsLab Voice API.")
print(f"Audio URL: {audio_url}")

Multi-Voice Configuration in LiteLLM

LiteLLM lets you define multiple ModelsLab voice endpoints and route between them:

model_list:
  - model_name: modelslab-narration
    litellm_params:
      model: modelslab/text_to_audio
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice
  - model_name: modelslab-branded-voice
    litellm_params:
      model: modelslab/voice_clone
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice

router_settings:
  routing_strategy: cost-based-routing
  model_group_alias:
    tts-default: modelslab-narration
    tts-branded: modelslab-branded-voice

Your application code stays clean:

response = client.audio.speech.create(
    model="tts-default",
    input="This will use the narration voice.",
    voice="alloy"
)

Cost Tracking and Observability

LiteLLM logs TTS requests alongside LLM calls in its spend tracking dashboard:

curl 'http://localhost:4000/spend/logs?model=modelslab-tts'

For Langfuse integration, add to your config:

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."

Every TTS call gets traced with input text length, latency, cost, and output URL.

ModelsLab Voice Models Available

  • text_to_audio — Standard TTS with 20+ voices across 12 languages
  • voice_clone — Clone any voice from a 10-second audio sample
  • text_to_audio_with_sound — TTS with optional background audio mixing

All three are accessible via the same /api/v6/voice base URL with different endpoint suffixes.
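That suffix scheme can be captured in a small helper. A sketch, assuming (as the examples in this guide do) that the endpoint suffix matches the model name exactly:

```python
VOICE_BASE = "https://modelslab.com/api/v6/voice"

# The three voice models listed above
SUPPORTED_MODELS = {"text_to_audio", "voice_clone", "text_to_audio_with_sound"}

def voice_endpoint(model: str) -> str:
    """Build the full ModelsLab Voice API URL for a given model suffix."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unknown voice model: {model}")
    return f"{VOICE_BASE}/{model}"

print(voice_endpoint("voice_clone"))
# → https://modelslab.com/api/v6/voice/voice_clone
```

Centralizing the URL construction means a v6-to-v7 base URL change touches one constant rather than every call site.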

Performance and Pricing

ModelsLab's Voice API processes most requests in 3-8 seconds, with sub-2-second latency for short text inputs under 100 characters. Pricing is per-character, making it competitive for production workloads that need volume.

Compared to ElevenLabs ($0.30/1K characters) and OpenAI TTS ($0.015/1K characters), ModelsLab's voice pricing sits in the mid-range with the advantage of being part of a unified multi-modal API that also covers image and video generation.
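With per-character pricing, monthly spend is easy to project up front. A quick estimator using the two published rates quoted above (ModelsLab's exact rate varies by plan, so it is left as a parameter):

```python
def tts_cost(characters: int, price_per_1k: float) -> float:
    """Estimate TTS spend for a given character volume."""
    return characters / 1000 * price_per_1k

# e.g. half a million characters of narration per month
chars = 500_000
for provider, rate in [("ElevenLabs", 0.30), ("OpenAI TTS", 0.015)]:
    print(f"{provider}: ${tts_cost(chars, rate):.2f}")
```

Running this for 500K characters gives $150.00 for ElevenLabs and $7.50 for OpenAI TTS; plug in your ModelsLab plan's rate to see where it lands between the two.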

Next Steps

Full API documentation: docs.modelslab.com/voice/text-to-audio

Get your ModelsLab API key: modelslab.com/api

For RealtimeTTS integration (streaming voice synthesis for real-time applications), see the RealtimeTTS GitHub repository — ModelsLab support was added in PR #365.
