LiteLLM Text-to-Speech: Using ModelsLab as Your TTS Provider

Adhik Joshi | 5 min read | API


LiteLLM's unified API router now supports text-to-speech (TTS) providers, and ModelsLab's Voice API is a clean fit. If you're already routing LLM calls through LiteLLM, you can add voice synthesis to the same stack without a separate SDK.

This guide walks through the complete setup: provider configuration, API call structure, multi-voice routing, and how ModelsLab's voice models compare on latency and quality for production use.

Why Route TTS Through LiteLLM?

LiteLLM's value for TTS is the same as for LLMs: a single API call format that works across providers, plus fallback routing, cost tracking, and observability via Langfuse or other tools. If your app already uses LiteLLM for GPT-4o or Claude, adding voice synthesis through the same proxy means unified logging, rate limit handling, and billing visibility.

For teams running production AI apps, having TTS in the same observability stack as your LLM calls makes debugging faster. You can trace a full request chain from input text to voice output in one place.

Prerequisites

Before you start, you'll need:

  • A ModelsLab API key (from modelslab.com/api)
  • LiteLLM installed with proxy support (pip install 'litellm[proxy]')
  • Python 3.8+ with the openai SDK installed for client-side calls

Step 1: Add ModelsLab as a TTS Provider in LiteLLM Config

In your litellm_config.yaml, add the ModelsLab TTS provider under the model_list:

model_list:
  - model_name: modelslab-tts
    litellm_params:
      model: modelslab/text_to_audio
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice

  - model_name: modelslab-voice-clone
    litellm_params:
      model: modelslab/voice_clone
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice

Set your environment variable:

export MODELSLAB_API_KEY="your_key_here"

Start the proxy:

litellm --config litellm_config.yaml --port 4000

Step 2: Make a TTS Request via LiteLLM

Once the proxy is running, call the OpenAI-compatible TTS endpoint:

from openai import OpenAI

client = OpenAI(
    api_key="your-litellm-virtual-key",
    base_url="http://localhost:4000"
)

response = client.audio.speech.create(
    model="modelslab-tts",
    voice="alloy",
    input="Hello, this is a test of ModelsLab TTS via LiteLLM.",
    response_format="mp3"
)

response.stream_to_file("output.mp3")
print("Audio saved to output.mp3")

The call goes through LiteLLM's proxy, gets routed to ModelsLab's Voice API, and returns an MP3 stream.
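TTS providers typically cap how much text a single request can carry, so long inputs are worth splitting at sentence boundaries before sending. A minimal pre-chunker; the 500-character default is an assumption, not a documented ModelsLab limit:

```python
import re

def chunk_text(text, max_chars=500):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then goes through `client.audio.speech.create` as above, and the resulting MP3 segments are concatenated afterward.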

Step 3: Direct ModelsLab Voice API (Without Proxy)

If you prefer calling ModelsLab's Voice API directly:

import requests
import time

API_KEY = "your_modelslab_api_key"

def text_to_speech(text, voice_id="en-US-Neural2-A"):
    """Submit a TTS job and return the audio URL once it is ready."""
    response = requests.post(
        "https://modelslab.com/api/v6/voice/text_to_audio",
        headers={"Content-Type": "application/json"},
        json={
            "key": API_KEY,
            "prompt": text,
            "language": "en",
            "speaker": voice_id,
            "output_format": "mp3",
            "speed": 1.0,
            "webhook": None,
            "track_id": None
        },
        timeout=30
    )
    response.raise_for_status()
    result = response.json()
    if result.get("status") == "processing":
        # Longer inputs are queued; poll the fetch endpoint until done.
        return poll_for_audio(result.get("id"))
    elif result.get("status") == "success":
        return result.get("output", [None])[0]
    else:
        raise Exception(f"TTS failed: {result}")

def poll_for_audio(fetch_id, max_attempts=30):
    """Poll the fetch endpoint every 3 seconds until the audio is ready."""
    for attempt in range(max_attempts):
        time.sleep(3)
        resp = requests.post(
            "https://modelslab.com/api/v6/voice/fetch",
            headers={"Content-Type": "application/json"},
            json={"key": API_KEY, "request_id": str(fetch_id)},
            timeout=30
        )
        result = resp.json()
        if result.get("status") == "success":
            return result.get("output", [None])[0]
        if result.get("status") == "error":
            raise Exception(f"TTS failed during processing: {result}")
    raise TimeoutError("Audio generation timed out")

audio_url = text_to_speech("Welcome to ModelsLab Voice API.")
print(f"Audio URL: {audio_url}")
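The fixed 3-second polling interval above works well for short clips; for longer generations, a capped exponential backoff reduces wasted fetch calls. A sketch of the wait schedule (the base and cap values here are arbitrary, tune them to your workload):

```python
def backoff_delays(base=1.0, cap=15.0, attempts=10):
    """Return exponentially growing delays, capped so no single wait stalls too long."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

To use it, iterate over `backoff_delays()` inside `poll_for_audio` and call `time.sleep(delay)` instead of the fixed `time.sleep(3)`.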

Multi-Voice Configuration in LiteLLM

LiteLLM lets you define multiple ModelsLab voice endpoints and route between them:

model_list:
  - model_name: modelslab-narration
    litellm_params:
      model: modelslab/text_to_audio
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice

  - model_name: modelslab-branded-voice
    litellm_params:
      model: modelslab/voice_clone
      api_key: os.environ/MODELSLAB_API_KEY
      api_base: https://modelslab.com/api/v6/voice

router_settings:
  routing_strategy: cost-based-routing
  model_group_alias:
    tts-default: modelslab-narration
    tts-branded: modelslab-branded-voice

Your application code stays clean:

response = client.audio.speech.create(
    model="tts-default",
    input="This will use the narration voice.",
    voice="alloy"
)
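Because aliases decouple voice choice from provider config, application code can pick a voice per context with a plain lookup. A small dispatcher (the use-case names are hypothetical):

```python
# Map application contexts to the LiteLLM model group aliases defined above.
# The use-case keys are hypothetical examples.
VOICE_ALIASES = {
    "narration": "tts-default",
    "branded": "tts-branded",
}

def voice_model_for(use_case):
    """Resolve a use case to its LiteLLM alias, falling back to the default voice."""
    return VOICE_ALIASES.get(use_case, "tts-default")
```

Then `client.audio.speech.create(model=voice_model_for("branded"), ...)` picks the cloned voice, while any unknown context falls back to standard narration.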

Cost Tracking and Observability

LiteLLM logs TTS requests alongside LLM calls in its spend tracking dashboard:

curl "http://localhost:4000/spend/logs?model=modelslab-tts"

For Langfuse integration, add to your config:

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."

Every TTS call gets traced with input text length, latency, cost, and output URL.

ModelsLab Voice Models Available

  • text_to_audio — Standard TTS with 20+ voices across 12 languages
  • voice_clone — Clone any voice from a 10-second audio sample
  • text_to_audio_with_sound — TTS with optional background audio mixing

All three are accessible via the same /api/v6/voice base URL with different endpoint suffixes.
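Since all three share one base URL, endpoint construction reduces to appending a suffix. A small helper makes that explicit (the endpoint names match the list above):

```python
VOICE_API_BASE = "https://modelslab.com/api/v6/voice"

# The three voice endpoints listed above, keyed by suffix.
VOICE_MODELS = {"text_to_audio", "voice_clone", "text_to_audio_with_sound"}

def voice_endpoint(model):
    """Build the full endpoint URL for a ModelsLab voice model suffix."""
    if model not in VOICE_MODELS:
        raise ValueError(f"Unknown voice model: {model}")
    return f"{VOICE_API_BASE}/{model}"
```

This keeps the base URL in one place if the API version ever changes.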

Performance and Pricing

ModelsLab's Voice API processes most requests in 3-8 seconds, with sub-2-second latency for short text inputs under 100 characters. Pricing is per-character, making it competitive for production workloads that need volume.

Compared to ElevenLabs ($0.30/1K characters) and OpenAI TTS ($0.015/1K characters), ModelsLab's voice pricing sits in the mid-range with the advantage of being part of a unified multi-modal API that also covers image and video generation.
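To compare providers on your own traffic, a quick per-request estimate from character counts is enough. The ElevenLabs and OpenAI rates below come from the figures above; the ModelsLab rate is a placeholder assumption, so check current pricing at modelslab.com:

```python
# Cost per 1,000 characters, in USD. The ModelsLab figure is a placeholder
# assumption; see modelslab.com for current pricing.
RATES_PER_1K_CHARS = {
    "elevenlabs": 0.30,
    "openai-tts": 0.015,
    "modelslab": 0.10,  # placeholder
}

def estimate_cost(text, provider):
    """Estimate the TTS cost of synthesizing a string with a given provider."""
    return len(text) / 1000 * RATES_PER_1K_CHARS[provider]
```

For example, a 10,000-character script would cost about $3.00 on ElevenLabs versus $0.15 on OpenAI TTS at those rates.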

Next Steps

Full API documentation: docs.modelslab.com/voice/text-to-audio

Get your ModelsLab API key: modelslab.com/api

For RealtimeTTS integration (streaming voice synthesis for real-time applications), see the RealtimeTTS GitHub repository — ModelsLab support was added in PR #365.
