Available now on ModelsLab · Voice & Audio

Eleven Multilingual v2 API — Multilingual Speech Generation

Eleven Multilingual v2 TTS in 30+ languages via REST API. Pay per character.

Why teams ship with Eleven Multilingual v2

Multilingual v2

30+ languages from one model

Eleven Multilingual v2 produces natural speech in 30+ languages including English, Spanish, French, German, Italian, Portuguese, Polish, Mandarin, Japanese, Korean, Hindi, and Arabic. One model handles them all.

Voice library

Pre-built voices for instant use

Choose from a library of pre-built voices spanning male, female, neutral, and character profiles. Each voice supports all 30+ languages without retraining.

Voice cloning compatible

Use your cloned voices

Upload a voice sample to the voice cloning API, then synthesize multilingual speech with your custom voice. Same voice, every language.

Streaming output

Low-latency audio streaming

Stream generated audio chunk by chunk for real-time conversational applications. Time to first audio chunk is under 400ms on dedicated infrastructure.

Output formats

MP3, WAV, PCM, and Opus

Choose the output format that fits your pipeline: MP3 for web playback, WAV for editing, PCM for low-level processing, Opus for streaming applications.

Predictable pricing

Pay per character generated

Per-character pricing, with no per-minute or per-month surprises. Run the math: at the $0.0002-per-character starting rate, a 5-minute audiobook chapter (about 3,750 characters) costs roughly $0.75.

No vendor lock-in

Same key for image, video, LLM

ModelsLab gives you a single API key across modalities. Use Eleven Multilingual v2 alongside the image, video, and LLM APIs without juggling vendor accounts.

Compliance

GDPR-ready, DPA available

Generated audio and source text are processed in compliant regions and removed after delivery. Signed DPAs and dedicated VPC deployments are available for enterprise customers.

Examples

Eleven Multilingual v2 use cases

Copy any prompt below and try it yourself in the playground.

Tech Demo

Explain quantum computing basics in a clear, enthusiastic tone for a tech conference audience.

Product Pitch

Describe a new electric vehicle model, highlighting speed, range, and eco features with confident delivery.

Nature Narration

Narrate a serene forest walk, capturing bird calls and wind sounds in a calm, immersive voice.

City Guide

Guide tourists through urban architecture, pointing out historical landmarks with an engaging local accent.

For Developers

A few lines of code.
Multilingual speech in one POST

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per character, no minimums
  • Python and JavaScript SDKs, plus REST API

import requests

# Send one text-to-speech request; replace YOUR_API_KEY with your own key.
response = requests.post(
    "https://modelslab.com/api/v7/voice/text-to-speech",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "Hey, love. I just wanted to say… you're doing beautifully. Even if today felt a little messy, even if you didn’t get everything done, that’s okay. You’re still growing, still trying, still shining. I see your heart, your effort, your gentleness. And I just hope you can feel how much you're loved. So rest easy now. You’re safe, you’re enough, and I’m proud of you more than words can say.",
        "voice_id": "M7baJQBjzMsrxxZ796H6",
    },
    timeout=60,  # avoid hanging forever on a stalled connection
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())

FAQ

Common questions about Eleven Multilingual v2 API — Multilingual Speech Generation

Eleven Multilingual v2 is a text-to-speech model that produces broadcast-quality speech in 30+ languages from a single API call. ModelsLab exposes the model via a REST endpoint with pay-per-character pricing — no ElevenLabs subscription required.

The model supports 30+ languages, including English, Spanish, French, German, Italian, Portuguese, Polish, Mandarin, Japanese, Korean, Hindi, Arabic, Turkish, Dutch, Czech, Russian, Indonesian, Malay, Filipino, Bulgarian, Romanian, Ukrainian, Greek, Vietnamese, and more. Pass a language parameter or let the model auto-detect the language from the text.
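
As a rough sketch of the two options above, the helper below builds a request body that either names the language explicitly or leaves it out so the model can auto-detect. The key, prompt, and voice_id fields come from the request example on this page; the field name "language" and the code "es" are assumptions for illustration, so check the API docs for the exact name and accepted values.

```python
from typing import Optional


def build_tts_payload(api_key: str, text: str, voice_id: str,
                      language: Optional[str] = None) -> dict:
    """Build a JSON body for the text-to-speech endpoint.

    The "language" key is an assumed parameter name. It is omitted
    entirely when no language is given so the model can auto-detect.
    """
    payload = {"key": api_key, "prompt": text, "voice_id": voice_id}
    if language is not None:
        payload["language"] = language  # assumed field name and code format
    return payload


# Explicit language versus auto-detection from the text itself.
explicit = build_tts_payload(
    "YOUR_API_KEY", "Hola, ¿cómo estás?", "M7baJQBjzMsrxxZ796H6", language="es"
)
auto = build_tts_payload(
    "YOUR_API_KEY", "Bonjour tout le monde", "M7baJQBjzMsrxxZ796H6"
)
```

Either dictionary can be passed as the json argument of the POST request shown in the developer example.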

The model is the same Eleven Multilingual v2 that ElevenLabs offers. The difference is pricing and integration: ModelsLab charges per character with no subscription, exposes the model through a single REST call, and bundles it with image, video, and LLM APIs on a single API key.

Cloned voices are supported. Use the ModelsLab voice cloning API to create a custom voice from a 10-second sample, then synthesize multilingual speech with that voice. The same voice works across all 30+ supported languages without retraining.

Streaming is supported. Set stream=true in the request to receive audio chunks via server-sent events. Latency to the first audio chunk is typically under 400ms, which suits real-time conversational apps.
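
Consuming the stream might look roughly like the sketch below. It assumes the stream flag is sent as a JSON body field and treats each event's data payload as an opaque string, because this page does not say how audio chunks are encoded inside events. The function names iter_sse_data and stream_speech are made up for this example.

```python
from typing import Iterable, Iterator

TTS_URL = "https://modelslab.com/api/v7/voice/text-to-speech"


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event.

    Follows basic SSE framing: "data:" lines accumulate and a blank
    line ends the event. Other fields and comments are ignored, and an
    event that is never terminated by a blank line is dropped.
    """
    buffer = []
    for line in lines:
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)


def stream_speech(api_key: str, text: str, voice_id: str) -> Iterator[str]:
    """Request streamed speech and yield each event's raw data string."""
    import requests  # imported lazily so the parser above has no dependencies

    # Placing "stream" in the JSON body is an assumption about the API.
    body = {"key": api_key, "prompt": text, "voice_id": voice_id, "stream": True}
    with requests.post(TTS_URL, json=body, stream=True, timeout=30) as response:
        response.raise_for_status()
        response.encoding = "utf-8"  # SSE streams are UTF-8 encoded
        yield from iter_sse_data(response.iter_lines(decode_unicode=True))
```

Keeping the parser separate from the network call means it can be exercised with plain lists of strings before pointing it at a live response.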

Supported output formats are MP3 (default, web-friendly), WAV (lossless, editing-ready), PCM (raw audio for processing pipelines), and Opus (low-latency streaming). Select one with the output_format parameter.
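
One way to keep format choices consistent across a codebase is a small lookup from use case to format, sketched below. The output_format parameter name comes from this page; the lowercase value strings and the use-case labels are assumptions for illustration.

```python
# Use cases and formats as described on this page. The lowercase value
# strings ("mp3", "wav", ...) are assumptions about what the API accepts.
FORMAT_BY_USE_CASE = {
    "web_playback": "mp3",
    "editing": "wav",
    "processing": "pcm",
    "streaming": "opus",
}


def with_output_format(payload: dict, use_case: str) -> dict:
    """Return a copy of the payload with output_format set for a use case."""
    try:
        audio_format = FORMAT_BY_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None
    # Build a new dict so the caller's payload is left untouched.
    return {**payload, "output_format": audio_format}
```

For example, with_output_format(payload, "editing") returns a payload that requests WAV while leaving the original dictionary unchanged.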

Pricing is per character, starting at $0.0002 per character. A 1-minute audiobook chapter (~150 words, ~750 characters) costs approximately $0.15. No monthly minimum, no subscription.
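
The arithmetic above can be reproduced with a short estimator. This is a minimal sketch that assumes every character, including spaces and punctuation, is billed at the $0.0002 starting rate; actual billing rules and rates may differ.

```python
from decimal import Decimal

# Starting rate quoted on this page; the real rate may vary by plan.
PRICE_PER_CHARACTER_USD = Decimal("0.0002")


def estimate_cost_usd(text: str) -> Decimal:
    """Estimate generation cost, counting every character in the text."""
    return PRICE_PER_CHARACTER_USD * len(text)


one_minute_script = "x" * 750  # stand-in for ~750 characters of narration
print(estimate_cost_usd(one_minute_script))  # 0.1500
```

Decimal is used instead of float so that small per-character amounts add up without rounding drift.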

For non-streaming, full audio for 100 characters of text generates in 1–2 seconds. For streaming, time-to-first-audio is under 400ms. Latency is consistent across requests — no cold starts.
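
To check time-to-first-audio in your own environment, a helper along these lines can time the first item from any chunk iterator. It is a generic sketch, not part of any ModelsLab SDK, and the measured value will include your own network round trip.

```python
import time
from typing import Callable, Iterable, Optional


def time_to_first_chunk(chunks: Iterable[bytes],
                        clock: Callable[[], float] = time.perf_counter
                        ) -> Optional[float]:
    """Return seconds elapsed until the first chunk arrives, or None if empty.

    Pass a lazy iterator so the request is issued when iteration begins;
    otherwise the timer starts after the request has already been sent.
    """
    start = clock()
    for _ in chunks:
        # Stop at the first chunk; later chunks are not consumed.
        return clock() - start
    return None
```

The clock argument exists so the helper can be tested with a fake timer instead of real delays.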

Default limits are 60 requests per minute, scaling automatically with paid usage. Enterprise plans include higher limits and dedicated capacity. Contact sales for custom rate terms.
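
A client can stay under the default limit by pacing its own requests. The sliding-window limiter below is one possible sketch, with defaults matching the 60 requests per minute quoted above; it does not handle rate-limit errors returned by the server, and the class name is made up for this example.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of period seconds."""

    def __init__(self, max_calls: int = 60, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of calls still inside the window

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.period:
            self._sent.popleft()
        if len(self._sent) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._sent[0]))
            now = self._clock()
            self._sent.popleft()
        self._sent.append(now)
```

Call limiter.acquire() immediately before each POST; the injectable clock and sleep functions make the pacing logic testable without waiting a real minute.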

The service is GDPR-ready. Source text and generated audio are processed in compliant regions and removed from infrastructure after delivery. Signed DPAs and dedicated VPC deployments are available for enterprise customers.

Ready to create?

Start generating with Eleven Multilingual v2 API — Multilingual Speech Generation on ModelsLab.