Question 1

What is the Eleven Multilingual v2 API?

Accepted Answer

Eleven Multilingual v2 is a text-to-speech model that produces broadcast-quality speech in 30+ languages from a single API call. ModelsLab exposes the model via a REST endpoint with pay-per-character pricing — no ElevenLabs subscription required.
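As a rough illustration of the "single API call", the sketch below builds a JSON POST request with the Python standard library. The endpoint URL is a placeholder and the field names `key`, `model_id`, and `text` are assumptions for illustration; the real values come from the ModelsLab documentation.

```python
import json
import urllib.request

# Placeholder only: the real endpoint URL is not given in this FAQ.
API_URL = "https://api.example.invalid/v1/text-to-speech"


def build_tts_request(api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for speech synthesis."""
    payload = {
        "key": api_key,                        # assumed field name
        "model_id": "eleven_multilingual_v2",  # assumed field name and value
        "text": text,                          # assumed field name
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("YOUR_API_KEY", "Hello, world")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req), then read the body.
```

The request is only constructed here, not sent, so the sketch runs without credentials or network access.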

Question 2

Which languages does Eleven Multilingual v2 support?

Accepted Answer

More than 30 languages, including English, Spanish, French, German, Italian, Portuguese, Polish, Mandarin, Japanese, Korean, Hindi, Arabic, Turkish, Dutch, Czech, Russian, Indonesian, Malay, Filipino, Bulgarian, Romanian, Ukrainian, Greek, and Vietnamese. Pass a language parameter, or omit it and let the model auto-detect the language from the text.
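The two options (explicit language versus auto-detection) might look like this when building a request body. The `language` parameter is named in the answer above; the `text` field name and the use of short codes such as "es" are assumptions.

```python
from typing import Optional


def tts_payload(text: str, language: Optional[str] = None) -> dict:
    """Return a request body; leave `language` out to rely on auto-detection."""
    payload = {"text": text}  # "text" is an assumed field name
    if language is not None:
        payload["language"] = language  # code format (e.g. "es") is assumed
    return payload


if __name__ == "__main__":
    print(tts_payload("Buenos días", language="es"))
    print(tts_payload("Guten Morgen"))  # no language key: model auto-detects
```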

Question 3

How is this different from the ElevenLabs API?

Accepted Answer

The underlying model is the same: Eleven Multilingual v2. The differences are pricing and integration. ModelsLab charges per character with no subscription, serves the model through its own REST endpoint, and bundles it with image, video, and LLM APIs under a single API key.

Question 4

Can I clone a voice and use it in multiple languages?

Accepted Answer

Yes. Use the ModelsLab voice cloning API to create a custom voice from a 10-second sample, then synthesize multilingual speech with that voice. The same voice works across all 30+ supported languages without retraining.

Question 5

Does the API support streaming output?

Accepted Answer

Yes. Set stream=true in the request to receive audio chunks via server-sent events. Latency to the first audio chunk is typically under 400ms, suitable for real-time conversational apps.
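On the client side, a server-sent event stream arrives as `data:` lines separated by blank lines. The sketch below parses such a stream and reassembles the audio. It assumes each event carries one base64-encoded audio chunk, which this FAQ does not specify, so treat the decoding step as hypothetical.

```python
import base64
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in `lines`."""
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            buf.append(value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored.
    if buf:
        # Lenient: flush a trailing event that lacks a closing blank line.
        yield "\n".join(buf)


def collect_audio(lines: Iterable[str]) -> bytes:
    """Decode every event as base64 audio (an assumption) and join the chunks."""
    return b"".join(base64.b64decode(d) for d in iter_sse_data(lines))


if __name__ == "__main__":
    demo = ["data: YWJj\n", "\n", ": keep-alive\n", "data: ZGVm\n", "\n"]
    print(collect_audio(demo))
```

In a real-time app the chunks would be handed to an audio player as they arrive rather than joined at the end.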

Question 6

What audio formats does the API output?

Accepted Answer

MP3 (default, web-friendly), WAV (lossless, editing-ready), PCM (raw audio for processing pipelines), and Opus (low-latency streaming). Specify with the output_format parameter.
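A small helper can validate the format before a request is sent. The `output_format` parameter name comes from the answer above; the lowercase values ("mp3", "wav", "pcm", "opus") and the file extensions are assumptions.

```python
# Assumed parameter values mapped to a file extension for saving the result.
FORMATS = {"mp3": ".mp3", "wav": ".wav", "pcm": ".pcm", "opus": ".opus"}


def with_output_format(payload: dict, fmt: str = "mp3") -> dict:
    """Return a copy of `payload` with a validated output_format added."""
    fmt = fmt.lower()
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(FORMATS)}")
    return {**payload, "output_format": fmt}


if __name__ == "__main__":
    body = with_output_format({"text": "Hello"}, "wav")
    print(body, "-> save as", "speech" + FORMATS[body["output_format"]])
```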

Question 7

How much does Eleven Multilingual v2 cost?

Accepted Answer

Pricing is per character, starting at $0.0002 per character. At that rate, one minute of audiobook narration (~150 words, ~750 characters) costs approximately $0.15 (750 × $0.0002). There is no monthly minimum and no subscription.
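The arithmetic can be checked with a short estimator. It uses the $0.0002 starting rate quoted above; the actual rate on a given account may differ, so the default is only an assumption.

```python
from decimal import Decimal

# Starting rate quoted in this FAQ; the billed rate may differ.
RATE_PER_CHAR = Decimal("0.0002")


def estimate_cost(text: str, rate: Decimal = RATE_PER_CHAR) -> Decimal:
    """Estimated cost in USD: number of characters times the per-character rate."""
    return len(text) * rate


if __name__ == "__main__":
    sample = "a" * 750  # stand-in for roughly 150 words of narration
    print(f"${estimate_cost(sample):.2f}")
```

`Decimal` is used instead of floats so that small per-character amounts add up without rounding drift.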

Question 8

What latency should I expect?

Accepted Answer

For non-streaming requests, full audio for 100 characters of text is generated in 1–2 seconds. For streaming requests, time to first audio is typically under 400 ms. Latency is consistent across requests, with no cold starts.

Question 9

Are there usage rate limits?

Accepted Answer

Default limits are 60 requests per minute, scaling automatically with paid usage. Enterprise plans include higher limits and dedicated capacity. Contact sales for custom rate terms.
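To stay under the default limit, a client can throttle itself. The sketch below is one possible sliding-window limiter sized for 60 requests per minute. It is a client-side convenience under that assumption, not a description of how the service enforces its limits.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls: int = 60, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._stamps: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def record(self) -> None:
        """Note that a call is being made now."""
        self._stamps.append(self.clock())

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.record()


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    for i in range(3):
        limiter.acquire()  # call the API after this returns
        print("request", i, "allowed")
```

The clock is injectable so the limiter can be tested without real waiting.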

Question 10

Is the Eleven Multilingual v2 API GDPR-compliant?

Accepted Answer

Yes. Source text and generated audio are processed in compliant regions and removed from infrastructure after delivery. Signed DPAs and dedicated VPC deployments are available for enterprise customers.

Eleven Multilingual v2 API — Multilingual Speech Generation
Eleven Multilingual v2 TTS in 30+ languages via REST API. Pay per character.

Why teams ship with Eleven Multilingual v2

30+ languages from one model

Pre-built voices for instant use

Use your cloned voices

Low-latency audio streaming

MP3, WAV, PCM, and Opus

Pay per character generated

Same key for image, video, LLM

GDPR-ready, DPA available

Eleven Multilingual v2 use cases

A few lines of code.
Multilingual speech in one POST
