Available now on ModelsLab · Voice & Audio

Scribe v1

Transcribe Accurately


Unlock Precise Transcription

99 Languages

Global Speech Recognition

Handles transcription in 99 languages with word-level timestamps.
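To illustrate what word-level timestamps make possible, here is a minimal sketch that turns a transcript's word list into SRT-style timing lines. It assumes the transcript JSON contains an ElevenLabs-style `words` list whose entries carry `text`, `type` (`word` or `spacing`), and `start`/`end` in seconds. This page does not show the exact response shape ModelsLab returns, so treat that shape, along with the helper names `srt_time` and `timed_words`, as hypothetical.

```python
from typing import Any

# Assumed response shape (ElevenLabs-style, not confirmed for ModelsLab):
# each entry in "words" has "text", "type", "start" and "end" (seconds).


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def timed_words(words: list[dict[str, Any]]) -> list[str]:
    """Return one 'start --> end  text' line per spoken word, skipping other token types."""
    return [
        f"{srt_time(w['start'])} --> {srt_time(w['end'])}  {w['text']}"
        for w in words
        if w.get("type", "word") == "word"
    ]


sample = [
    {"text": "Hello", "type": "word", "start": 0.0, "end": 0.42},
    {"text": " ", "type": "spacing", "start": 0.42, "end": 0.5},
    {"text": "world", "type": "word", "start": 0.5, "end": 1.05},
]
for line in timed_words(sample):
    print(line)
```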

Speaker Diarization

Identify Multiple Speakers

Separates and labels each speaker in the audio, returning structured JSON output.
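Diarized output is usually easier to read once it is grouped into speaker turns. The sketch below merges consecutive tokens that share a `speaker_id`. It assumes an ElevenLabs-style `words` list where each entry has `text`, `type` (`word`, `spacing`, or `audio_event`), `start`, `end`, and `speaker_id`. That shape, and the `speaker_turns` helper, are assumptions for illustration rather than a documented ModelsLab response format.

```python
from typing import Any

# Assumed response shape (ElevenLabs-style, not confirmed for ModelsLab):
# each entry in "words" has "text", "type", "start", "end" and "speaker_id".


def speaker_turns(words: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge consecutive tokens from the same speaker_id into speaker turns."""
    turns: list[dict[str, Any]] = []
    for w in words:
        kind = w.get("type", "word")
        if kind == "audio_event":
            continue  # non-speech tags such as (laughter) are left out of turn text
        if kind == "spacing":
            if turns:
                turns[-1]["text"] += w["text"]  # keep spacing inside the current turn
            continue
        speaker = w.get("speaker_id", "unknown")
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] += w["text"]
            turns[-1]["end"] = w["end"]
        else:
            turns.append(
                {"speaker": speaker, "start": w["start"], "end": w["end"], "text": w["text"]}
            )
    for turn in turns:
        turn["text"] = turn["text"].strip()  # drop spacing left at turn boundaries
    return turns


sample_words = [
    {"text": "Hi", "type": "word", "start": 0.0, "end": 0.3, "speaker_id": "speaker_0"},
    {"text": " ", "type": "spacing", "start": 0.3, "end": 0.35, "speaker_id": "speaker_0"},
    {"text": "there.", "type": "word", "start": 0.35, "end": 0.8, "speaker_id": "speaker_0"},
    {"text": " ", "type": "spacing", "start": 0.8, "end": 1.0, "speaker_id": "speaker_1"},
    {"text": "Hello!", "type": "word", "start": 1.0, "end": 1.4, "speaker_id": "speaker_1"},
]
for t in speaker_turns(sample_words):
    print(f"{t['speaker']} [{t['start']:.2f}-{t['end']:.2f}]: {t['text']}")
```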

Real-World Audio

Robust Noise Handling

Handles unpredictable real-world audio and tags non-speech events such as laughter.
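Event tags can be pulled out of a transcript or stripped from it, depending on the use. A minimal sketch, assuming an ElevenLabs-style `words` list in which tagged events appear as entries with `type` set to `audio_event` and text such as `(laughter)`. The entry shape and the helper names `audio_events` and `speech_only_text` are assumptions, not a documented ModelsLab format.

```python
from typing import Any

# Assumed response shape (ElevenLabs-style, not confirmed for ModelsLab):
# events are "words" entries with type "audio_event" and text like "(laughter)".


def audio_events(words: list[dict[str, Any]]) -> list[tuple[float, float, str]]:
    """Return (start, end, label) for every token tagged as a non-speech audio event."""
    return [
        (w["start"], w["end"], w["text"].strip("()[] "))
        for w in words
        if w.get("type") == "audio_event"
    ]


def speech_only_text(words: list[dict[str, Any]]) -> str:
    """Rebuild the transcript without audio-event tags, collapsing leftover whitespace."""
    text = "".join(w["text"] for w in words if w.get("type") != "audio_event")
    return " ".join(text.split())


sample = [
    {"text": "That's", "type": "word", "start": 0.0, "end": 0.3},
    {"text": " ", "type": "spacing", "start": 0.3, "end": 0.3},
    {"text": "funny.", "type": "word", "start": 0.3, "end": 0.7},
    {"text": " ", "type": "spacing", "start": 0.7, "end": 0.7},
    {"text": "(laughter)", "type": "audio_event", "start": 0.7, "end": 1.9},
    {"text": " ", "type": "spacing", "start": 1.9, "end": 1.9},
    {"text": "Okay.", "type": "word", "start": 1.9, "end": 2.2},
]
print(audio_events(sample))
print(speech_only_text(sample))
```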

Examples

See what Scribe v1 can create

Copy any prompt below and try it yourself in the playground.

Tech Conference

Transcribe panel discussion audio with multiple speakers, English, include timestamps and laughter markers.

Product Demo

Convert sales pitch video audio to text, detect entities, Spanish, word-level timestamps.

Podcast Episode

Transcribe bilingual interview, French-English, speaker labels, non-speech events.

Voice Note

Process daily voice memo, German, structured JSON with diarization if multi-speaker.

For Developers

A few lines of code.
Transcribe audio. One call.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per second, no minimums
  • Python and JavaScript SDKs, plus REST API
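
```python
import requests

# Request a transcript from the ModelsLab speech-to-text endpoint using Scribe v1.
response = requests.post(
    "https://modelslab.com/api/v7/voice/speech-to-text",
    json={
        "key": "YOUR_API_KEY",  # replace with your ModelsLab API key
        "model_id": "scribe_v1",
        # URL of the audio file to transcribe
        "init_audio": "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/26fe4ebe-5e82-42ba-a794-3dccbaa508e4.mp3",
    },
    timeout=60,  # fail instead of hanging if the service does not respond
)
response.raise_for_status()  # raise on HTTP errors rather than printing an error body
print(response.json())
```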

FAQ

Common questions about Scribe v1


scribe_v1 is ElevenLabs' speech-to-text model for accurate audio transcription. It supports 99 languages with timestamps and speaker diarization. It has been superseded by scribe_v2 but remains reliable for most use cases.

Access it through the ElevenLabs Speech to Text endpoint: send an audio file and receive a JSON transcript. PCM input is supported at sample rates from 8 kHz to 48 kHz.
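Before uploading, a client could check that a PCM WAV file's sample rate falls within the 8 kHz to 48 kHz range stated above. The following is a minimal sketch using Python's standard `wave` module. The function names `check_pcm_wav` and `write_silence` are hypothetical, and treating both ends of the range as inclusive is an assumption.

```python
import os
import tempfile
import wave

MIN_RATE_HZ = 8_000   # lower bound stated in the FAQ (assumed inclusive)
MAX_RATE_HZ = 48_000  # upper bound stated in the FAQ (assumed inclusive)


def check_pcm_wav(path: str) -> int:
    """Return the WAV file's sample rate, raising ValueError if it is outside 8-48 kHz."""
    with wave.open(path, "rb") as wav:  # wave reads PCM WAV; raises wave.Error for unsupported formats
        rate = wav.getframerate()
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        raise ValueError(f"{path}: {rate} Hz is outside {MIN_RATE_HZ}-{MAX_RATE_HZ} Hz")
    return rate


def write_silence(path: str, rate: int, seconds: float = 0.1) -> None:
    """Write a mono 16-bit PCM WAV file of silence (demo helper only)."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)   # mono
        out.setsampwidth(2)   # 16-bit samples
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * int(rate * seconds))


demo_dir = tempfile.mkdtemp()
ok_path = os.path.join(demo_dir, "ok_16k.wav")
write_silence(ok_path, 16_000)
print(check_pcm_wav(ok_path))

bad_path = os.path.join(demo_dir, "too_high_96k.wav")
write_silence(bad_path, 96_000)
rejected = False
try:
    check_pcm_wav(bad_path)
except ValueError as err:
    rejected = True
    print("rejected:", err)
```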

scribe_v1 transcribes audio; it does not clone voices. In voice cloning workflows, use it to transcribe the source audio, and pair it with ElevenLabs TTS for the cloning itself.

For better accuracy, upgrade to scribe_v2, which supports 90+ languages. scribe_v1 remains suitable for basic transcription needs.

scribe_v1 performs batch (non-streaming) transcription. For real-time use, switch to scribe_v2_realtime, which has a latency of roughly 150 ms.

Transcribe with scribe_v1, then clone the voice with ElevenLabs Professional Voice Cloning (PVC) or Instant Voice Cloning (IVC). This ensures clean input for high-fidelity clones.
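One way a transcript can help produce clean cloning input is by locating stretches where only the target speaker talks and no audio events occur. A sketch under stated assumptions: the transcript has an ElevenLabs-style `words` list with `type` (`word`, `spacing`, or `audio_event`), `start`, `end`, and `speaker_id`. The helper `clean_segments` and its 3-second minimum are hypothetical choices, not PVC or IVC requirements. The returned (start, end) pairs could then be used to cut the source audio before it is submitted for cloning.

```python
from typing import Any, Optional

# Assumed response shape (ElevenLabs-style, not confirmed for ModelsLab):
# each entry in "words" has "type", "start", "end" and "speaker_id".


def clean_segments(
    words: list[dict[str, Any]], speaker: str, min_seconds: float = 3.0
) -> list[tuple[float, float]]:
    """Return (start, end) spans where only `speaker` talks with no audio events, lasting >= min_seconds."""
    segments: list[tuple[float, float]] = []
    start: Optional[float] = None
    end: Optional[float] = None

    def close_run() -> None:
        nonlocal start, end
        if start is not None and end is not None and end - start >= min_seconds:
            segments.append((start, end))
        start = end = None

    for w in words:
        kind = w.get("type", "word")
        if kind == "spacing":
            continue  # spacing neither extends nor breaks a run
        if kind == "word" and w.get("speaker_id") == speaker:
            if start is None:
                start = w["start"]
            end = w["end"]
        else:
            close_run()  # another speaker or an audio event ends the clean run
    close_run()
    return segments


sample = [
    {"text": "Thanks", "type": "word", "start": 0.0, "end": 1.0, "speaker_id": "speaker_0"},
    {"text": "for", "type": "word", "start": 1.0, "end": 2.5, "speaker_id": "speaker_0"},
    {"text": "joining.", "type": "word", "start": 2.5, "end": 3.5, "speaker_id": "speaker_0"},
    {"text": "(laughter)", "type": "audio_event", "start": 3.5, "end": 4.0, "speaker_id": "speaker_0"},
    {"text": "Sure.", "type": "word", "start": 4.0, "end": 5.0, "speaker_id": "speaker_0"},
    {"text": "Great", "type": "word", "start": 5.0, "end": 9.0, "speaker_id": "speaker_1"},
]
print(clean_segments(sample, "speaker_0"))
print(clean_segments(sample, "speaker_1"))
```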

Ready to create?

Start generating with Scribe v1 on ModelsLab.