Available now on ModelsLab · Voice & Audio

Scribe v1
Transcribe Accurately

Try Scribe v1 · API Documentation

Unlock Precise Transcription

99 Languages

Global Speech Recognition

Handles transcription in 99 languages with word-level timestamps.

Speaker Diarization

Identify Multiple Speakers

Separates the speakers in a recording and returns the result as structured JSON.

Real-World Audio

Robust Noise Handling

Processes noisy, unpredictable audio and tags non-speech events such as laughter.
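
The word-level timestamps, speaker labels, and event tags described above could be consumed roughly as follows. This is only a sketch: the `words` list and its field names (`text`, `start`, `end`, `speaker`, `type`) are hypothetical stand-ins chosen for illustration, not the documented response schema, so check the API documentation for the real structure.

```python
from itertools import groupby

# Hypothetical transcript data. The field names are assumptions made for
# this example, not the documented response format.
words = [
    {"text": "Welcome", "start": 0.00, "end": 0.42, "speaker": "speaker_0", "type": "word"},
    {"text": "everyone", "start": 0.45, "end": 0.98, "speaker": "speaker_0", "type": "word"},
    {"text": "(laughter)", "start": 1.10, "end": 1.80, "speaker": "speaker_1", "type": "audio_event"},
    {"text": "Thanks", "start": 1.90, "end": 2.25, "speaker": "speaker_1", "type": "word"},
]


def speaker_turns(words):
    """Merge consecutive entries from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],  # start time of the first entry in the turn
            "end": group[-1]["end"],     # end time of the last entry in the turn
            "text": " ".join(w["text"] for w in group),
        })
    return turns


for turn in speaker_turns(words):
    print(f'[{turn["start"]:.2f}-{turn["end"]:.2f}] {turn["speaker"]}: {turn["text"]}')
```

Grouping by consecutive speaker keeps the original word order, so a speaker who talks twice yields two separate turns rather than one merged block.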

Examples

See what Scribe v1 can create

Copy any prompt below and try it yourself in the playground.

Tech Conference

“Transcribe panel discussion audio with multiple speakers, English, include timestamps and laughter markers.”

Product Demo

“Convert sales pitch video audio to text, detect entities, Spanish, word-level timestamps.”

Podcast Episode

“Transcribe bilingual interview, French-English, speaker labels, non-speech events.”

Voice Note

“Process daily voice memo, German, structured JSON with diarization if multi-speaker.”

For Developers

A few lines of code.
Transcribe audio. One call.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

Serverless: scales to zero, scales to millions
Pay per second, no minimums
Python and JavaScript SDKs, plus REST API

API Documentation

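import requests

# POST the audio URL to the ModelsLab speech-to-text endpoint.
response = requests.post(
    "https://modelslab.com/api/v7/voice/speech-to-text",
    json={
        "key": "YOUR_API_KEY",  # replace with your ModelsLab API key
        "model_id": "scribe_v1",
        "init_audio": "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/26fe4ebe-5e82-42ba-a794-3dccbaa508e4.mp3",
    },
    timeout=120,  # avoid hanging indefinitely on a slow response
)
response.raise_for_status()  # raise on HTTP error status codes
print(response.json())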
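
For anything beyond a quick test, it may help to keep the API key out of the source file and wrap the call in a function. The sketch below does this with only the Python standard library. The endpoint and the three request fields come from the example above; the `MODELSLAB_API_KEY` environment variable and the `build_payload` and `transcribe` helpers are hypothetical names introduced here, not part of any ModelsLab SDK.

```python
import json
import os
import urllib.request

# Endpoint taken from the example above.
API_URL = "https://modelslab.com/api/v7/voice/speech-to-text"


def build_payload(audio_url, api_key=None):
    """Build the request body, reading the key from the environment if not given.

    MODELSLAB_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = api_key or os.environ.get("MODELSLAB_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set MODELSLAB_API_KEY")
    return {"key": key, "model_id": "scribe_v1", "init_audio": audio_url}


def transcribe(audio_url, api_key=None, timeout=120):
    """POST the payload as JSON and return the decoded JSON response."""
    body = json.dumps(build_payload(audio_url, api_key)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (requires a valid key and a reachable audio URL):
# result = transcribe("https://example.com/recording.mp3")
```

Separating `build_payload` from `transcribe` makes the request body easy to inspect or unit-test without touching the network.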

FAQ

Common questions about Scribe v1

Read the docs

What is the scribe_v1 model?

scribe_v1 is ElevenLabs' speech-to-text model for accurate audio transcription. It supports 99 languages with timestamps and diarization. scribe_v2 outperforms it, but scribe_v1 remains reliable for most use cases.

How do I use the scribe_v1 API?

Access it through the ElevenLabs Speech to Text endpoint: send an audio file and receive a JSON transcript. It supports PCM formats from 8 kHz to 48 kHz.

Is scribe_v1 good for voice cloning?

scribe_v1 transcribes audio; it does not clone voices. Use it to transcribe the source audio in a voice cloning workflow, and pair it with ElevenLabs TTS for the cloning step.

What is an alternative to scribe_v1?

Upgrade to scribe_v2 for better accuracy and 90+ languages. scribe_v1 suits basic transcription needs.

Does scribe_v1 support real-time transcription?

No. scribe_v1 provides batch transcription only. For real-time use, choose scribe_v2_realtime, which has a latency of roughly 150 ms.

What does a voice cloning workflow with scribe_v1 look like?

Transcribe the audio with scribe_v1, then clone the voice with ElevenLabs Professional Voice Cloning (PVC) or Instant Voice Cloning (IVC). Transcribing first helps confirm that the input is clean enough for a high-fidelity clone.

Ready to create?

Start generating with Scribe v1 on ModelsLab.

Try Scribe v1 · API Documentation