Scribe v1
Transcribe Accurately
Unlock Precise Transcription
99 Languages
Global Speech Recognition
Handles transcription in 99 languages with word-level timestamps.
Speaker Diarization
Identify Multiple Speakers
Separates the speakers in a recording and labels each one in structured JSON output.
Real-World Audio
Robust Noise Handling
Processes noisy, unpredictable audio and tags non-speech events such as laughter.
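The word-level timestamps, speaker labels, and event tags described above lend themselves to simple post-processing. The sketch below merges consecutive words from the same speaker into turns. It assumes a hypothetical response shape: a list of entries with `text`, `start`, `end`, `speaker`, and `type` fields. These field names and the sample values are illustrative assumptions, not the documented Scribe v1 schema, so adjust them to match the JSON you actually receive.

```python
from itertools import groupby

# Hypothetical word-level entries. Field names and values are assumptions
# made for illustration, not the documented response schema.
sample_words = [
    {"text": "Welcome", "start": 0.00, "end": 0.42, "speaker": "speaker_0", "type": "word"},
    {"text": "everyone", "start": 0.45, "end": 0.98, "speaker": "speaker_0", "type": "word"},
    {"text": "(laughter)", "start": 1.10, "end": 1.90, "speaker": "speaker_1", "type": "audio_event"},
    {"text": "Thanks", "start": 2.00, "end": 2.35, "speaker": "speaker_1", "type": "word"},
]


def group_by_speaker(words):
    """Merge consecutive entries from the same speaker into turns."""
    turns = []
    for speaker, items in groupby(words, key=lambda w: w["speaker"]):
        items = list(items)
        turns.append({
            "speaker": speaker,
            "start": items[0]["start"],   # start time of the first entry in the turn
            "end": items[-1]["end"],      # end time of the last entry in the turn
            "text": " ".join(w["text"] for w in items),
        })
    return turns


turns = group_by_speaker(sample_words)
for turn in turns:
    print(f'[{turn["start"]:.2f}-{turn["end"]:.2f}] {turn["speaker"]}: {turn["text"]}')
```

Grouping only consecutive entries keeps the turns in chronological order, which is usually what a readable transcript needs.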
Examples
See what Scribe v1 can transcribe
Copy any prompt below and try it yourself in the playground.
Tech Conference
“Transcribe panel discussion audio with multiple speakers, English, include timestamps and laughter markers.”
Product Demo
“Convert sales pitch video audio to text, detect entities, Spanish, word-level timestamps.”
Podcast Episode
“Transcribe bilingual interview, French-English, speaker labels, non-speech events.”
Voice Note
“Process daily voice memo, German, structured JSON with diarization if multi-speaker.”
For Developers
A few lines of code.
Transcribe audio. One call.
ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.
- Serverless: scales to zero, scales to millions
- Pay per second, no minimums
- Python and JavaScript SDKs, plus REST API
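import requests

# Send the audio URL to the speech-to-text endpoint.
response = requests.post(
    "https://modelslab.com/api/v7/voice/speech-to-text",
    json={
        "key": "YOUR_API_KEY",      # replace with your API key
        "model_id": "scribe_v1",
        # URL of the audio file to transcribe
        "init_audio": "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/26fe4ebe-5e82-42ba-a794-3dccbaa508e4.mp3",
    },
)

# Print the JSON response.
print(response.json())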
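If you would rather not depend on a third-party package, the same request can be sketched with the Python standard library. The endpoint and the `key`, `model_id`, and `init_audio` fields come from the example above. The helper names `build_request` and `transcribe` and the 60-second timeout are assumptions chosen for illustration, not part of any ModelsLab SDK.

```python
import json
import urllib.request

API_URL = "https://modelslab.com/api/v7/voice/speech-to-text"


def build_request(api_key, audio_url, model_id="scribe_v1"):
    """Build the POST request without sending it (hypothetical helper)."""
    payload = {"key": api_key, "model_id": model_id, "init_audio": audio_url}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(api_key, audio_url, timeout=60):
    """Send the request and return the parsed JSON body.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, and the
    timeout (an arbitrary value here) prevents the call from hanging forever.
    """
    req = build_request(api_key, audio_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `transcribe("YOUR_API_KEY", "https://example.com/audio.mp3")` would then return the decoded JSON response. Keeping request construction separate from sending makes the payload easy to inspect or unit test without a network call.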