Introduction
Until recently, adding original music to an app meant one of three things: licensing a royalty-free library, hiring a composer, or wiring up a scraped-together Suno workflow with no official API. None of that scales when you’re building a game that needs dynamic soundtracks, a video tool that needs on-demand scoring, or a product experience that needs a custom jingle per user.
The Lyria 3 API is Google DeepMind’s answer. It turns a single text prompt or even a reference image into a 30-second, studio-quality track with vocals, structure, and musical coherence from the first note to the last. And it’s now live on ModelsLab at $0.05 per track.
This post covers what Lyria 3 is, how it compares to Suno and ElevenLabs Music, the exact API call to generate your first track, and the use cases where it actually earns its place in a production stack.
What is Lyria 3?
Lyria 3 is Google DeepMind’s latest music generation model, designed to produce high-fidelity 30-second music clips from natural language prompts and optional reference images. Unlike earlier text-to-music models that produced loop-friendly ambient beds, Lyria 3 generates full compositions — vocals, verses, choruses, and structural transitions — that hold together as actual songs.
• Input: Text prompt (genre, mood, instrumentation, lyrics) + optional reference image
• Output: 30-second MP3 audio clip with vocals and instruments
• Access: REST API on ModelsLab, lyria-3 endpoint
Specifications
| Field | Value |
|---|---|
| Model ID | lyria-3 |
| Provider | Google DeepMind |
| API Documentation | |
| Input | Text prompt + optional reference image |
| Output | 30-second MP3 audio clip |
| Clip length | 30 seconds (fixed) |
| Pricing | $0.05 per generation |
| License | Closed source |
| Status | Live on ModelsLab |
Key Features
Text-to-music with full structural awareness — Describe a genre, a mood, and the instrumentation you want, and Lyria 3 returns a track with an actual intro, verse, and hook — not a 30-second loop. This is the core of what makes the text to music API useful for real product work.
Image-guided composition — Drop in a reference image and Lyria 3 reads the mood, palette, and subject to shape the composition. A desert sunset produces an atmospheric ambient piece; a neon-lit street scene gets you synthwave. This is the feature that separates the image to music API category from everything else on the market.
Vocals, lyrics, and coherent song structure — Lyria 3 generates full vocal performances with lyrics that follow prompt direction, and holds musical consistency across the entire 30-second clip. No robotic cadence, no pitch drift mid-track.
Fast generation at $0.05 per track — Each call returns a finished clip for a nickel, which is one of the sharpest price points in the AI music generation API category right now. Generating 200 tracks costs $10. Generating 10,000 tracks for a content pipeline costs $500.
Watermarked output — Every track Lyria 3 produces carries a SynthID watermark, which matters for any product that has to answer “is this AI-generated?” in its terms of service or platform policies.
Multimodal prompting — You can combine text and image in the same call to get oddly specific results: “a lo-fi hip-hop track inspired by the mood of this photo” works. This opens up creative workflows that text-only AI song generator API models can’t touch.
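A multimodal call is just the standard request with the init_image field populated. Here is a sketch of the request body; the field names follow the curl example in the How to Use It section, and the image URL is a placeholder:

```python
# Request body for a combined text + image prompt.
# The init_image URL is a placeholder, not a real asset.
payload = {
    "key": "YOUR_MODELSLAB_API_KEY",
    "model_id": "lyria-3",
    "prompt": "a lo-fi hip-hop track inspired by the mood of this photo",
    "init_image": "https://example.com/neon-street-scene.jpg",
    "base64": False,
}
```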
Best Use Cases
| Use Case | What It Enables |
|---|---|
| Video editors & content tools | On-demand background scoring for Reels, Shorts, and TikTok exports — no more royalty-free library searches |
| Game developers | Dynamic soundtracks that match scene mood, generated at runtime for $0.05 a track |
| Ad tech & marketing platforms | Per-campaign jingles and branded audio identities generated from short text briefs |
| AI-powered creative apps | Consumer-facing "make me a song" features with predictable unit economics |
| Podcast producers | Custom intros, outros, and bed music generated from episode titles or descriptions |
| Meditation & wellness apps | Mood-matched ambient tracks generated from session metadata or user input |
How to Use It
Here’s a minimal API call to generate your first track with the Lyria 3 API on ModelsLab:
```shell
curl -X POST https://modelslab.com/api/v7/audio/text-to-music \
  -H "Content-Type: application/json" \
  -d '{
    "key": "YOUR_MODELSLAB_API_KEY",
    "model_id": "lyria-3",
    "prompt": "An upbeat Afrobeat song with energetic percussion, groovy bassline, and rhythmic guitar riffs. The vocals are clear and uplifting, with lyrics about dancing together, freedom, and feeling alive under the city lights",
    "init_image": null,
    "base64": false
  }'
```

To condition the track on a reference image, pass an image URL in the init_image field. The model will read the mood and palette from the image and blend it with your text prompt.
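The same call works from Python with nothing beyond the standard library. The endpoint and field names below come straight from the curl example; the response handling in generate_track assumes the API returns JSON, so check the docs for the exact schema:

```python
import json
import urllib.request

API_URL = "https://modelslab.com/api/v7/audio/text-to-music"

def build_payload(api_key, prompt, init_image=None):
    """Assemble the request body; fields mirror the curl example."""
    return {
        "key": api_key,
        "model_id": "lyria-3",
        "prompt": prompt,
        "init_image": init_image,  # optional reference-image URL
        "base64": False,           # request a signed URL rather than inline audio
    }

def generate_track(api_key, prompt, init_image=None):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(api_key, prompt, init_image)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```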
The response returns a signed URL to the generated MP3. Typical generation time is well under a minute for a 30-second clip.
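Once a generation succeeds, the clip just needs to be fetched from that signed URL. A minimal sketch: extract_audio_url assumes the URL arrives in an output field, which is a common ModelsLab response shape but an assumption here, so verify the field name against the API reference before shipping.

```python
import urllib.request

def extract_audio_url(response):
    """Pull the signed MP3 URL from the response JSON.
    The 'output' field name is an assumption; verify against the docs."""
    out = response.get("output")
    return out[0] if isinstance(out, list) else out

def save_track(response, path="track.mp3"):
    """Stream the generated clip to a local file and return its path."""
    with urllib.request.urlopen(extract_audio_url(response)) as src, open(path, "wb") as dst:
        dst.write(src.read())
    return path
```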
Drop in your ModelsLab API key and you’re shipping music features in minutes.
FAQ
What is Lyria 3? Lyria 3 is Google DeepMind’s music generation model, available as a REST API on ModelsLab. It generates 30-second music tracks with vocals from text prompts or reference images. It’s the clip variant of Google’s broader Lyria 3 family, which also includes Lyria 3 Pro for full-length song generation.
How do I access the Lyria 3 API? The fastest way is through ModelsLab at modelslab.com/models/google/lyria-3. Sign up, grab your API key, and call the lyria-3 model endpoint with a text prompt. No Gemini API waitlist, no Google Cloud billing setup, no regional restrictions.
How much does Lyria 3 cost? On ModelsLab, Lyria 3 is priced at $0.05 per generation. That’s a flat rate per 30-second clip, which makes it easy to forecast unit economics for a music feature — 100 tracks cost $5, 10,000 tracks cost $500.
Can Lyria 3 generate music from images? Yes. Lyria 3 supports multimodal input — pass a reference image alongside your text prompt and the model will compose music inspired by the visual mood, color palette, and subject. This is one of the few image to music API options available today.
Lyria 3 vs Suno — which should I use? Suno V5 still leads on full-song generation with longer durations and more polished pop vocals, but it doesn’t offer an official public API, which makes it risky for production. Lyria 3 is a better fit when you need a documented API, predictable pricing, multimodal input (text + image), and shorter 30-second clips for background scoring, dynamic soundtracks, or on-demand content generation. For teams that need API certainty, Lyria 3 wins.
Ready to try it?
Try Lyria 3 in the Playground — generate your first track in the browser, no code required.
Read the API Docs — full reference, parameters, and code samples.

