Lyria 3 API: Generate AI Music from Text and Images

Introduction

Until recently, adding original music to an app meant one of three things: licensing a royalty-free library, hiring a composer, or wiring up a scraped-together Suno workflow with no official API. None of that scales when you’re building a game that needs dynamic soundtracks, a video tool that needs on-demand scoring, or a product experience that needs a custom jingle per user.

The Lyria 3 API is Google DeepMind’s answer. It turns a text prompt, optionally paired with a reference image, into a 30-second, studio-quality track with vocals, structure, and musical coherence from the first note to the last. It’s now live on ModelsLab at $0.05 per track.

This post covers what Lyria 3 is, how it compares to Suno, the exact API call to generate your first track, and the use cases where it actually earns its place in a production stack.

What is Lyria 3?

Lyria 3 is Google DeepMind’s latest music generation model, designed to produce high-fidelity 30-second music clips from natural language prompts and optional reference images. Unlike earlier text-to-music models that produced loop-friendly ambient beds, Lyria 3 generates full compositions — vocals, verses, choruses, and structural transitions — that hold together as actual songs.

•   Input: Text prompt (genre, mood, instrumentation, lyrics) + optional reference image

•   Output: 30-second MP3 audio clip with vocals and instruments

•   Access: REST API on ModelsLab (model ID lyria-3)

Specifications

•   Model ID: lyria-3

•   Provider: Google DeepMind

•   API documentation: modelslab.com/models/google/lyria-3/api

•   Input: Text prompt + optional reference image

•   Output: 30-second MP3 audio clip

•   Clip length: 30 seconds (fixed)

•   Pricing: $0.05 per generation

•   License: Closed source

•   Status: Live on ModelsLab

Key Features

Text-to-music with full structural awareness: Describe a genre, a mood, and the instrumentation you want, and Lyria 3 returns a track with an actual intro, verse, and hook, not a 30-second loop. This structural awareness is what makes a text-to-music API useful for real product work.

Image-guided composition: Drop in a reference image and Lyria 3 reads its mood, palette, and subject to shape the composition. A desert sunset might yield an atmospheric ambient piece; a neon-lit street scene, synthwave. Image conditioning is what sets Lyria 3 apart from most music generation APIs on the market.

Vocals, lyrics, and coherent song structure — Lyria 3 generates full vocal performances with lyrics that follow prompt direction, and holds musical consistency across the entire 30-second clip. No robotic cadence, no pitch drift mid-track.

Fast generation at $0.05 per track — Each call returns a finished clip for a nickel, which is one of the sharpest price points in the AI music generation API category right now. Generating 200 tracks costs $10. Generating 10,000 tracks for a content pipeline costs $500.

Watermarked output — Every track Lyria 3 produces carries a SynthID watermark, which matters for any product that has to answer “is this AI-generated?” in its terms of service or platform policies.

Multimodal prompting: You can combine text and image in the same call to get oddly specific results; “a lo-fi hip-hop track inspired by the mood of this photo” works. This opens up creative workflows that text-only music generation APIs can’t support.
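Since Lyria 3 responds to genre, mood, instrumentation, and lyric direction in the prompt, it can help to assemble prompts from those fields in code rather than by hand. Below is a minimal shell sketch; `compose_prompt` is a hypothetical helper, and the labeled-field template it emits is an assumption for illustration, not a format Lyria 3 requires.

```shell
# Hypothetical helper: build a Lyria 3 text prompt from the ingredients the
# model responds to. The "Mood: ... Instrumentation: ..." template is an
# illustrative assumption, not a documented prompt format.
compose_prompt() {
  local genre="$1" mood="$2" instruments="$3" lyrics_theme="${4:-}"
  local prompt="${genre} song. Mood: ${mood}. Instrumentation: ${instruments}."
  # Lyric direction is optional; omit the 4th argument to leave it out.
  if [ -n "$lyrics_theme" ]; then
    prompt="${prompt} Lyrics about ${lyrics_theme}."
  fi
  printf '%s\n' "$prompt"
}
```

For example, `compose_prompt "Afrobeat" "upbeat, energetic" "percussion, groovy bassline, rhythmic guitar" "dancing together under the city lights"` prints a single-line prompt you can pass as the `prompt` value.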

Best Use Cases

•   Video editors & content tools: On-demand background scoring for Reels, Shorts, and TikTok exports, without digging through royalty-free libraries

•   Game developers: Dynamic soundtracks that match scene mood, generated at runtime for $0.05 a track

•   Ad tech & marketing platforms: Per-campaign jingles and branded audio identities generated from short text briefs

•   AI-powered creative apps: Consumer-facing “make me a song” features with predictable unit economics

•   Podcast producers: Custom intros, outros, and bed music generated from episode titles or descriptions

•   Meditation & wellness apps: Mood-matched ambient tracks generated from session metadata or user input
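The unit economics behind these use cases come down to simple arithmetic at a flat $0.05 per clip. Here is a minimal shell sketch for budgeting, assuming that price holds; `LYRIA3_PRICE_CENTS`, `estimate_cost`, and `tracks_for_budget` are hypothetical names, and the budget helper accepts whole dollars only.

```shell
# Per-clip price quoted in this post, kept in integer cents because
# shell arithmetic is integer-only.
LYRIA3_PRICE_CENTS=5

# Print the USD cost of generating N clips, formatted like "10.00".
estimate_cost() {
  local tracks="$1"
  local cents=$(( tracks * LYRIA3_PRICE_CENTS ))
  printf '%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

# Print how many whole clips fit in a budget given in whole US dollars.
tracks_for_budget() {
  local dollars="$1"
  printf '%d\n' $(( dollars * 100 / LYRIA3_PRICE_CENTS ))
}
```

`estimate_cost 200` prints `10.00` and `tracks_for_budget 500` prints `10000`, matching the figures quoted in this post. Working in cents avoids pulling in an external tool such as bc for decimal math.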

How to Use It

Here’s a minimal API call to generate your first track with the Lyria 3 API on ModelsLab:

```shell
# Text-only generation: init_image is null, so only the prompt guides the track.
# Replace YOUR_MODELSLAB_API_KEY with your ModelsLab API key.
curl -X POST https://modelslab.com/api/v7/audio/text-to-music \
  -H "Content-Type: application/json" \
  -d '{
    "key": "YOUR_MODELSLAB_API_KEY",
    "model_id": "lyria-3",
    "prompt": "An upbeat Afrobeat song with energetic percussion, groovy bassline, and rhythmic guitar riffs. The vocals are clear and uplifting, with lyrics about dancing together, freedom, and feeling alive under the city lights",
    "init_image": null,
    "base64": false
  }'
```

To condition the track on a reference image, pass an image URL in the init_image field. The model will read the mood and palette from the image and blend it with your text prompt.
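Hand-writing JSON gets fragile once prompts contain quotes or an optional image URL. Below is a minimal shell sketch that builds the request body for the curl call above, reusing exactly its field names; `json_escape` and `build_lyria_payload` are hypothetical helpers, and the escaping only handles backslashes and double quotes, so it assumes single-line input.

```shell
# Escape backslashes and double quotes so free-form text is safe inside a
# JSON string. Sketch only: control characters (e.g. raw newlines) are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body: $1 = API key, $2 = prompt, $3 = optional image URL.
# init_image is a quoted URL when $3 is given and JSON null otherwise.
build_lyria_payload() {
  local key prompt image
  key=$(json_escape "$1")
  prompt=$(json_escape "$2")
  if [ -n "${3:-}" ]; then
    image="\"$(json_escape "$3")\""
  else
    image="null"
  fi
  printf '{"key": "%s", "model_id": "lyria-3", "prompt": "%s", "init_image": %s, "base64": false}\n' \
    "$key" "$prompt" "$image"
}

# Example (not run here; MODELSLAB_API_KEY and the image URL are placeholders):
#   curl -X POST https://modelslab.com/api/v7/audio/text-to-music \
#     -H "Content-Type: application/json" \
#     -d "$(build_lyria_payload "$MODELSLAB_API_KEY" \
#           "A lo-fi hip-hop track inspired by the mood of this photo" \
#           "https://example.com/photo.jpg")"
```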

The response contains a signed URL to the generated MP3. Generation typically takes well under a minute for a 30-second clip.
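The exact response schema isn’t covered here, so this sketch avoids depending on field names. It strips backslashes (some JSON encoders escape slashes as `\/`), takes the first `https://` URL containing `.mp3`, and downloads it with curl. `extract_mp3_url`, `download_track`, and the default filename are hypothetical; check the API docs for the real response fields before relying on this in production.

```shell
# Crude, schema-agnostic extraction: remove backslashes, then print the first
# https URL that contains ".mp3" (query strings on signed URLs are kept).
# Prints nothing if no such URL is present.
extract_mp3_url() {
  printf '%s' "$1" | tr -d '\\' | grep -o 'https://[^"]*\.mp3[^"]*' | head -n 1
}

# Save the clip locally; -f makes curl fail on HTTP errors instead of
# writing an error page to the output file.
download_track() {
  local url="$1" dest="${2:-lyria3_track.mp3}"
  curl -fsSL -o "$dest" "$url"
}

# Example (not run here; $payload is a request body like the one above):
#   response=$(curl -s -X POST https://modelslab.com/api/v7/audio/text-to-music \
#     -H "Content-Type: application/json" -d "$payload")
#   download_track "$(extract_mp3_url "$response")" intro.mp3
```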

Drop in your ModelsLab API key and you’re shipping music features in minutes.

FAQ

What is Lyria 3? Lyria 3 is Google DeepMind’s music generation model, available as a REST API on ModelsLab. It generates 30-second music tracks with vocals from text prompts or reference images. It’s the clip variant of Google’s broader Lyria 3 family, which also includes Lyria 3 Pro for full-length song generation.

How do I access the Lyria 3 API? The fastest way is through ModelsLab at modelslab.com/models/google/lyria-3. Sign up, grab your API key, and send a request with model_id set to lyria-3 and a text prompt. No Gemini API waitlist, no Google Cloud billing setup, no regional restrictions.

How much does Lyria 3 cost? On ModelsLab, Lyria 3 is priced at $0.05 per generation. That’s a flat rate per 30-second clip, which makes it easy to forecast unit economics for a music feature — 100 tracks cost $5, 10,000 tracks cost $500.

Can Lyria 3 generate music from images? Yes. Lyria 3 supports multimodal input: pass a reference image alongside your text prompt and the model will compose music inspired by the visual mood, color palette, and subject. This is one of the few image-to-music API options available today.

Lyria 3 vs Suno — which should I use? Suno V5 still leads on full-song generation with longer durations and more polished pop vocals, but it doesn’t offer an official public API, which makes it risky for production. Lyria 3 is a better fit when you need a documented API, predictable pricing, multimodal input (text + image), and shorter 30-second clips for background scoring, dynamic soundtracks, or on-demand content generation. For teams that need API certainty, Lyria 3 wins.

Ready to try it?

Try Lyria 3 in the Playground — generate your first track in the browser, no code required.

Read the API Docs — full reference, parameters, and code samples.
