HuggingFace just dropped a trending repository that's rewriting how developers add AI capabilities to their coding agents. huggingface/skills hit 7,374 stars with 5,938 added in a single week — making it one of the fastest-moving repos on GitHub right now.
What it introduces is called the Agent Skills format: a standardized way to package instructions, scripts, and resources into a folder that any AI coding agent can discover and use. The key file is SKILL.md — a plain Markdown file with YAML frontmatter that tells the agent what the skill does and how to use it.
It's compatible with Claude Code, OpenAI Codex, Google Gemini CLI, and Cursor. One format, four major tools.
This post walks through what Agent Skills are, why they matter for developers building AI-powered workflows, and how to create a ModelsLab skill that gives your coding agent on-demand image generation, video creation, and speech synthesis.
## What Is the Agent Skills Format?
An agent skill is a self-contained folder. At minimum it needs a SKILL.md file. That file starts with YAML frontmatter (name and description), followed by the actual guidance the agent follows when the skill is active.
```markdown
# skills/modelslab-image-gen/SKILL.md
---
name: modelslab-image-gen
description: Generate images, videos, and audio using ModelsLab API. Use when the user asks to create visuals, render scenes, or produce media assets.
---

## ModelsLab Image Generation

Use the ModelsLab REST API to generate images from text prompts.

- API key is stored in MODELSLAB_API_KEY environment variable.
- Base URL: https://modelslab.com/api/v6
- Endpoint: POST /images/text2img

Always check the response status. If status is "processing", poll the fetch URL until complete.
```
Skills live in standard locations that agents discover automatically:
- $REPO_ROOT/.agents/skills/ — project-level skills
- $HOME/.agents/skills/ — user-level skills available in all projects
Once placed there, Codex, Claude Code, and Gemini CLI pick them up without configuration. The agent reads the skill when it's relevant to the task, or when you invoke it explicitly.
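To make that discovery step concrete, here is a minimal Python sketch that scans both standard locations for SKILL.md files and reads the name and description from their frontmatter. The hand-rolled frontmatter parser is illustrative only; real agents use full YAML parsers and their own lookup rules.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Minimal YAML-frontmatter reader: collects `key: value` pairs
    between the opening and closing --- fences."""
    meta = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def discover_skills(repo_root: Path) -> list:
    """Check the two standard locations: project-level, then user-level."""
    roots = [repo_root / ".agents" / "skills", Path.home() / ".agents" / "skills"]
    found = []
    for root in roots:
        if not root.is_dir():
            continue
        for skill_md in sorted(root.glob("*/SKILL.md")):
            meta = parse_frontmatter(skill_md.read_text())
            found.append({"name": meta.get("name"),
                          "description": meta.get("description"),
                          "path": str(skill_md)})
    return found
```

The agent matches a task against the `description` field, which is why descriptions in SKILL.md files read like routing hints ("Use when the user asks to...").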
## Why This Matters for AI Workflows
Before the Agent Skills standard, adding external API capabilities to a coding agent meant either writing custom tool definitions (which vary by framework) or dropping API documentation into your system prompt (which bloats context and doesn't persist).
Skills solve both problems:
- Portable. One SKILL.md works across Claude Code, Codex, Gemini CLI, and Cursor without changes.
- Composable. Install multiple skills, and agents use whichever is relevant to the current task.
- Versioned. Skills live in your repo, so they're committed, diffed, and reviewed like code.
- Shareable. Anyone can publish a skill to a GitHub repo; /plugin install skill-name@repo installs it instantly.
For developers building agents that need to generate media — images for UI mockups, audio for prototypes, video clips for demos — this means you can add that capability to any coding agent in under five minutes.
## Building a ModelsLab Skill Pack
Here's a practical skill pack that gives your coding agent access to ModelsLab's image generation, text-to-speech, and video generation APIs. Each skill lives in its own folder under .agents/skills/.
### 1. Image Generation Skill
````markdown
# .agents/skills/modelslab-images/SKILL.md
---
name: modelslab-images
description: Generate images from text prompts using ModelsLab API. Use when creating product mockups, UI assets, illustrations, or any visual content from a description.
---

## ModelsLab Image Generation

### Endpoint

POST https://modelslab.com/api/v6/images/text2img

### Required headers

Content-Type: application/json

### Request body

```json
{
  "key": "$MODELSLAB_API_KEY",
  "prompt": "your detailed prompt here",
  "negative_prompt": "blurry, low quality",
  "width": 512,
  "height": 512,
  "samples": 1,
  "num_inference_steps": 20,
  "guidance_scale": 7.5,
  "enhance_prompt": "yes"
}
```

### Response handling

- status "success": images array contains URLs, download immediately
- status "processing": call the fetch_result endpoint with the request_id
- Retry fetch every 5 seconds, max 10 attempts

### Example curl

```bash
curl -X POST "https://modelslab.com/api/v6/images/text2img" \
  -H "Content-Type: application/json" \
  -d "{\"key\":\"$MODELSLAB_API_KEY\",\"prompt\":\"modern dashboard UI, dark theme, data visualization\",\"width\":1024,\"height\":768,\"samples\":1}"
```
````
### 2. Text-to-Speech Skill
````markdown
# .agents/skills/modelslab-tts/SKILL.md
---
name: modelslab-tts
description: Convert text to speech audio using ModelsLab API. Use when generating voiceovers, narration, accessibility audio, or any spoken content.
---

## ModelsLab Text-to-Speech

### Endpoint

POST https://modelslab.com/api/v6/voice/text_to_audio

### Request body

```json
{
  "key": "$MODELSLAB_API_KEY",
  "prompt": "text to convert to speech",
  "language": "English",
  "voice_id": 11,
  "init_audio": null
}
```

### Voice IDs (common)

- 0: Default neutral
- 11: Rachel (professional female)
- 21: Adam (warm male)

### Response

Returns audio_url pointing to an MP3 file. Download and save with descriptive filename.
````
### 3. Video Generation Skill
````markdown
# .agents/skills/modelslab-video/SKILL.md
---
name: modelslab-video
description: Generate short video clips from text prompts using ModelsLab API. Use when creating motion graphics, product demos, animated scenes, or b-roll footage.
---

## ModelsLab Video Generation

### Endpoint

POST https://modelslab.com/api/v6/video/text2video

### Request body

```json
{
  "key": "$MODELSLAB_API_KEY",
  "prompt": "detailed scene description",
  "negative_prompt": "low quality, watermark",
  "height": 512,
  "width": 512,
  "num_frames": 16,
  "num_inference_steps": 20,
  "guidance_scale": 7.5
}
```

### Notes

Video generation typically takes 30-120 seconds. Always use the fetch endpoint to poll for completion. Save the output file before the URL expires (24h TTL).
````
## Installing Your Skill Pack
Create the skills in your home directory to make them available across all projects:
```bash
mkdir -p ~/.agents/skills/modelslab-images
mkdir -p ~/.agents/skills/modelslab-tts
mkdir -p ~/.agents/skills/modelslab-video

# Create each SKILL.md file (or clone a prepared repo)
# Then set your API key
export MODELSLAB_API_KEY="your_key_here"
```
Get your API key at modelslab.com/dashboard — free tier available.
Once installed, test it in Claude Code:
```
# In Claude Code or any compatible agent:
# "Generate a product screenshot mockup for a dark-mode analytics dashboard"
# The agent discovers the modelslab-images skill and calls the API automatically
```
## Using HuggingFace's Skill Marketplace
The huggingface/skills repo is also a community marketplace. You can register it as a plugin source and install community-maintained skills for common AI/ML tasks:
```
# In a compatible agent:
/plugin marketplace add huggingface/skills

# Install specific HuggingFace skills
/plugin install hugging-face-cli@huggingface/skills
/plugin install dataset-creator@huggingface/skills
```
The skills cover tasks like model training, dataset creation, and evaluation — all the infrastructure work that coding agents need to do when working on ML projects. Combined with a ModelsLab skill for inference and generation, you get a full ML development toolkit that any agent can use.
## The Bigger Pattern: Composable Agent Capabilities
The Agent Skills standard is part of a broader shift in how developers build AI-powered tools. Rather than monolithic agents with hard-coded capabilities, the pattern is moving toward composable capability bundles:
- Skills for tools: ModelsLab (generation), HuggingFace (training), Browserbase (web automation)
- Skills for workflows: Deploy, test, debug, document
- Skills for domains: Finance, healthcare, legal — compliance rules packaged as skills
Any developer can write a skill, publish it to GitHub, and make it installable across Claude Code, Codex, Gemini CLI, and Cursor simultaneously. Just as npm packages distribute code libraries, the Agent Skills format is emerging as the package manager for agent capabilities.
## API Coverage and Pricing
ModelsLab's API covers 200+ models across seven modalities from a single API key:
- Image generation: Stable Diffusion XL, Flux, custom fine-tuned models
- Video generation: AnimateDiff, Stable Video Diffusion, Kling, LTX-Video
- Text-to-speech: 900+ voices across 29 languages
- Speech-to-text: Whisper and fine-tuned transcription models
- Image editing: Inpainting, upscaling, background removal, face enhancement
- LLMs: Llama, Mistral, and open-source language models
- 3D generation: Point clouds and mesh generation from text or images
One API key, one SDK integration, access to everything. That's the right abstraction for a coding agent skill — you define the skill once, and it works regardless of which underlying model is most appropriate for the task.
Pricing starts with a free tier. See modelslab.com/pricing for credit plans.
## Getting Started
The quickest path to a working ModelsLab skill for your coding agent:
- Get an API key → modelslab.com/dashboard
- Create the skill files → use the templates above in ~/.agents/skills/
- Set the environment variable → export MODELSLAB_API_KEY="your_key"
- Start prompting → ask your agent to generate an image, and it will use the skill automatically
The HuggingFace skills repo is at github.com/huggingface/skills. Browse the community skills, contribute your own, or use it as a reference for structuring the SKILL.md format.
The Agent Skills standard is six months old and already has support from four major agent tools. It's worth betting on.
