Available now on ModelsLab · Language Model

LiquidAI: LFM2-24B-A2B
Fast MoE Inference Engine

Scale Agents Efficiently

Hybrid MoE

24B Params, 2.3B Active

Activates 2.3B parameters per token in a 40-layer architecture of 30 gated conv blocks and 10 GQA blocks.

Low Memory

Fits in 32GB RAM

Deploys on laptops, edge devices, and H100s for LFM2-24B-A2B API workflows.

High Throughput

26K Tokens/Second

Handles 1024 concurrent requests at 32K context in LFM2-24B-A2B pipelines.

Examples

See what LiquidAI: LFM2-24B-A2B can create

Copy any prompt below and try it yourself in the playground.

Math Proof

Prove the Pythagorean theorem step-by-step using geometric arguments and formal logic. Include diagrams in ASCII art and verify with coordinates.
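
For a sense of the coordinate-verification step, here is one way it might go (an illustrative sketch in LaTeX, not model output):

Place the right angle at the origin: $A=(0,0)$, $B=(a,0)$, $C=(0,b)$.
\[
  |AB|^2 + |AC|^2 = a^2 + b^2,
  \qquad
  |BC|^2 = (a-0)^2 + (0-b)^2 = a^2 + b^2,
\]
hence $|BC|^2 = |AB|^2 + |AC|^2$, i.e. $c^2 = a^2 + b^2$.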

Code Debugger

Analyze this Python function for bugs: def factorial(n): if n == 0: return 1 else: return n * factorial(n-1). Fix recursion depth issues and optimize for large n.
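
For context, one fix the prompt is after might look like this (an illustrative sketch, not model output): replace the recursion with an iterative loop, or defer to the standard library.

import math

def factorial(n: int) -> int:
    # Iterative version: no recursion depth limit, handles large n.
    if n < 0:
        raise ValueError("factorial is undefined for negative n")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# math.factorial is the optimized built-in for production use.
assert factorial(10) == math.factorial(10) == 3628800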

Agent Workflow

Plan a multi-step research task: query database for sales data, analyze trends with stats, generate report in JSON, and suggest actions.
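
A response might structure the plan roughly like the sketch below; the step names and fields are hypothetical, not a fixed schema.

plan = {
    "task": "quarterly sales analysis",
    "steps": [
        {"id": 1, "action": "query_database", "args": {"table": "sales", "period": "2024-Q4"}},
        {"id": 2, "action": "analyze_trends", "args": {"method": "linear_regression"}},
        {"id": 3, "action": "generate_report", "args": {"format": "json"}},
        {"id": 4, "action": "suggest_actions", "depends_on": [2, 3]},
    ],
}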

RAG Summary

Summarize key insights from these documents on climate models, extract trends, and output structured JSON with citations for 32K context.
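
The structured output the prompt asks for might take a shape like this; all field names and values here are illustrative placeholders.

summary = {
    "insights": [
        {"claim": "example insight extracted from the documents", "citation": "doc_2, p. 14"},
    ],
    "trends": ["example trend A", "example trend B"],
    "citations": ["doc_1", "doc_2", "doc_3"],
}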

For Developers

Agents in a few lines of code.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

# Replace the empty strings with your prompt and the model_id for
# LFM2-24B-A2B from the model catalog, and use your own API key.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",
        "model_id": "",
    },
)
print(response.json())
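
In production you may want to fail loudly on HTTP errors; a minimal variant using requests' built-in checks (exact error payloads depend on the API, so consult the docs):

import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={"key": "YOUR_API_KEY", "prompt": "", "model_id": ""},
    timeout=60,  # avoid hanging indefinitely on slow responses
)
response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
print(response.json())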

FAQ

Common questions about LiquidAI: LFM2-24B-A2B

Read the docs

What is LiquidAI: LFM2-24B-A2B?

A hybrid MoE LLM with 24B total parameters and 2.3B active per token. It uses 30 gated conv blocks and 10 GQA blocks in the A2B layout, and fits in 32GB RAM for edge deployment.

How fast is it?

It reaches 26.8K tokens/sec on an H100 at 1024 concurrent requests, outperforming Qwen3-30B-A3B in throughput, and supports 32K context for agents.
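
As a quick sanity check on what that means per stream (simple arithmetic, assuming throughput divides evenly across requests):

total_throughput = 26_800  # tokens/sec, aggregate on one H100
concurrent_requests = 1024
per_request = total_throughput / concurrent_requests
print(f"~{per_request:.1f} tokens/sec per concurrent request")  # ~26.2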

What is it good for?

High-volume multi-agent pipelines, RAG, and function calling. It excels at instruction following, math reasoning, and tool use, with native web search support.

Can it run on consumer hardware?

Yes. It activates a sparse 2.3B parameters per token for a low memory footprint, runs on consumer laptops with an iGPU or NPU, and reaches 112 tok/s on an AMD CPU.
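
A back-of-envelope estimate shows why 24B total parameters can still fit in 32GB of RAM (a sketch assuming 8-bit quantized weights; real footprints also depend on quantization scheme and KV-cache size):

total_params = 24e9      # all experts must be resident in memory
bytes_per_param = 1      # 8-bit quantization
weights_gb = total_params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for weights alone")  # ~24 GB, within 32GB RAM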

How does it compare to other models?

It beats gpt-oss-20b and Qwen3-30B-A3B on benchmarks, making it a strong dense-model replacement for agentic tasks. The LFM2 family scales from 350M to 24B parameters.

Where can I run it?

It is available via Together AI, LM Studio, and OpenVINO, and supports vLLM and llama.cpp. Deploy locally or in the cloud for privacy.

Ready to create?

Start generating with LiquidAI: LFM2-24B-A2B on ModelsLab.