Available now on ModelsLab · Language Model

LiquidAI: LFM2-24B-A2B
Fast MoE Inference Engine

Scale Agents Efficiently

Hybrid MoE

24B Params, 2.3B Active

Activates 2.3B parameters per token in a 40-layer architecture of 30 gated conv blocks and 10 GQA blocks.

Low Memory

Fits in 32GB RAM

Deploys on laptops, edge devices, and H100s for LFM2-24B-A2B API workflows.

High Throughput

26K Tokens/Second

Handles 1024 concurrent requests at 32K context in LFM2-24B-A2B pipelines.

Examples

See what LiquidAI: LFM2-24B-A2B can create

Copy any prompt below and try it yourself in the playground.

Math Proof

Prove the Pythagorean theorem step-by-step using geometric arguments and formal logic. Include diagrams in ASCII art and verify with coordinates.
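
For a sense of the coordinate-verification step, here is one way it might go (an illustrative sketch in LaTeX, not model output):

Place the right angle at the origin: $A=(0,0)$, $B=(a,0)$, $C=(0,b)$.
\[
  |AB|^2 + |AC|^2 = a^2 + b^2,
  \qquad
  |BC|^2 = (a-0)^2 + (0-b)^2 = a^2 + b^2,
\]
hence $|BC|^2 = |AB|^2 + |AC|^2$, i.e. $c^2 = a^2 + b^2$.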

Code Debugger

Analyze this Python function for bugs: def factorial(n): if n == 0: return 1 else: return n * factorial(n-1). Fix recursion depth issues and optimize for large n.
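
For context, one fix the prompt is after might look like this (an illustrative sketch, not model output): replace the recursion with an iterative loop, or defer to the standard library.

import math

def factorial(n: int) -> int:
    # Iterative version: no recursion depth limit, handles large n.
    if n < 0:
        raise ValueError("factorial is undefined for negative n")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# math.factorial is the optimized built-in for production use.
assert factorial(10) == math.factorial(10) == 3628800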

Agent Workflow

Plan a multi-step research task: query database for sales data, analyze trends with stats, generate report in JSON, and suggest actions.
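
A response might structure the plan roughly like the sketch below; the step names and fields are hypothetical, not a fixed schema.

plan = {
    "task": "quarterly sales analysis",
    "steps": [
        {"id": 1, "action": "query_database", "args": {"table": "sales", "period": "2024-Q4"}},
        {"id": 2, "action": "analyze_trends", "args": {"method": "linear_regression"}},
        {"id": 3, "action": "generate_report", "args": {"format": "json"}},
        {"id": 4, "action": "suggest_actions", "depends_on": [2, 3]},
    ],
}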

RAG Summary

Summarize key insights from these documents on climate models, extract trends, and output structured JSON with citations for 32K context.
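
The structured output the prompt asks for might take a shape like this; all field names and values here are illustrative placeholders.

summary = {
    "insights": [
        {"claim": "example insight extracted from the documents", "citation": "doc_2, p. 14"},
    ],
    "trends": ["example trend A", "example trend B"],
    "citations": ["doc_1", "doc_2", "doc_3"],
}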

For Developers

Agents in a few lines of code.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

# Replace the empty strings with your prompt and the model_id for
# LFM2-24B-A2B from the model catalog, and use your own API key.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",
        "prompt": "",
        "model_id": "",
    },
)
print(response.json())
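
In production you may want to fail loudly on HTTP errors; a minimal variant using requests' built-in checks (exact error payloads depend on the API, so consult the docs):

import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={"key": "YOUR_API_KEY", "prompt": "", "model_id": ""},
    timeout=60,  # avoid hanging indefinitely on slow responses
)
response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
print(response.json())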

FAQ

Common questions about LiquidAI: LFM2-24B-A2B

Read the docs

What is LiquidAI: LFM2-24B-A2B?

A hybrid MoE LLM with 24B total parameters and 2.3B active per token. It uses 30 gated conv blocks and 10 GQA blocks in the A2B layout, and fits in 32GB RAM for edge deployment.

How fast is it?

It reaches 26.8K tokens/sec on an H100 at 1024 concurrent requests, outperforming Qwen3-30B-A3B in throughput, and supports 32K context for agents.
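
As a quick sanity check on what that means per stream (simple arithmetic, assuming throughput divides evenly across requests):

total_throughput = 26_800  # tokens/sec, aggregate on one H100
concurrent_requests = 1024
per_request = total_throughput / concurrent_requests
print(f"~{per_request:.1f} tokens/sec per concurrent request")  # ~26.2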

What is it good for?

High-volume multi-agent pipelines, RAG, and function calling. It excels at instruction following, math reasoning, and tool use, with native web search support.

Can it run on consumer hardware?

Yes. It activates a sparse 2.3B parameters per token for a low memory footprint, runs on consumer laptops with an iGPU or NPU, and reaches 112 tok/s on an AMD CPU.
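
A back-of-envelope estimate shows why 24B total parameters can still fit in 32GB of RAM (a sketch assuming 8-bit quantized weights; real footprints also depend on quantization scheme and KV-cache size):

total_params = 24e9      # all experts must be resident in memory
bytes_per_param = 1      # 8-bit quantization
weights_gb = total_params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for weights alone")  # ~24 GB, within 32GB RAM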

How does it compare to other models?

It beats gpt-oss-20b and Qwen3-30B-A3B on benchmarks, making it a strong dense-model replacement for agentic tasks. The LFM2 family scales from 350M to 24B parameters.

Where can I run it?

It is available via Together AI, LM Studio, and OpenVINO, and supports vLLM and llama.cpp. Deploy locally or in the cloud for privacy.

Ready to create?

Start generating with LiquidAI: LFM2-24B-A2B on ModelsLab.