Available now on ModelsLab · Language Model

Arcee AI: Trinity Mini
Efficient MoE Reasoning

Run Agents Seamlessly

Sparse MoE

3B Active Params

The 26B-parameter model activates just 3B parameters per token, routed across 128 experts, for low-latency inference (see the routing sketch below the cards).

Long Context

131K Token Window

Handles extended inputs with strong context utilization for grounded multi-turn responses.

Tool Calling

Reliable Function Use

Delivers schema-compliant JSON and agent error recovery through the Arcee AI: Trinity Mini API.
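To make the Sparse MoE card concrete, here is a minimal, self-contained sketch of top-8-of-128 expert routing in plain Python with NumPy. The layer sizes, gating function, and moe_forward helper are illustrative assumptions, not Trinity Mini's actual implementation.

import numpy as np

def moe_forward(x, gate_w, experts, k=8):
    """Route one token through the top-k of n experts.

    x: (d,) token hidden state; gate_w: (d, n) router weights;
    experts: list of n callables, each mapping (d,) -> (d,).
    Dimensions here are toy values, not Trinity Mini's real sizes.
    """
    logits = x @ gate_w                # router score, one per expert
    top = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected k only
    # Only k experts execute, so per-token compute scales with k, not n.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 128 tiny linear "experts", 8 active per token.
rng = np.random.default_rng(0)
d, n = 64, 128
experts = [lambda x, W=rng.standard_normal((d, d)) / np.sqrt(d): x @ W
           for _ in range(n)]
gate_w = rng.standard_normal((d, n)) / np.sqrt(d)
out = moe_forward(rng.standard_normal(d), gate_w, experts)
print(out.shape)  # (64,)

Because only 8 experts run per token, per-token compute tracks the 3B active parameters rather than the full 26B.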

Examples

See what Arcee AI: Trinity Mini can create

Copy any prompt below and try it yourself in the playground.

Code Review

Review this Python function for bugs and suggest optimizations:

def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)
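The main issue a review like this surfaces is that the naive recursion does exponential work. A memoized sketch of the kind of optimization the model might propose:

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n: int) -> int:
    # Caching collapses the call tree from O(2^n) to O(n) calls.
    if n < 0:
        raise ValueError("n must be non-negative")
    return n if n <= 1 else fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(50))  # 12586269025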

JSON Schema

Generate valid JSON for user profile schema with name, email, age over 18, and preferences array.
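A sketch of what a valid response to this prompt could look like, checked with the jsonschema package; the exact field shapes are illustrative assumptions, not a fixed ModelsLab format.

import jsonschema

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        # "format" is annotational unless a FormatChecker is passed.
        "email": {"type": "string", "format": "email"},
        "age": {"type": "integer", "minimum": 19},  # "over 18" as integers
        "preferences": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "email", "age", "preferences"],
}

profile = {
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "age": 36,
    "preferences": ["dark_mode", "weekly_digest"],
}

jsonschema.validate(instance=profile, schema=schema)  # raises on mismatch
print("profile is schema-valid")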

Agent Workflow

Plan multi-step task: fetch weather API for NYC, compare to Tokyo, output summary in table format.
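A sketch of how such a plan might decompose into code, with get_weather as a hypothetical stand-in for a real weather API call:

def get_weather(city: str) -> dict:
    # Hypothetical helper; swap in a real weather API request here.
    stub = {"NYC": {"temp_c": 4, "sky": "overcast"},
            "Tokyo": {"temp_c": 12, "sky": "clear"}}
    return stub[city]

def compare(cities):
    rows = [(c, get_weather(c)) for c in cities]  # step 1-2: fetch both cities
    # Step 3: render the comparison as a plain-text table.
    print(f"{'City':<8}{'Temp (C)':<10}Sky")
    for city, w in rows:
        print(f"{city:<8}{w['temp_c']:<10}{w['sky']}")

compare(["NYC", "Tokyo"])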

Document Summary

Summarize key points from this 10K token RAG document on quantum computing advancements.

For Developers

Reasoning in a few lines of code.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

# Minimal chat completion call to the ModelsLab LLM API.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",  # your ModelsLab API key
        "prompt": "",           # your prompt text
        "model_id": ""          # ID for Arcee AI: Trinity Mini
    }
)
print(response.json())

FAQ

Common questions about Arcee AI: Trinity Mini

Read the docs

What is Arcee AI: Trinity Mini?
A 26B-parameter sparse MoE LLM with 3B active parameters, optimized for efficient reasoning over 131K-token contexts. Released in December 2025 by Arcee AI.

How does its MoE architecture work?
It activates 8 of 128 experts per token and supports tool calling, structured JSON output, and multi-turn agents. It runs on vLLM, SGLang, and llama.cpp.

What is the context window?
131,072 tokens for input and output. It excels at long-context RAG and agent tasks, and strong utilization of the full window reduces hallucination.

Is it open source?
Yes, under the Apache 2.0 license. Weights are on Hugging Face in W4A16 quantized form, and the model was trained on 10T curated tokens.

How does it compare to dense models?
It is an efficient alternative to dense models for agents, matching their reasoning at lower compute via MoE. Deploy in the cloud or on-prem.

What are the main use cases?
Agent backends, RAG chatbots, and tool orchestration. Benchmarks via Clarifai report a low TTFT of 139 ms and 168 tok/s throughput.
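Since the FAQ notes the weights run on vLLM, here is a minimal sketch of querying a locally served copy through vLLM's OpenAI-compatible endpoint. The repo id arcee-ai/Trinity-Mini is an assumption; check the Hugging Face model card for the exact name.

# Assumes a local vLLM server, e.g.:  vllm serve arcee-ai/Trinity-Mini
# (repo id is hypothetical; confirm it on the Hugging Face model card).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="arcee-ai/Trinity-Mini",  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize MoE routing in one line."}],
)
print(resp.choices[0].message.content)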

Ready to create?

Start generating with Arcee AI: Trinity Mini on ModelsLab.