Available now on ModelsLab · Language Model

Arcee AI: Trinity Mini
Efficient MoE Reasoning

Run Agents Seamlessly

Sparse MoE

3B Active Params

The 26B-parameter model activates just 3B parameters per token, routed across 128 experts, for low-latency inference (see the routing sketch below the cards).

Long Context

131K Token Window

Handles extended inputs with strong context utilization for grounded multi-turn responses.

Tool Calling

Reliable Function Use

Delivers schema-compliant JSON and agent error recovery through the Arcee AI: Trinity Mini API.
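To make the Sparse MoE card concrete, here is a minimal, self-contained sketch of top-8-of-128 expert routing in plain Python with NumPy. The layer sizes, gating function, and moe_forward helper are illustrative assumptions, not Trinity Mini's actual implementation.

import numpy as np

def moe_forward(x, gate_w, experts, k=8):
    """Route one token through the top-k of n experts.

    x: (d,) token hidden state; gate_w: (d, n) router weights;
    experts: list of n callables, each mapping (d,) -> (d,).
    Dimensions here are toy values, not Trinity Mini's real sizes.
    """
    logits = x @ gate_w                # router score, one per expert
    top = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected k only
    # Only k experts execute, so per-token compute scales with k, not n.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 128 tiny linear "experts", 8 active per token.
rng = np.random.default_rng(0)
d, n = 64, 128
experts = [lambda x, W=rng.standard_normal((d, d)) / np.sqrt(d): x @ W
           for _ in range(n)]
gate_w = rng.standard_normal((d, n)) / np.sqrt(d)
out = moe_forward(rng.standard_normal(d), gate_w, experts)
print(out.shape)  # (64,)

Because only 8 experts run per token, per-token compute tracks the 3B active parameters rather than the full 26B.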

Examples

See what Arcee AI: Trinity Mini can create

Copy any prompt below and try it yourself in the playground.

Code Review

Review this Python function for bugs and suggest optimizations:

def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)
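The main issue a review like this surfaces is that the naive recursion does exponential work. A memoized sketch of the kind of optimization the model might propose:

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n: int) -> int:
    # Caching collapses the call tree from O(2^n) to O(n) calls.
    if n < 0:
        raise ValueError("n must be non-negative")
    return n if n <= 1 else fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(50))  # 12586269025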

JSON Schema

Generate valid JSON for user profile schema with name, email, age over 18, and preferences array.
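A sketch of what a valid response to this prompt could look like, checked with the jsonschema package; the exact field shapes are illustrative assumptions, not a fixed ModelsLab format.

import jsonschema

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        # "format" is annotational unless a FormatChecker is passed.
        "email": {"type": "string", "format": "email"},
        "age": {"type": "integer", "minimum": 19},  # "over 18" as integers
        "preferences": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "email", "age", "preferences"],
}

profile = {
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "age": 36,
    "preferences": ["dark_mode", "weekly_digest"],
}

jsonschema.validate(instance=profile, schema=schema)  # raises on mismatch
print("profile is schema-valid")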

Agent Workflow

Plan multi-step task: fetch weather API for NYC, compare to Tokyo, output summary in table format.
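A sketch of how such a plan might decompose into code, with get_weather as a hypothetical stand-in for a real weather API call:

def get_weather(city: str) -> dict:
    # Hypothetical helper; swap in a real weather API request here.
    stub = {"NYC": {"temp_c": 4, "sky": "overcast"},
            "Tokyo": {"temp_c": 12, "sky": "clear"}}
    return stub[city]

def compare(cities):
    rows = [(c, get_weather(c)) for c in cities]  # step 1-2: fetch both cities
    # Step 3: render the comparison as a plain-text table.
    print(f"{'City':<8}{'Temp (C)':<10}Sky")
    for city, w in rows:
        print(f"{city:<8}{w['temp_c']:<10}{w['sky']}")

compare(["NYC", "Tokyo"])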

Document Summary

Summarize key points from this 10K token RAG document on quantum computing advancements.

For Developers

Reasoning in a few lines of code.

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

# Minimal chat completion call to the ModelsLab LLM API.
response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",  # your ModelsLab API key
        "prompt": "",           # your prompt text
        "model_id": ""          # ID for Arcee AI: Trinity Mini
    }
)
print(response.json())

FAQ

Common questions about Arcee AI: Trinity Mini

Read the docs

What is Arcee AI: Trinity Mini?
A 26B-parameter sparse MoE LLM with 3B active parameters, optimized for efficient reasoning over 131K-token contexts. Released in December 2025 by Arcee AI.

How does its MoE architecture work?
It activates 8 of 128 experts per token and supports tool calling, structured JSON output, and multi-turn agents. It runs on vLLM, SGLang, and llama.cpp.

What is the context window?
131,072 tokens for input and output. It excels at long-context RAG and agent tasks, and strong utilization of the full window reduces hallucination.

Is it open source?
Yes, under the Apache 2.0 license. Weights are on Hugging Face in W4A16 quantized form, and the model was trained on 10T curated tokens.

How does it compare to dense models?
It is an efficient alternative to dense models for agents, matching their reasoning at lower compute via MoE. Deploy in the cloud or on-prem.

What are the main use cases?
Agent backends, RAG chatbots, and tool orchestration. Benchmarks via Clarifai report a low TTFT of 139 ms and 168 tok/s throughput.
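Since the FAQ notes the weights run on vLLM, here is a minimal sketch of querying a locally served copy through vLLM's OpenAI-compatible endpoint. The repo id arcee-ai/Trinity-Mini is an assumption; check the Hugging Face model card for the exact name.

# Assumes a local vLLM server, e.g.:  vllm serve arcee-ai/Trinity-Mini
# (repo id is hypothetical; confirm it on the Hugging Face model card).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="arcee-ai/Trinity-Mini",  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize MoE routing in one line."}],
)
print(resp.choices[0].message.content)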

Ready to create?

Start generating with Arcee AI: Trinity Mini on ModelsLab.