Available now on ModelsLab · Language Model

GLM OCR Extracts Everything

Parse Documents Accurately

Tops OmniDocBench

Scores 94.62 on OmniDocBench V1.5 for text, tables, and formulas.

0.9B Parameters

Runs on Edge Devices

Deploys via vLLM and SGLang for low-latency GLM OCR API serving.

Multimodal Input

Handles PDFs and Images

Processes JPG and PNG images and PDFs up to 100 pages with complex layouts.

Examples

See what GLM OCR can do

Copy any prompt below and try it yourself in the playground.

Invoice Extraction

Extract all text, tables, and key fields like date, amount, vendor from this invoice image in structured JSON format.

Table Recognition

Parse the complex table in this document image, output as Markdown preserving rows, columns, and formulas.

Code Documentation

Transcribe the code snippets and surrounding text from this screenshot, maintaining structure and syntax.

Contract Analysis

Identify sections, clauses, and tables in this contract PDF page, output in semantic Markdown.

For Developers

A few lines of code.
OCR via GLM OCR API

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",  # your ModelsLab API key
        "prompt": "",           # your OCR instruction, e.g. an extraction prompt
        "model_id": ""          # the GLM OCR model identifier
    }
)
print(response.json())

FAQ

Common questions about GLM OCR

Read the docs

What is GLM OCR?

GLM OCR is a 0.9B-parameter multimodal OCR model for document understanding. Built on a CogViT vision encoder, it excels at text, tables, and formulas, and tops benchmarks such as OmniDocBench.

How do I use GLM OCR?

Send PDFs or images to the LLM endpoint. It supports PDFs up to 100 pages and 50MB, and outputs structured Markdown or JSON.
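The request can be sketched in Python. This is a minimal sketch, not the confirmed API shape: the `"image"` field name and the `"glm-ocr"` model identifier are assumptions for illustration; check the ModelsLab API reference for the exact parameter names.

```python
import base64

# Sketch: build a GLM OCR request payload for the ModelsLab chat endpoint.
# The "image" field name and "glm-ocr" model_id are assumptions, not
# confirmed parameters -- consult the API docs before using this in production.
def build_ocr_payload(api_key: str, image_path: str, prompt: str) -> dict:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "key": api_key,
        "model_id": "glm-ocr",  # assumed model identifier
        "prompt": prompt,       # e.g. one of the example prompts above
        "image": image_b64,     # assumed field name for the attached file
    }
```

The returned dict can be passed as the `json=` argument to `requests.post`, as in the snippet in the developers section.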

What makes GLM OCR fast and accurate?

It combines a vision encoder with Multi-Token Prediction for fast decoding, handles handwriting, multilingual text, and seals, and is efficient enough for production use.

Is GLM OCR better than other OCR models?

Yes. GLM OCR outperforms many alternatives in real-world scenarios such as invoices and code documentation, with a smaller footprint than general VLMs and state-of-the-art accuracy.

Which file formats are supported?

JPG and PNG images up to 10MB, and PDFs up to 50MB. It processes complex layouts and mixed text-image content.
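Given these limits, a client can check files before uploading. A minimal sketch: the size limits come from the answer above, while the helper itself is illustrative, not part of any SDK.

```python
import os

MAX_IMAGE_MB = 10  # JPG/PNG limit stated above
MAX_PDF_MB = 50    # PDF limit stated above

def is_within_limits(path: str) -> bool:
    """Return True if the file's type and size fit GLM OCR's documented limits."""
    ext = os.path.splitext(path)[1].lower()
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if ext in (".jpg", ".jpeg", ".png"):
        return size_mb <= MAX_IMAGE_MB
    if ext == ".pdf":
        return size_mb <= MAX_PDF_MB
    return False  # unsupported format
```

Validating locally avoids a round trip to the API for files that would be rejected anyway.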

How can I deploy GLM OCR?

It supports vLLM, SGLang, and Ollama for low-latency inference, making it well suited to edge deployments and high-concurrency services.

Ready to parse?

Start extracting documents with GLM OCR on ModelsLab.