Available now on ModelsLab · Language Model

GLM OCR Extracts Everything

Parse Documents Accurately

Tops OmniDocBench

Scores 94.62 on OmniDocBench V1.5 for text, tables, and formulas.

0.9B Parameters

Runs on Edge Devices

Deploys via vLLM and SGLang for low-latency GLM OCR API serving.

Multimodal Input

Handles PDFs and Images

Processes JPG and PNG images and PDFs up to 100 pages with complex layouts.

Examples

See what GLM OCR can do

Copy any prompt below and try it yourself in the playground.

Invoice Extraction

Extract all text, tables, and key fields like date, amount, vendor from this invoice image in structured JSON format.

Table Recognition

Parse the complex table in this document image, output as Markdown preserving rows, columns, and formulas.

Code Documentation

Transcribe the code snippets and surrounding text from this screenshot, maintaining structure and syntax.

Contract Analysis

Identify sections, clauses, and tables in this contract PDF page, output in semantic Markdown.

For Developers

A few lines of code.
OCR via GLM OCR API

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",  # your ModelsLab API key
        "prompt": "",           # your OCR instruction, e.g. an extraction prompt
        "model_id": ""          # the GLM OCR model identifier
    }
)
print(response.json())

FAQ

Common questions about GLM OCR

Read the docs

What is GLM OCR?

GLM OCR is a 0.9B-parameter multimodal OCR model for document understanding. Built on a CogViT vision encoder, it excels at text, tables, and formulas, and tops benchmarks such as OmniDocBench.

How do I use GLM OCR?

Send PDFs or images to the LLM endpoint. It supports PDFs up to 100 pages and 50MB, and outputs structured Markdown or JSON.
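The request can be sketched in Python. This is a minimal sketch, not the confirmed API shape: the `"image"` field name and the `"glm-ocr"` model identifier are assumptions for illustration; check the ModelsLab API reference for the exact parameter names.

```python
import base64

# Sketch: build a GLM OCR request payload for the ModelsLab chat endpoint.
# The "image" field name and "glm-ocr" model_id are assumptions, not
# confirmed parameters -- consult the API docs before using this in production.
def build_ocr_payload(api_key: str, image_path: str, prompt: str) -> dict:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "key": api_key,
        "model_id": "glm-ocr",  # assumed model identifier
        "prompt": prompt,       # e.g. one of the example prompts above
        "image": image_b64,     # assumed field name for the attached file
    }
```

The returned dict can be passed as the `json=` argument to `requests.post`, as in the snippet in the developers section.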

What makes GLM OCR fast and accurate?

It combines a vision encoder with Multi-Token Prediction for fast decoding, handles handwriting, multilingual text, and seals, and is efficient enough for production use.

Is GLM OCR better than other OCR models?

Yes. GLM OCR outperforms many alternatives in real-world scenarios such as invoices and code documentation, with a smaller footprint than general VLMs and state-of-the-art accuracy.

Which file formats are supported?

JPG and PNG images up to 10MB, and PDFs up to 50MB. It processes complex layouts and mixed text-image content.
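Given these limits, a client can check files before uploading. A minimal sketch: the size limits come from the answer above, while the helper itself is illustrative, not part of any SDK.

```python
import os

MAX_IMAGE_MB = 10  # JPG/PNG limit stated above
MAX_PDF_MB = 50    # PDF limit stated above

def is_within_limits(path: str) -> bool:
    """Return True if the file's type and size fit GLM OCR's documented limits."""
    ext = os.path.splitext(path)[1].lower()
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if ext in (".jpg", ".jpeg", ".png"):
        return size_mb <= MAX_IMAGE_MB
    if ext == ".pdf":
        return size_mb <= MAX_PDF_MB
    return False  # unsupported format
```

Validating locally avoids a round trip to the API for files that would be rejected anyway.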

How can I deploy GLM OCR?

It supports vLLM, SGLang, and Ollama for low-latency inference, making it well suited to edge deployments and high-concurrency services.

Ready to parse?

Start extracting documents with GLM OCR on ModelsLab.