Available now on ModelsLab · Language Model

GLM OCR Extracts Everything

Parse Documents Accurately

Tops OmniDocBench

Scores 94.62 on OmniDocBench V1.5 across text, tables, and formulas.

0.9B Parameters

Runs on Edge Devices

Deploys via vLLM or SGLang for low-latency GLM OCR API serving.

Multimodal Input

Handles PDFs and Images

Processes JPG, PNG, and PDF files up to 100 pages, preserving document layouts.

Examples

See what GLM OCR can do

Copy any prompt below and try it yourself in the playground.

Invoice Extraction

Extract all text, tables, and key fields like date, amount, vendor from this invoice image in structured JSON format.

Table Recognition

Parse the complex table in this document image, output as Markdown preserving rows, columns, and formulas.

Code Documentation

Transcribe the code snippets and surrounding text from this screenshot, maintaining structure and syntax.

Contract Analysis

Identify sections, clauses, and tables in this contract PDF page, output in semantic Markdown.

For Developers

A few lines of code.
OCR via GLM OCR API

ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.

  • Serverless: scales to zero, scales to millions
  • Pay per token, no minimums
  • Python and JavaScript SDKs, plus REST API
import requests

response = requests.post(
    "https://modelslab.com/api/v7/llm/chat/completions",
    json={
        "key": "YOUR_API_KEY",  # your ModelsLab API key
        "prompt": "",           # e.g. one of the prompts from the Examples section
        "model_id": ""          # the GLM OCR model ID
    }
)
print(response.json())
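For a slightly more complete starting point, the snippet above can be wrapped in a small helper. This is a sketch that assumes the same endpoint and the three payload fields shown in the example (`key`, `prompt`, `model_id`); the function names here are illustrative, not part of the ModelsLab SDK, and any extra options the API accepts are omitted because they are not documented on this page.

```python
import requests

API_URL = "https://modelslab.com/api/v7/llm/chat/completions"


def build_ocr_payload(api_key: str, prompt: str, model_id: str) -> dict:
    """Build the JSON body for the GLM OCR endpoint.

    Only the three fields shown in the example above are included.
    """
    return {"key": api_key, "prompt": prompt, "model_id": model_id}


def run_ocr(api_key: str, prompt: str, model_id: str, timeout: int = 60) -> dict:
    """POST the prompt and return the parsed JSON response."""
    response = requests.post(
        API_URL,
        json=build_ocr_payload(api_key, prompt, model_id),
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP errors early
    return response.json()


if __name__ == "__main__":
    # Example: the invoice-extraction prompt from the Examples section.
    result = run_ocr(
        api_key="YOUR_API_KEY",
        prompt=(
            "Extract all text, tables, and key fields like date, amount, "
            "vendor from this invoice image in structured JSON format."
        ),
        model_id="",  # set to the GLM OCR model ID from your dashboard
    )
    print(result)
```

Separating payload construction from the network call keeps the request shape easy to inspect and test without making a live API call.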

FAQ

Common questions about GLM OCR

Read the docs

Ready to create?

Start generating with GLM OCR on ModelsLab.