GLM-OCR
The GLM-OCR guide has runnable examples for all common use cases, including a full pipeline with Qwen3-Embedding-8B, Qwen3.5-122B-A10B-FP8, n8n, and vector database options (pgvector, Qdrant, ChromaDB).
Description
"GLM-OCR" is a document optical character recognition (OCR) model by Z.ai (ZhipuAI), specialized for accurate text extraction from documents and images. An integrated document proxy on our platform automatically converts PDF, DOCX, PPTX, XLSX, HTML, SVG, and raster image formats to PNG pages before the model processes them.
It is suitable for:
- Extracting text from PDF, DOCX, PPTX, XLSX, HTML, and many other document formats
- Processing scanned documents, invoices, contracts, forms, and reports
- Text extraction from tables and structured layouts
- Formula and mathematical expression recognition
- Key information extraction (KIE) — structured JSON output from forms, receipts, certificates, and cards
- Retrieval-Augmented Generation (RAG) pre-processing — high-accuracy document parsing for knowledge bases
- Multilingual documents — Chinese, English, German, French, Spanish, Russian, Japanese, Korean, and others
The following limitations apply:
- Maximum 30 pages per request. The API returns HTTP 413 if the document exceeds this limit. Split larger documents into batches of at most 30 pages.
- Maximum request body: 200 MB
- Maximum context length: 131,072 tokens (~4,000 tokens per page at typical document density)
- No tool calling or function calling support
- No memory between requests — the model does not remember previous extractions. Each API call is independent; send the document again if you need to ask something new about it.
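Because of the 30-page limit, longer documents have to be split on the client side. The following is a minimal sketch of the batching arithmetic only, assuming you already know the total page count (for example from a PDF library of your choice). `page_batches` is a hypothetical helper name and not part of the API:

```python
def page_batches(total_pages: int, batch_size: int = 30) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges.

    Each range covers at most batch_size pages, matching the per-request limit.
    """
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


print(page_batches(75))  # [(1, 30), (31, 60), (61, 75)]
```

Each range can then be exported as its own file and sent in a separate request. Since the model keeps no memory between requests, the results have to be concatenated on your side.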
Supported input formats
All content is delivered as a base64-encoded data URI in the image_url field. The proxy automatically detects the format and converts it to per-page PNG images before passing them to the model.
| Format | MIME type for data URI | Notes |
|---|---|---|
| PDF | application/pdf | Up to 30 pages per request |
| JPEG | image/jpeg | Handled natively |
| PNG | image/png | Handled natively |
| TIFF | image/tiff | Multi-frame → one page per frame |
| GIF | image/gif | Animated → one page per frame |
| WebP | image/webp | Animated → one page per frame |
| BMP | image/bmp | |
| SVG | image/svg+xml | Rasterised via cairosvg |
| HTML | text/html | Rendered via WeasyPrint |
| DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Converted via mammoth + WeasyPrint |
| PPTX | application/vnd.openxmlformats-officedocument.presentationml.presentation | One page per slide |
| XLSX | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | One page per sheet, max 2,000 rows |
| XLS | application/vnd.ms-excel | Legacy Excel format |
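The MIME types in the table above can be looked up from the file extension when building the data URI. This is a sketch under assumptions: the extension-to-format mapping (for example `.jpg`, `.tif`) and the helper name `to_data_uri` are illustrative and not defined by the platform.

```python
import base64
from pathlib import Path

# File extension -> MIME type, taken from the table above.
# The choice of extensions per format is an assumption of this sketch.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".bmp": "image/bmp",
    ".svg": "image/svg+xml",
    ".html": "text/html",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xls": "application/vnd.ms-excel",
}


def to_data_uri(path: str) -> str:
    """Read a file and return it as a base64-encoded data URI."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"
```

The returned string goes into the `url` value of the `image_url` content part.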
API usage
GLM-OCR is accessed via the standard chat completions endpoint (/v1/chat/completions) with the model name GLM-OCR. The document proxy intercepts the request, converts the document to PNG pages, and forwards them to the model — no page splitting required on your side.
PDF document extraction
Python

```python
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.aihosting.mittwald.de/v1",
    api_key="<your-api-key>",
)

# Read the PDF and base64-encode it for the data URI.
with open("document.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="GLM-OCR",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:application/pdf;base64,{pdf_b64}",
                    },
                },
                {
                    "type": "text",
                    "text": "Extract all text from this document.",
                },
            ],
        }
    ],
    temperature=0.1,
)

print(response.choices[0].message.content)
```

JavaScript

```javascript
// Top-level await requires an ES module (for example a .mjs file).
import fs from "fs";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llm.aihosting.mittwald.de/v1",
  apiKey: "<your-api-key>",
});

// Read the PDF and base64-encode it for the data URI.
const pdfB64 = fs.readFileSync("document.pdf").toString("base64");

const response = await client.chat.completions.create({
  model: "GLM-OCR",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image_url",
          image_url: { url: `data:application/pdf;base64,${pdfB64}` },
        },
        { type: "text", text: "Extract all text from this document." },
      ],
    },
  ],
  temperature: 0.1,
});

console.log(response.choices[0].message.content);
```

PHP

```php
<?php
// composer require openai-php/client guzzlehttp/guzzle
require __DIR__ . '/vendor/autoload.php';

$client = OpenAI::factory()
    ->withBaseUri('https://llm.aihosting.mittwald.de/v1')
    ->withApiKey('<your-api-key>')
    ->make();

// Read the PDF and base64-encode it for the data URI.
$pdfB64 = base64_encode(file_get_contents('document.pdf'));

$response = $client->chat()->create([
    'model' => 'GLM-OCR',
    'messages' => [[
        'role' => 'user',
        'content' => [
            [
                'type' => 'image_url',
                'image_url' => ['url' => "data:application/pdf;base64,{$pdfB64}"],
            ],
            ['type' => 'text', 'text' => 'Extract all text from this document.'],
        ],
    ]],
    'temperature' => 0.1,
]);

echo $response->choices[0]->message->content;
```
Single image extraction
For individual document images (JPEG, PNG):
Python

```python
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.aihosting.mittwald.de/v1",
    api_key="<your-api-key>",
)

# Read the image and base64-encode it for the data URI.
with open("page.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="GLM-OCR",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{img_b64}",
                    },
                },
                {
                    "type": "text",
                    "text": "Extract all text from this image.",
                },
            ],
        }
    ],
    temperature=0.1,
)

print(response.choices[0].message.content)
```

JavaScript

```javascript
// Top-level await requires an ES module (for example a .mjs file).
import fs from "fs";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llm.aihosting.mittwald.de/v1",
  apiKey: "<your-api-key>",
});

// Read the image and base64-encode it for the data URI.
const imgB64 = fs.readFileSync("page.jpg").toString("base64");

const response = await client.chat.completions.create({
  model: "GLM-OCR",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image_url",
          image_url: { url: `data:image/jpeg;base64,${imgB64}` },
        },
        { type: "text", text: "Extract all text from this image." },
      ],
    },
  ],
  temperature: 0.1,
});

console.log(response.choices[0].message.content);
```

PHP

```php
<?php
// composer require openai-php/client guzzlehttp/guzzle
require __DIR__ . '/vendor/autoload.php';

$client = OpenAI::factory()
    ->withBaseUri('https://llm.aihosting.mittwald.de/v1')
    ->withApiKey('<your-api-key>')
    ->make();

// Read the image and base64-encode it for the data URI.
$imgB64 = base64_encode(file_get_contents('page.jpg'));

$response = $client->chat()->create([
    'model' => 'GLM-OCR',
    'messages' => [[
        'role' => 'user',
        'content' => [
            [
                'type' => 'image_url',
                'image_url' => ['url' => "data:image/jpeg;base64,{$imgB64}"],
            ],
            ['type' => 'text', 'text' => 'Extract all text from this image.'],
        ],
    ]],
    'temperature' => 0.1,
]);

echo $response->choices[0]->message->content;
```
Recommended inference parameters
GLM-OCR is a deterministic extraction model. Use a low temperature for accurate, faithful text extraction:
| Parameter | Value |
|---|---|
| temperature | 0.1 |
| top_p | 1.0 |
| max_tokens | 4096 per page (scale with page count) |
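Scaling `max_tokens` with the page count can be sketched as follows. `inference_params` is a hypothetical helper, and the sketch does not check the result against the 131,072-token context length:

```python
def inference_params(page_count: int, tokens_per_page: int = 4096) -> dict:
    """Build the recommended sampling parameters for a document of page_count pages."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return {
        "temperature": 0.1,
        "top_p": 1.0,
        # 4096 output tokens per page, scaled with the number of pages.
        "max_tokens": tokens_per_page * page_count,
    }


params = inference_params(3)
print(params["max_tokens"])  # 12288
```

The resulting dictionary can be passed to the Python client as keyword arguments, for example `client.chat.completions.create(model="GLM-OCR", messages=messages, **params)`.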
Output modes
The model's output format is controlled entirely through your prompt — there is no separate API parameter for it.
| Mode | How to activate | Behaviour |
|---|---|---|
| Plain text | "Extract all text from this document." | Raw text, no formatting |
| Markdown | "Extract the text and format it as Markdown. Use # for headings and - for lists." | Preserves headings, lists, emphasis — good for RAG |
| JSON (KIE) | "Extract these fields and return them as a JSON object: {…}" | Structured extraction; output always wrapped in ```json ``` fences — strip before parsing |
| HTML table | "Return the table as an HTML <table> element." | Useful for spreadsheet-like data |
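For the JSON (KIE) mode, the code fences have to be removed before the output can be parsed. A minimal sketch, assuming the response consists of a single fenced block with an optional `json` tag; `parse_json_output` is a hypothetical helper name:

```python
import json

# Three backticks, built programmatically so this example stays fence-safe.
FENCE = "`" * 3


def parse_json_output(text: str):
    """Strip a surrounding code fence (if present) and parse the JSON inside."""
    body = text.strip()
    if (
        len(body) >= 2 * len(FENCE)
        and body.startswith(FENCE)
        and body.endswith(FENCE)
    ):
        body = body[len(FENCE):-len(FENCE)]
        # Drop the optional language tag directly after the opening fence.
        if body[:4].lower() == "json":
            body = body[4:]
    return json.loads(body)


raw = FENCE + 'json\n{"invoice_number": "A-1", "total": 12.5}\n' + FENCE
print(parse_json_output(raw))  # {'invoice_number': 'A-1', 'total': 12.5}
```

Unfenced JSON is passed through to `json.loads` unchanged, so the helper also works if the fences are missing. Invalid JSON raises `json.JSONDecodeError`, which the caller should handle.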
Terms of Use and Licensing
The general terms of use apply. The model is provided by Z.ai under the MIT License, and reuse of the extracted content is not subject to any additional restrictions imposed by the model license.