Available models

We currently offer the models listed below; this list may change or expand over time. Each model is described along with its model-specific parameters.

Model Name	Type	Modalities	Context (Tokens)	License
gpt-oss-120b	Chat + reasoning	Text, tool-calling	131,072	Apache 2.0
Qwen3.5-0.8B	Chat + reasoning	Text, tool-calling	262,144	Apache 2.0
Ministral-3-14B-Instruct-2512	Chat + vision	Text, image, tool-calling	262,144	Apache 2.0
Mistral-Medium-3.5-128B	Chat + vision	Text, image, tool-calling	256,000	Apache 2.0
Qwen3.5-122B-A10B-FP8	Chat + reasoning + vision	Text, image, tool-calling	245,760	Apache 2.0
Qwen3.6-35B-A3B-FP8	Chat + reasoning + vision	Text, image, tool-calling	262,144	Apache 2.0
GLM-OCR	Document OCR	PDF, DOCX, PPTX, XLSX, HTML, SVG, image to text	131,072	MIT
Qwen3-Embedding-8B	Embedding	Text to vector	32,768	Apache 2.0
Qwen3-VL-Reranker-2B	Reranking	Text, image to score	32,768	Apache 2.0
whisper-large-v3-turbo	Speech-to-Text	Audio to text	N/A (audio-based)	MIT
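
As a rough illustration of how the context column might be used, the sketch below checks whether a request fits a model's context window before it is sent. The limits are copied from the table; the names `CONTEXT_TOKENS` and `fits_context` are hypothetical helpers, not part of any API described here, and the sketch assumes that prompt tokens and generated tokens share the same window.

```python
# Context-window sizes (in tokens), copied from the table above.
# whisper-large-v3-turbo is omitted because its limit is audio-based.
CONTEXT_TOKENS = {
    "gpt-oss-120b": 131_072,
    "Qwen3.5-0.8B": 262_144,
    "Ministral-3-14B-Instruct-2512": 262_144,
    "Mistral-Medium-3.5-128B": 256_000,
    "Qwen3.5-122B-A10B-FP8": 245_760,
    "Qwen3.6-35B-A3B-FP8": 262_144,
    "GLM-OCR": 131_072,
    "Qwen3-Embedding-8B": 32_768,
    "Qwen3-VL-Reranker-2B": 32_768,
}


def fits_context(model: str, prompt_tokens: int, max_output_tokens: int = 0) -> bool:
    """Return True if the request fits within the model's context window.

    Assumption: prompt and generated tokens count against one shared window.
    """
    if model not in CONTEXT_TOKENS:
        raise KeyError(f"unknown model or no token-based limit: {model}")
    return prompt_tokens + max_output_tokens <= CONTEXT_TOKENS[model]


if __name__ == "__main__":
    # 100,000 + 20,000 = 120,000 tokens, within 131,072
    print(fits_context("gpt-oss-120b", 100_000, 20_000))
    # 32,000 + 1,000 = 33,000 tokens, above 32,768
    print(fits_context("Qwen3-Embedding-8B", 32_000, 1_000))
```

Token counts depend on each model's tokenizer, so the same text can produce different counts for different models.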

Picking models

- For complex text-centric workloads and advanced automations where precision and extensive knowledge are required
  use gpt-oss-120b
- For high-throughput, cost-sensitive tasks that don't require vision (e.g. simple Q&A, routing, classification, and batch processing)
  use Qwen3.5-0.8B
- For complex reasoning, multilingual tasks, or vision workloads where a large frontier model is required
  use Mistral-Medium-3.5-128B
- For broad, scalable, cost-conscious chat and basic multimodal (text + image) workflows
  use Ministral-3-14B-Instruct-2512
- For large-scale reasoning and vision tasks where high model capacity is required
  use Qwen3.5-122B-A10B-FP8
- For workloads that require long context windows with reasoning and vision support at lower cost
  use Qwen3.6-35B-A3B-FP8

Special-purpose applications

- For extracting text from PDF, DOCX, PPTX, XLSX, HTML, and image documents (including scanned invoices, contracts, and forms)
  use GLM-OCR
- For all use cases involving search, recommendation, clustering, or knowledge graph building
  use Qwen3-Embedding-8B
- For any audio transcription or voice-command needs
  use whisper-large-v3-turbo
- To improve RAG retrieval precision with a second-pass reranker after vector search
  use Qwen3-VL-Reranker-2B
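
The guidance above can be condensed into a small lookup, sketched here under stated assumptions. The function `pick_model`, its task labels, and its flags are hypothetical and exist only for illustration. The `reasoning` flag follows the Type column of the model table, and the mapping is a deliberate simplification of the recommendations rather than an official selection rule.

```python
# Special-purpose tasks map directly to a single model.
SPECIAL_PURPOSE = {
    "ocr": "GLM-OCR",
    "embedding": "Qwen3-Embedding-8B",
    "transcription": "whisper-large-v3-turbo",
    "rerank": "Qwen3-VL-Reranker-2B",
}


def pick_model(task: str, *, vision: bool = False,
               reasoning: bool = False, low_cost: bool = False) -> str:
    """Return a model name for a task label (hypothetical helper).

    task: "chat" or one of the SPECIAL_PURPOSE keys.
    vision / reasoning / low_cost: coarse requirements for chat workloads.
    """
    if task in SPECIAL_PURPOSE:
        return SPECIAL_PURPOSE[task]
    if task != "chat":
        raise ValueError(f"unknown task: {task!r}")

    if vision and reasoning:
        # Reasoning + vision: smaller model at lower cost, larger for capacity.
        return "Qwen3.6-35B-A3B-FP8" if low_cost else "Qwen3.5-122B-A10B-FP8"
    if vision:
        # Chat + vision models without a listed reasoning type.
        return "Ministral-3-14B-Instruct-2512" if low_cost else "Mistral-Medium-3.5-128B"
    # Text-only chat: small model for throughput, large model for precision.
    return "Qwen3.5-0.8B" if low_cost else "gpt-oss-120b"


if __name__ == "__main__":
    print(pick_model("chat", low_cost=True))
    print(pick_model("chat", vision=True, reasoning=True))
    print(pick_model("rerank"))
```

A real selection would also weigh context length, latency, and language coverage, which this sketch ignores.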