Available models

We currently offer the following models, which may change or expand over time. Each is described along with model-specific parameters.

| Model Name | Type | Modalities | Context (Tokens) | License |
|---|---|---|---|---|
| gpt-oss-120b | Chat + reasoning | Text, tool-calling | 131,072 | Apache 2.0 |
| Qwen3.5-0.8B | Chat + reasoning | Text, tool-calling | 262,144 | Apache 2.0 |
| Ministral-3-14B-Instruct-2512 | Chat + vision | Text, image, tool-calling | 262,144 | Apache 2.0 |
| Mistral-Medium-3.5-128B | Chat + vision | Text, image, tool-calling | 256,000 | Apache 2.0 |
| Qwen3.5-122B-A10B-FP8 | Chat + reasoning + vision | Text, image, tool-calling | 245,760 | Apache 2.0 |
| Qwen3.6-35B-A3B-FP8 | Chat + reasoning + vision | Text, image, tool-calling | 262,144 | Apache 2.0 |
| GLM-OCR | Document OCR | PDF, DOCX, PPTX, XLSX, HTML, SVG, image to text | 131,072 | MIT |
| Qwen3-Embedding-8B | Embedding | Text to vector | 32,768 | Apache 2.0 |
| Qwen3-VL-Reranker-2B | Reranking | Text, image to score | 32,768 | Apache 2.0 |
| whisper-large-v3-turbo | Speech-to-Text | Audio to text | N/A (audio-based) | MIT |
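
If you want to narrow down chat models programmatically, the table can be encoded as plain data. The following is a minimal sketch and not part of any official SDK: the `ModelSpec` class, the `CHAT_MODELS` list, and the `candidates` helper are hypothetical names introduced for illustration, and the values are copied from the table above (chat models only).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """One row of the model table (hypothetical helper type)."""
    name: str
    modalities: frozenset
    context_tokens: int


# Values copied from the table above; only the chat models are listed.
CHAT_MODELS = [
    ModelSpec("gpt-oss-120b", frozenset({"text", "tool-calling"}), 131_072),
    ModelSpec("Qwen3.5-0.8B", frozenset({"text", "tool-calling"}), 262_144),
    ModelSpec("Ministral-3-14B-Instruct-2512",
              frozenset({"text", "image", "tool-calling"}), 262_144),
    ModelSpec("Mistral-Medium-3.5-128B",
              frozenset({"text", "image", "tool-calling"}), 256_000),
    ModelSpec("Qwen3.5-122B-A10B-FP8",
              frozenset({"text", "image", "tool-calling"}), 245_760),
    ModelSpec("Qwen3.6-35B-A3B-FP8",
              frozenset({"text", "image", "tool-calling"}), 262_144),
]


def candidates(required_modalities, min_context_tokens):
    """Return names of chat models that support every required modality
    and whose context window is at least min_context_tokens."""
    required = frozenset(required_modalities)
    return [
        m.name
        for m in CHAT_MODELS
        if required <= m.modalities and m.context_tokens >= min_context_tokens
    ]
```

For example, `candidates({"image"}, 200_000)` returns every vision-capable chat model in the table, since all four have context windows above 200,000 tokens. Because the model list may change over time, treat the hard-coded values as a snapshot rather than a source of truth.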

Picking models

  • For complex text-centric workloads and advanced automations where precision and broad knowledge are required
    use gpt-oss-120b.
  • For high-throughput, cost-sensitive tasks that don't require vision (e.g. simple Q&A, routing, classification, and batch processing)
    use Qwen3.5-0.8B.
  • For complex reasoning, multilingual tasks, or vision workloads where a large frontier model is required
    use Mistral-Medium-3.5-128B.
  • For broad, scalable, cost-conscious chat and basic multimodal (text + image) workflows
    use Ministral-3-14B-Instruct-2512.
  • For large-scale reasoning and vision tasks where high model capacity is required
    use Qwen3.5-122B-A10B-FP8.
  • For workloads that require long context windows with reasoning and vision support at lower cost
    use Qwen3.6-35B-A3B-FP8.
  • Special-purpose applications
    • For extracting text from PDF, DOCX, PPTX, XLSX, HTML, and image documents, including scanned invoices, contracts, and forms,
      use GLM-OCR
    • For all use cases involving search, recommendation, clustering, or knowledge graph building
      use Qwen3-Embedding-8B
    • For any audio transcription or voice-command needs
      use whisper-large-v3-turbo
    • To improve RAG retrieval precision by adding it as a second-pass reranker after vector search
      use Qwen3-VL-Reranker-2B
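
The recommendations above can be captured as a simple lookup so that application code selects a model by use case rather than hard-coding names in several places. This is only a sketch: the use-case keys and the `pick_model` function are hypothetical and chosen for illustration, while the model names are taken from the list above.

```python
# Hypothetical use-case keys mapped to the models recommended above.
RECOMMENDED_MODEL = {
    "complex_text": "gpt-oss-120b",
    "high_throughput_text": "Qwen3.5-0.8B",
    "frontier_reasoning_vision": "Mistral-Medium-3.5-128B",
    "general_multimodal_chat": "Ministral-3-14B-Instruct-2512",
    "large_scale_reasoning_vision": "Qwen3.5-122B-A10B-FP8",
    "long_context_low_cost": "Qwen3.6-35B-A3B-FP8",
    "document_ocr": "GLM-OCR",
    "embedding": "Qwen3-Embedding-8B",
    "transcription": "whisper-large-v3-turbo",
    "reranking": "Qwen3-VL-Reranker-2B",
}


def pick_model(use_case: str) -> str:
    """Return the recommended model name for a known use-case key."""
    try:
        return RECOMMENDED_MODEL[use_case]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_MODEL))
        raise ValueError(
            f"Unknown use case {use_case!r}; expected one of: {known}"
        ) from None
```

Keeping the mapping in one place makes it easier to update when the set of available models changes.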

For more details and additional tips, see the usage examples and guides.

For more information about a specific model, see the following pages: