Qwen3.6-35B-A3B-FP8
Description
"Qwen3.6-35B-A3B-FP8" is a Mixture-of-Experts (MoE) language model by Alibaba with 35 billion total parameters, of which approximately 3 billion are active per forward pass. It is designed for efficient, high-quality chat and agentic workflows with reasoning and vision capabilities, suitable for long-document analysis and extended multi-turn conversations.
It supports and is suitable for:
- Text generation within a chat completion (text to text)
- Tool-calling for agentic workflows
- Image understanding (vision)
- Thinking / reasoning for step-by-step problem solving
- Processing long documents and extended contexts
The following limitations apply:
- Maximum context length: 262,144 tokens
- Thinking mode requires at least 128,000 tokens of remaining context to function properly
- Images must be submitted as Base64-encoded data URLs (no remote URLs)
Thinking mode is enabled by default. To disable it, pass "enable_thinking": false in your API request's extra body parameters.
Recommended inference parameters
The model has different recommended settings depending on the use case. Do not use greedy decoding (temperature 0) - it can cause performance degradation and repetitions.
Thinking mode (default)
General tasks:
| Parameter | Value |
|---|---|
temperature | 1.0 |
top_p | 0.95 |
top_k | 20 |
presence_penalty | 1.5 |
Precise coding / web development:
| Parameter | Value |
|---|---|
temperature | 0.6 |
top_p | 0.95 |
top_k | 20 |
presence_penalty | 0.0 |
Non-thinking mode (enable_thinking: false)
General tasks:
| Parameter | Value |
|---|---|
temperature | 0.7 |
top_p | 0.8 |
top_k | 20 |
presence_penalty | 1.5 |
Reasoning / math / complex problem solving:
| Parameter | Value |
|---|---|
temperature | 1.0 |
top_p | 1.0 |
top_k | 40 |
presence_penalty | 2.0 |
Output length
Set max_tokens according to task complexity to control cost and latency:
| Task type | Recommended max_tokens |
|---|---|
| Standard queries | 32,768 |
| Complex problems (math, programming contests) | 81,920 |
Tips for specific tasks
Vision (image to text)
Always disable thinking mode for vision tasks - thinking adds latency without improving image understanding:
extra_body={"chat_template_kwargs": {"enable_thinking": False}}
Recommended parameters for vision:
| Parameter | Value |
|---|---|
temperature | 0.7 |
top_p | 0.8 |
top_k | 20 |
max_tokens | 512–2048 depending on task |
For accurate text extraction (OCR) or data reading, use temperature=0.1 instead.
Always resize images to a maximum of 1024 px on the longest edge before encoding as Base64 - large images significantly increase time to first token (TTFT). The first request for a new image will have a longer TTFT while the image encoder warms up; subsequent requests with the same image benefit from caching. See the Python examples or JavaScript examples for a ready-to-use helper.
Math problems
For best results on mathematical tasks, append the following instruction to your prompt:
Please reason step by step, and put your final answer within \boxed{}.
Multiple-choice questions
To get consistent, parseable output on multiple-choice tasks, add this to your prompt:
Please show your choice in the 'answer' field with only the choice letter, e.g., 'answer': 'C'.
Terms of use and licensing
The general terms of use apply. The model is provided by Alibaba under the Apache 2.0 License, and reuse of the generated content is not subject to any additional restrictions.