Qwen3.5-0.8B

Description

"Qwen3.5-0.8B" is a compact language model by Alibaba with 0.8 billion parameters. Despite its small size it supports a 262,144-token context window and an optional thinking / reasoning mode, making it the most cost-efficient model in the mittwald AI Hosting catalogue.

It supports the following use cases:

Text generation within a chat completion (text to text)
Tool-calling for agentic workflows
Thinking / reasoning for step-by-step problem solving (opt-in)
High-throughput, latency-sensitive pipelines
Batch processing — classification, routing, summarisation, extraction

The following limitations apply:

Maximum context length: 262,144 tokens
No image / vision support
Response quality and reasoning depth are lower than larger models
Thinking mode is disabled by default — opt in per request via chat_template_kwargs

Thinking mode

Thinking mode is off by default on this model, the opposite of Qwen3.5-122B-A10B-FP8. Enable it selectively, per request, for tasks that benefit from chain-of-thought reasoning.

Using this model from n8n? The built-in OpenAI Chat Model node can't set chat_template_kwargs. See Reasoning models and thinking mode for a community-node workaround.

The examples below enable thinking for a single request, first in Python, then in JavaScript:

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.aihosting.mittwald.de/v1",
    api_key="sk-your-api-key-here",
)

response = client.chat.completions.create(
    model="Qwen3.5-0.8B",
    messages=[{"role": "user", "content": "Solve: if 3x + 7 = 22, what is x?"}],
    temperature=0.6,
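    max_tokens=8192,
    # top_k and chat_template_kwargs are vLLM extensions, not keyword
    # arguments of the OpenAI SDK, so both are passed through extra_body.
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": True},
    },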
)

# Thinking response contains both fields:
print(response.choices[0].message.reasoning_content)  # chain-of-thought
print(response.choices[0].message.content)             # final answer

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llm.aihosting.mittwald.de/v1",
  apiKey: "sk-your-api-key-here",
});

const response = await client.chat.completions.create({
  model: "Qwen3.5-0.8B",
  messages: [{ role: "user", content: "Solve: if 3x + 7 = 22, what is x?" }],
  temperature: 0.6,
  max_tokens: 8192,
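  // vLLM extensions: not part of the OpenAI SDK types, but sent in the request body as-is.
  // In TypeScript, cast the params object (for example with `as any`) to pass the type check.
  top_k: 20,
  chat_template_kwargs: { enable_thinking: true },
});

console.log(response.choices[0].message.reasoning_content); // chain-of-thought
console.log(response.choices[0].message.content);           // final answer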

When thinking is enabled the model returns two fields:

Field	Contents
`choices[0].message.reasoning_content`	Internal chain-of-thought
`choices[0].message.content`	Final answer
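
Because reasoning_content is an extension field, client code may want to read it defensively. The following is a minimal sketch, assuming the field can be missing or empty when thinking is off; split_thinking_message is a hypothetical helper name, and the SimpleNamespace objects only stand in for response.choices[0].message so the snippet runs without an API call.

```python
from types import SimpleNamespace


def split_thinking_message(message):
    """Return (reasoning, answer) from a chat completion message.

    getattr keeps a missing reasoning_content field from raising, and
    `or ""` turns a None value into an empty string.
    """
    reasoning = getattr(message, "reasoning_content", None) or ""
    answer = getattr(message, "content", None) or ""
    return reasoning.strip(), answer.strip()


# Stand-ins shaped like response.choices[0].message (assumption for the demo).
with_thinking = SimpleNamespace(
    reasoning_content="3x = 15, so x = 5.",
    content="x = 5",
)
without_thinking = SimpleNamespace(content="x = 5")

print(split_thinking_message(with_thinking))
print(split_thinking_message(without_thinking))
```

With a real response, the call would be split_thinking_message(response.choices[0].message).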

Recommended inference parameters

Default mode (thinking off)

General tasks:

Parameter	Value
`temperature`	1.0
`top_p`	1.0
`top_k`	20
`presence_penalty`	2.0

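Do not use greedy decoding (temperature: 0), because it causes repetitions. A presence_penalty above 1.5 can occasionally trigger language mixing on multilingual prompts. The recommended value of 2.0 is above that threshold, so lower it if responses start switching languages.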

Thinking mode (`enable_thinking: true`)

General tasks:

Parameter	Value
`temperature`	1.0
`top_p`	0.95
`top_k`	20
`presence_penalty`	1.5

Coding and precise tasks:

Parameter	Value
`temperature`	0.6
`top_p`	0.95
`top_k`	20
`presence_penalty`	0.0
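
One way to keep these values consistent across a codebase is to store them as presets. The sketch below copies the numbers from the tables above; SAMPLING_PRESETS and build_request_kwargs are hypothetical names, and routing top_k through extra_body assumes the OpenAI Python SDK, which does not accept top_k as a keyword argument.

```python
# Values copied from the parameter tables; keys are (mode, task).
SAMPLING_PRESETS = {
    ("default", "general"): {
        "temperature": 1.0, "top_p": 1.0, "top_k": 20, "presence_penalty": 2.0,
    },
    ("thinking", "general"): {
        "temperature": 1.0, "top_p": 0.95, "top_k": 20, "presence_penalty": 1.5,
    },
    ("thinking", "coding"): {
        "temperature": 0.6, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0,
    },
}


def build_request_kwargs(mode="default", task="general"):
    """Return keyword arguments for client.chat.completions.create()."""
    preset = dict(SAMPLING_PRESETS[(mode, task)])  # copy, so the table stays intact
    # top_k is a server-side extension, so it travels in extra_body.
    extra_body = {"top_k": preset.pop("top_k")}
    if mode == "thinking":
        extra_body["chat_template_kwargs"] = {"enable_thinking": True}
    return {**preset, "extra_body": extra_body}


print(build_request_kwargs("thinking", "coding"))
```

The result can be unpacked into a request, for example client.chat.completions.create(model="Qwen3.5-0.8B", messages=messages, **build_request_kwargs("thinking", "coding")). An unknown (mode, task) pair raises KeyError, which surfaces typos early.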

Output length

Task type	Recommended `max_tokens`
Standard queries	32,768
Complex problems (math, step-by-step)	81,920

Tips for specific tasks

Tool calling

The model supports function calling using the Qwen3 XML format. Pass tools via the standard OpenAI tools parameter:

# Reuses the client created in the Thinking mode example.
response = client.chat.completions.create(
    model="Qwen3.5-0.8B",
    messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
    temperature=0.2,
)

Use a low temperature (0.1–0.3) for tool calls to reduce hallucinations.
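
The request above only asks the model to choose a tool; the application still has to run it and send the result back. The sketch below shows one possible way to do that, assuming the response follows the standard OpenAI tool_calls shape (a JSON string in function.arguments). The get_weather stub, TOOL_REGISTRY, and run_tool_calls are hypothetical names, and the SimpleNamespace object stands in for response.choices[0].message so the snippet runs offline.

```python
import json
from types import SimpleNamespace


def get_weather(city):
    # Hypothetical stub; a real implementation would query a weather service.
    return {"city": city, "condition": "unknown"}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message, registry=TOOL_REGISTRY):
    """Run each requested tool and return 'tool' role messages for the follow-up request."""
    results = []
    for call in getattr(message, "tool_calls", None) or []:
        try:
            handler = registry[call.function.name]            # KeyError: unknown tool
            arguments = json.loads(call.function.arguments)   # arguments arrive as a JSON string
            output = handler(**arguments)                     # TypeError: wrong argument names
        except (KeyError, TypeError, json.JSONDecodeError) as exc:
            # Report the failure to the model instead of crashing the pipeline.
            output = {"error": f"{type(exc).__name__}: {exc}"}
        results.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(output)}
        )
    return results


# Stand-in for response.choices[0].message (assumption for the demo).
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name="get_weather", arguments='{"city": "Berlin"}'),
        )
    ]
)
print(run_tool_calls(fake_message))
```

For the follow-up request, append the assistant message that contained the tool calls to messages, then the returned tool messages, and call the chat completion endpoint again so the model can phrase the final answer.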

Routing and classification

The small size makes this model ideal as a first-pass classifier that decides which larger model handles a request:

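# Reuses the client created in the Thinking mode example.
# user_message holds the incoming request text; the value here is only an example.
user_message = "Write a Python function that reverses a string."
response = client.chat.completions.create(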
    model="Qwen3.5-0.8B",
    messages=[
        {
            "role": "system",
            "content": (
                "Classify the following user message into exactly one category: "
                "SIMPLE_QA, CODE, MATH, IMAGE_TASK. Respond with only the category name."
            ),
        },
        {"role": "user", "content": user_message},
    ],
    temperature=0.1,
    max_tokens=10,
)
category = response.choices[0].message.content.strip()
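
A small model will not always answer with exactly one label, so it can help to validate the result before routing on it. The following is a minimal sketch under stated assumptions: normalize_category and pick_model are hypothetical helpers, falling back to SIMPLE_QA is an arbitrary choice, and the routing table is purely illustrative ("your-vision-model" is a placeholder, not a catalogue entry).

```python
import re

ALLOWED_CATEGORIES = {"SIMPLE_QA", "CODE", "MATH", "IMAGE_TASK"}
FALLBACK_CATEGORY = "SIMPLE_QA"  # assumption: unknown output is treated as a simple question

# Illustrative routing table; adjust the targets to the models you actually use.
ROUTES = {
    "SIMPLE_QA": "Qwen3.5-0.8B",
    "CODE": "Qwen3.5-122B-A10B-FP8",
    "MATH": "Qwen3.5-122B-A10B-FP8",
    "IMAGE_TASK": "your-vision-model",  # placeholder: this model has no vision support
}


def normalize_category(raw):
    """Map the classifier output to one allowed category, or to the fallback."""
    cleaned = (raw or "").strip().upper()
    if cleaned in ALLOWED_CATEGORIES:
        return cleaned
    # Tolerate extra words or punctuation, e.g. "Category: MATH."
    for token in re.findall(r"[A-Z_]+", cleaned):
        if token in ALLOWED_CATEGORIES:
            return token
    return FALLBACK_CATEGORY


def pick_model(raw_category):
    """Return the model name that should handle the request."""
    return ROUTES[normalize_category(raw_category)]


print(pick_model(" code\n"))
print(pick_model("Category: MATH."))
print(pick_model("banana"))
```

In the example above, the call would be pick_model(response.choices[0].message.content).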

Terms of use and licensing

The general terms of use apply. The model is provided by Alibaba under the Apache 2.0 License, and reuse of the generated content is not subject to any additional restrictions.
