Mistral-Medium-3.5-128B

Description

Mistral-Medium-3.5-128B is a 128-billion-parameter frontier language model by Mistral AI. It supports text generation and tool calling over a 256,000-token context window and uses EAGLE speculative decoding for fast inference.

It is suitable for:

Text generation within a chat completion (text to text)
Tool-calling for agentic workflows
Long-context document analysis and summarisation
Multilingual tasks, with strong coverage of European languages

The following limitations apply:

Maximum context length: 256,000 tokens
No audio support
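
The context limit above can be turned into a simple budget check before sending a request. The following is a minimal sketch, assuming that prompt tokens and completion tokens share the same 256,000-token window (a common arrangement, but an assumption about this service). The helper name `max_completion_budget` is hypothetical, and the prompt token count must come from a tokenizer or from the `usage` field of an earlier response.

```python
CONTEXT_LIMIT = 256_000  # maximum context length stated in the limitations list


def max_completion_budget(prompt_tokens: int, requested_max_tokens: int) -> int:
    """Return the largest max_tokens value that still fits the context window.

    Assumption: prompt and completion tokens count against one shared window.
    """
    if prompt_tokens < 0 or requested_max_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_LIMIT - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    # Never ask for more completion tokens than the window has left.
    return min(requested_max_tokens, remaining)


print(max_completion_budget(prompt_tokens=250_000, requested_max_tokens=8_192))
```

With a 250,000-token prompt, only 6,000 tokens remain, so the helper caps the requested 8,192 at 6,000.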

API usage

Chat

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.aihosting.mittwald.de/v1",
    api_key="sk-your-api-key-here",
)

response = client.chat.completions.create(
    model="Mistral-Medium-3.5-128B",
    messages=[{"role": "user", "content": "Explain the difference between TCP and UDP."}],
    temperature=0.7,
    top_p=0.9,
    max_tokens=1024,
)

print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llm.aihosting.mittwald.de/v1",
  apiKey: "sk-your-api-key-here",
});

const response = await client.chat.completions.create({
  model: "Mistral-Medium-3.5-128B",
  messages: [{ role: "user", content: "Explain the difference between TCP and UDP." }],
  temperature: 0.7,
  top_p: 0.9,
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);

Tool calling (function calling)

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.aihosting.mittwald.de/v1",
    api_key="sk-your-api-key-here",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Mistral-Medium-3.5-128B",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
    temperature=0.2,
)

if response.choices[0].message.tool_calls:
    call = response.choices[0].message.tool_calls[0]
    print(f"Function: {call.function.name}")
    print(f"Arguments: {call.function.arguments}")
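
The example above stops once the model has requested a tool. In an agentic workflow, the application usually runs the requested function and returns the result to the model as a `tool` message. The sketch below shows one way to do that. The `get_weather` implementation, the `TOOL_REGISTRY` mapping, and the `build_tool_message` helper are hypothetical names introduced for illustration, and the weather data is a placeholder.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local implementation; returns placeholder data."""
    return {"city": city, "forecast": "unknown (placeholder)"}


# Maps tool names declared in `tools` to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def build_tool_message(call_id: str, name: str, arguments_json: str) -> dict:
    """Run one requested tool and wrap its result as a `tool` role message."""
    if name not in TOOL_REGISTRY:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            # The model supplies its arguments as a JSON-encoded string.
            result = TOOL_REGISTRY[name](**json.loads(arguments_json))
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or arguments that do not match the signature.
            result = {"error": str(exc)}
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}


print(build_tool_message("call_0", "get_weather", '{"city": "Paris"}'))
```

To complete the round trip, keep the conversation in a `messages` list, append the assistant message that contains the tool calls, append one `build_tool_message(call.id, call.function.name, call.function.arguments)` per call, and then call `client.chat.completions.create` again with the extended list. Errors are returned to the model as JSON rather than raised, so that it can retry or explain the failure.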

Recommended inference parameters

General chat

| Parameter | Value |
| --- | --- |
| `temperature` | 0.7 |
| `top_p` | 1.0 |
| `max_tokens` | 1024–8192 depending on task |

Tool calling / structured output

| Parameter | Value |
| --- | --- |
| `temperature` | 0.0–0.3 |
| `top_p` | 1.0 |
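
One way to apply these recommendations consistently is to keep them as named presets and unpack them into each request. This is a minimal sketch: the `PRESETS` dictionary and the `request_params` helper are hypothetical, the `max_tokens` default of 1024 is the lower end of the recommended range, and 0.2 is one value picked from the 0.0–0.3 range for tool calling.

```python
# Presets taken from the recommended-parameter tables; names are illustrative.
PRESETS = {
    "chat": {"temperature": 0.7, "top_p": 1.0, "max_tokens": 1024},
    "tools": {"temperature": 0.2, "top_p": 1.0},
}


def request_params(task: str, **overrides) -> dict:
    """Return a copy of the preset for `task`, with per-call overrides applied."""
    if task not in PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PRESETS)}")
    params = dict(PRESETS[task])  # copy so the shared preset is never mutated
    params.update(overrides)
    return params


print(request_params("chat", max_tokens=4096))
```

The result can be passed straight to the client, for example `client.chat.completions.create(model="Mistral-Medium-3.5-128B", messages=messages, **request_params("tools"))`.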

Terms of use and licensing

The general terms of use apply. The model is provided by Mistral AI under the Apache 2.0 License, and reuse of the generated content is not subject to any additional restrictions.
