OpenAI API compatibility

The dedicated AI Hosting endpoint is OpenAI-compatible. You can point any OpenAI SDK or library at it by changing two client settings: the base URL and the API key. This page documents exactly what is supported, what is not, and where the behavior differs from OpenAI's hosted service.

Switching from OpenAI

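from openai import OpenAI

# Before: the client talks to OpenAI's hosted service
client = OpenAI(api_key="sk-...")

# After: same client class, pointed at the dedicated endpoint
client = OpenAI(
    api_key="YOUR_DEDICATED_API_KEY",
    base_url="https://your-company.llm.aihosting.mittwald.de/v1",
)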

The model name also changes. Use the ID returned by /v1/models instead of an OpenAI model name like gpt-4o.
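
One way to look up the ID is to query /v1/models directly. The sketch below uses only the Python standard library and assumes the endpoint returns the usual OpenAI list shape (an object with a data array whose entries carry an id field) and accepts the key as a Bearer token, as the OpenAI SDK sends it. The helper names are illustrative and not part of any SDK.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the model list."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def extract_model_ids(payload: dict) -> list[str]:
    """Return the model IDs from an OpenAI-style list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str, api_key: str, timeout_s: float = 30.0) -> list[str]:
    """Send the request and return the IDs; needs network access."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return extract_model_ids(json.load(response))
```

Calling fetch_model_ids with your base URL and API key should return the IDs you can pass as model in later requests.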

Supported endpoints

| Endpoint | Supported |
| --- | --- |
| GET /v1/models | |
| POST /v1/chat/completions | |
| POST /v1/completions | ✅ (legacy text completions) |
| POST /v1/responses | |
| POST /v1/embeddings | ❌ (this endpoint serves a generative model, not an embedding model) |
| Assistants API (/v1/assistants, /v1/threads, …) | |
| Batch API (/v1/batches) | |
| Fine-tuning API (/v1/fine_tuning) | |
| Files API (/v1/files) | |
| Moderations API (/v1/moderations) | |
| Audio / image endpoints | |

/v1/chat/completions parameter support

| Parameter | Support | Notes |
| --- | --- | --- |
| model | | Use the model ID from /v1/models |
| messages | | |
| stream | | See Streaming |
| temperature, top_p | | |
| max_tokens / max_completion_tokens | | |
| stop | | |
| n | | Multiple completions per request |
| presence_penalty, frequency_penalty | | |
| logprobs, top_logprobs | | |
| tools, tool_choice | | See Tool calling |
| parallel_tool_calls | | false limits to one tool call; true (default) allows multiple |
| response_format | | See Structured outputs |
| seed | ⚠️ | Accepted and passed through; reproducibility is best-effort due to GPU non-determinism |
| user | ⚠️ | Accepted but ignored |
| logit_bias | | Supported; token IDs outside the model vocabulary return a validation error |
| stream_options | | Supported when stream: true; passing it without streaming returns a validation error. include_usage: true appends a usage chunk at the end of the stream |
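
The stream_options rule is easy to trip over when one code path serves both streaming and non-streaming calls. A minimal sketch of a payload builder that only attaches stream_options when streaming is on; build_chat_payload is a hypothetical helper, not an SDK function.

```python
from typing import Any


def build_chat_payload(
    model: str,
    messages: list[dict[str, Any]],
    stream: bool = False,
    include_usage: bool = False,
) -> dict[str, Any]:
    """Assemble a /v1/chat/completions request body."""
    payload: dict[str, Any] = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True
        if include_usage:
            # Only valid together with stream: true.
            payload["stream_options"] = {"include_usage": True}
    return payload
```

The resulting dictionary can be sent as the JSON body of a request, or unpacked into client.chat.completions.create(**payload) when using the OpenAI Python SDK.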

Structured outputs

Both JSON mode and JSON schema mode are supported.

JSON object mode — constrains output to valid JSON:

{
  "model": "YOUR_MODEL_ID",
  "messages": [{"role": "user", "content": "Return a JSON object with keys name and age."}],
  "response_format": {"type": "json_object"}
}

JSON schema mode — constrains output to a specific schema:

{
  "model": "YOUR_MODEL_ID",
  "messages": [{"role": "user", "content": "Extract the person's name and age."}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "person",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "age": {"type": "integer"}
        },
        "required": ["name", "age"],
        "additionalProperties": false
      }
    }
  }
}
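
On the client side, the constrained output still arrives as a JSON string in the message content (as in the OpenAI chat format), so it has to be parsed before use. A small sketch that parses and sanity-checks a reply against the person schema; parse_person is an illustrative helper, and the check is hand-written rather than a full JSON Schema validation.

```python
import json


def parse_person(content: str) -> dict:
    """Parse message content and check it against the person schema."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != {"name", "age"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    if not isinstance(data["name"], str):
        raise ValueError("name must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(data["age"], int) or isinstance(data["age"], bool):
        raise ValueError("age must be an integer")
    return data
```

With the OpenAI Python SDK, the string to pass in would typically be response.choices[0].message.content.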

Tool calling

Tool calling (function calling) follows the OpenAI format. Pass your tools in the tools array and set tool_choice to control behavior.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://your-company.llm.aihosting.mittwald.de/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City and country"},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="YOUR_MODEL_ID",
    messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)

# tool_calls is None when the model answers directly without calling a tool
tool_calls = response.choices[0].message.tool_calls

parallel_tool_calls=False restricts the model to at most one tool call per turn, which simplifies handling in multi-step agent loops.
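
To complete the round trip, each returned tool call is executed locally and its result is sent back as a tool message. The sketch below shows that dispatch step on plain dictionaries in the OpenAI wire format; get_weather here is a stub with made-up output, and TOOL_REGISTRY and run_tool_calls are hypothetical names.

```python
import json
from typing import Any, Callable


def get_weather(location: str) -> dict[str, Any]:
    """Stub implementation; a real one would query a weather service."""
    return {"location": location, "forecast": "unknown"}


# Maps tool names (as declared in the tools array) to local functions.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Execute each tool call and return the matching tool messages."""
    messages = []
    for call in tool_calls:
        function = call["function"]
        # Arguments arrive as a JSON-encoded string, not a dict.
        arguments = json.loads(function["arguments"])
        result = TOOL_REGISTRY[function["name"]](**arguments)
        messages.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            }
        )
    return messages
```

With SDK response objects the same fields are available as attributes (call.id, call.function.name, call.function.arguments). Appending the assistant message and these tool messages to the conversation and calling the endpoint again should yield the final answer.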

Configuring client timeouts

LLM requests with long inputs or outputs can take 60 seconds or more. Many HTTP clients and frameworks default to much shorter timeouts. Set your client timeout explicitly to avoid spurious failures.

import httpx
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://your-company.llm.aihosting.mittwald.de/v1",
    timeout=httpx.Timeout(300.0, connect=10.0),
)

A 300-second (5-minute) read timeout covers most inference workloads. Increase it for very long contexts or when the endpoint is under high concurrency.
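
The same advice applies to clients that do not go through the OpenAI SDK. As a rough sketch, a raw request with Python's standard library can pass an explicit timeout to urlopen; note that urllib takes a single value for all blocking socket operations rather than separate connect and read limits. The helper names are illustrative, Bearer authentication is assumed (it is what the OpenAI SDK sends), and the 300-second default simply mirrors the SDK example.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def post_chat(base_url: str, api_key: str, payload: dict, timeout_s: float = 300.0) -> dict:
    """Send the request with an explicit timeout; needs network access."""
    request = build_chat_request(base_url, api_key, payload)
    # Set the timeout explicitly instead of relying on the library default.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```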