End-to-end AI gateway setup with Bifrost

Bifrost is an AI gateway for routing, failover, and provider governance. In a dedicated setup, Bifrost sits between your apps and your dedicated mittwald endpoint.

For creative/web agencies, Bifrost helps centralize routing across client workloads, campaign spikes, and mixed provider landscapes.

When to use Bifrost vs LiteLLM

Use LiteLLM for virtual end-user key lifecycle, budgets, and spend views.
Use Bifrost for provider routing, fallback chains, and gateway-level policy.
Many teams run both: LiteLLM for customer keys, Bifrost for upstream routing.

Prerequisites

Docker installed
Dedicated endpoint URL and API key from mittwald

A-to-Z setup

1. Start Bifrost

user@local $ docker run -d \
  --name bifrost \
  -p 8080:8080 \
  -v "$(pwd)/data:/app/data" \
  maximhq/bifrost

Open http://localhost:8080 in your browser to reach the Bifrost web UI.
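The container starts in the background (-d), so a script that registers providers right after docker run can race the gateway's startup. Below is a minimal readiness check. It assumes that a plain HTTP GET on the UI address answers once Bifrost is up. The function name wait_for_bifrost is hypothetical.

```python
import time
import urllib.error
import urllib.request

# Ignore *_proxy environment variables: the gateway runs locally.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_bifrost(url: str = "http://localhost:8080", timeout_s: float = 30.0,
                     interval_s: float = 1.0) -> bool:
    """Poll `url` until it answers or `timeout_s` elapses.

    Assumption: the Bifrost UI address responds to GET once the gateway is up.
    Returns True as soon as any HTTP response arrives, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with _opener.open(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # An HTTP error status still proves the server is listening.
            return True
        except OSError:
            pass  # Not reachable yet (connection refused, timeout, DNS).
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Call wait_for_bifrost() before the registration steps below and abort if it returns False.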

2. Register your dedicated endpoint as a provider

Provider registration and key registration are two separate API calls.

Step 1 — register the provider (sets base URL and network config):

user@local $ curl --location 'http://localhost:8080/api/providers' \
  --header 'Content-Type: application/json' \
  --data '{
    "provider": "openai",
    "network_config": {
      "base_url": "https://your-company.llm.aihosting.mittwald.de/v1"
    }
  }'

Step 2 — add your API key to the provider:

user@local $ curl --location 'http://localhost:8080/api/providers/openai/keys' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "mittwald-dedicated",
    "value": "YOUR_DEDICATED_API_KEY",
    "models": ["*"],
    "weight": 1.0
  }'
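You can also script these two calls. The sketch below uses only the Python standard library and mirrors the endpoints and JSON fields from the curl commands above. The helper name register_provider is hypothetical. With send=False, it returns the prepared requests without contacting the gateway, which lets you review the payloads first.

```python
import json
import urllib.request
from typing import Optional


def _post(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_provider(gateway: str, provider: str, base_url: Optional[str],
                      key_name: str, key_value: str, models: list[str],
                      weight: float = 1.0, base_provider_type: Optional[str] = None,
                      send: bool = True) -> list[urllib.request.Request]:
    """Step 1: create the provider. Step 2: attach an API key to it."""
    provider_payload: dict = {"provider": provider}
    if base_url:
        provider_payload["network_config"] = {"base_url": base_url}
    if base_provider_type:
        # Custom providers (e.g. "mittwald-shared") declare which API they speak.
        provider_payload["custom_provider_config"] = {"base_provider_type": base_provider_type}

    key_payload = {"name": key_name, "value": key_value, "models": models, "weight": weight}

    prepared = [
        _post(f"{gateway}/api/providers", provider_payload),
        _post(f"{gateway}/api/providers/{provider}/keys", key_payload),
    ]
    if send:
        # Order matters: the provider must exist before a key can be added to it.
        for req in prepared:
            with urllib.request.urlopen(req) as resp:
                print(req.full_url, resp.status)
    return prepared
```

For the dedicated endpoint, call register_provider("http://localhost:8080", "openai", "https://your-company.llm.aihosting.mittwald.de/v1", "mittwald-dedicated", "YOUR_DEDICATED_API_KEY", ["*"]). For a custom provider such as mittwald-shared, also pass base_provider_type="openai".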

3. Validate provider config in UI

In Model Providers, check:

Key is active
Model mapping is correct (* or explicit list)
Base URL points to your dedicated endpoint

4. Send traffic through Bifrost

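curl:

user@local $ curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/YOUR_MODEL_ID",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

Python (OpenAI SDK):

from openai import OpenAI

# Bifrost holds the provider key, so the client-side key is only a placeholder.
client = OpenAI(api_key="dummy", base_url="http://localhost:8080/openai/v1")

response = client.chat.completions.create(
    model="YOUR_MODEL_ID",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

JavaScript / TypeScript (OpenAI SDK, run as an ES module):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "dummy", // placeholder; Bifrost holds the provider key
  baseURL: "http://localhost:8080/openai/v1",
});

const response = await client.chat.completions.create({
  model: "YOUR_MODEL_ID",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);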

Routing to mittwald shared AI Hosting (in addition to dedicated)

If you also use mittwald shared AI Hosting, add it as another OpenAI-compatible provider.

Shared AI Hosting base URL:

https://llm.aihosting.mittwald.de/v1

Example — register and add a key:

user@local $ curl --location 'http://localhost:8080/api/providers' \
  --header 'Content-Type: application/json' \
  --data '{
    "provider": "mittwald-shared",
    "network_config": {
      "base_url": "https://llm.aihosting.mittwald.de/v1"
    },
    "custom_provider_config": {
      "base_provider_type": "openai"
    }
  }'

user@local $ curl --location 'http://localhost:8080/api/providers/mittwald-shared/keys' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "mittwald-shared-key",
    "value": "YOUR_SHARED_API_KEY",
    "models": ["*"],
    "weight": 1.0
  }'

Routing rules: when to send traffic to dedicated vs shared

Bifrost routing decisions are typically built from:

model-to-key mapping (models per key)
weighted distribution (weight)
fallback ordering (primary provider, secondary provider)

Recommended rules:

premium/contracted workloads -> dedicated provider key
burst or long-tail workloads -> shared provider key
fallback path -> dedicated first, shared second (or the inverse, depending on your SLO and cost goals)

Example model-specific split:

Key mittwald-dedicated: models: ["YOUR_PREMIUM_MODEL_ID"]
Key mittwald-shared-key: models: ["*"] (or an explicit non-premium list)

This keeps high-value traffic on reserved capacity while still allowing scale-out via shared hosting.
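The following simplified model makes the interaction of model-to-key mapping and weights concrete. It assumes, as in the request examples on this page, that the prefix of a provider/model string selects the provider. Within that provider, only keys whose models list contains the model (or *) are eligible, and one of them is drawn in proportion to weight. This illustrates the concept and is not Bifrost's actual implementation. The key names come from the registration commands above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    name: str
    models: tuple[str, ...]  # explicit model IDs, or ("*",) for any model
    weight: float


def pick_key(providers: dict[str, list[Key]], model_string: str,
             rng: random.Random) -> Key:
    """Resolve "provider/model" to one key of that provider.

    1. The prefix before the first "/" selects the provider.
    2. Only keys listing the model (or "*") are eligible.
    3. One eligible key is drawn with probability weight / sum(weights).
    """
    provider, sep, model = model_string.partition("/")
    if not sep:
        raise ValueError(f"expected 'provider/model', got {model_string!r}")
    candidates = [k for k in providers.get(provider, [])
                  if model in k.models or "*" in k.models]
    if not candidates:
        raise LookupError(f"no key of provider {provider!r} serves {model!r}")
    return rng.choices(candidates, weights=[k.weight for k in candidates], k=1)[0]


# Premium model only on the dedicated key; everything else on the shared key.
PROVIDERS = {
    "openai": [Key("mittwald-dedicated", ("YOUR_PREMIUM_MODEL_ID",), 1.0)],
    "mittwald-shared": [Key("mittwald-shared-key", ("*",), 1.0)],
}
```

In this model, a request for openai/some-other-model fails instead of spilling over to the shared lane. Spill-over has to come from an explicit fallback. Weights only matter when a provider has several eligible keys: with weights 3 and 1, about 75 % of requests go to the first key.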

You can also add other providers (for example Anthropic / Claude) and route specific workloads there.

Example additional provider:

user@local $ curl --location 'http://localhost:8080/api/providers' \
  --header 'Content-Type: application/json' \
  --data '{
    "provider": "anthropic"
  }'

user@local $ curl --location 'http://localhost:8080/api/providers/anthropic/keys' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "claude-key",
    "value": "YOUR_ANTHROPIC_API_KEY",
    "models": ["claude-3-7-sonnet", "claude-4-sonnet"],
    "weight": 1.0
  }'

Agency-oriented routing scenarios

Typical agency setup in Germany:

multiple client websites/apps
mixed workloads (chatbot, content generation, backoffice automation)
changing traffic patterns (campaign launches, seasonal peaks)

Recommended policy examples:

keep premium client workloads on dedicated capacity
send non-critical or burst traffic to shared capacity
route specific tasks (for example copywriting/review workflows) to Claude if needed
keep one fallback route active for outage resilience

This lets agencies offer stable SLAs for key customers while keeping total operating cost predictable.
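You can capture these policy examples as a small lookup table that application code consults before calling the gateway. In this sketch, the workload names (premium-client, burst, copywriting) and the fallback choices are hypothetical. The model strings reuse the provider/model format and the placeholders from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str                        # primary lane, "provider/model"
    fallbacks: tuple[str, ...] = ()   # lanes to try, in order, if the primary fails


# Hypothetical workload classes mapped to lanes, following the policy list above.
POLICY: dict[str, Route] = {
    # Premium client workloads stay on dedicated capacity; shared covers outages.
    "premium-client": Route("openai/YOUR_PREMIUM_MODEL_ID",
                            ("mittwald-shared/YOUR_MODEL_ID",)),
    # Non-critical or burst traffic goes to shared capacity.
    "burst": Route("mittwald-shared/YOUR_MODEL_ID"),
    # Copywriting/review tasks go to Claude, with shared capacity as fallback.
    "copywriting": Route("anthropic/claude-3-7-sonnet",
                         ("mittwald-shared/YOUR_MODEL_ID",)),
}


def route_for(workload: str) -> Route:
    """Unknown workloads default to the shared lane (treated as non-critical)."""
    return POLICY.get(workload, POLICY["burst"])
```

Keeping the table in one place means a campaign launch or a new client tier changes one entry, not every call site.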

Complete multi-lane routing example

This example sets up three lanes in one sequence: dedicated mittwald capacity as the primary lane, shared mittwald capacity as a burst/fallback lane, and Anthropic as a separate lane for specific tasks. Copy and adapt the three provider registrations, then route by model name.

# Lane 1: dedicated mittwald endpoint (primary, reserved capacity)
user@local $ curl --location 'http://localhost:8080/api/providers' \
  --header 'Content-Type: application/json' \
  --data '{"provider": "openai", "network_config": {"base_url": "https://your-company.llm.aihosting.mittwald.de/v1"}}'

user@local $ curl --location 'http://localhost:8080/api/providers/openai/keys' \
  --header 'Content-Type: application/json' \
  --data '{"name": "mittwald-dedicated", "value": "YOUR_DEDICATED_API_KEY", "models": ["YOUR_PREMIUM_MODEL_ID"], "weight": 1.0}'

# Lane 2: shared mittwald endpoint (burst and long-tail workloads)
user@local $ curl --location 'http://localhost:8080/api/providers' \
  --header 'Content-Type: application/json' \
  --data '{"provider": "mittwald-shared", "network_config": {"base_url": "https://llm.aihosting.mittwald.de/v1"}, "custom_provider_config": {"base_provider_type": "openai"}}'

user@local $ curl --location 'http://localhost:8080/api/providers/mittwald-shared/keys' \
  --header 'Content-Type: application/json' \
  --data '{"name": "mittwald-shared-key", "value": "YOUR_SHARED_API_KEY", "models": ["*"], "weight": 1.0}'

# Lane 3: Anthropic (for specific task types routed by model name)
user@local $ curl --location 'http://localhost:8080/api/providers' \
  --header 'Content-Type: application/json' \
  --data '{"provider": "anthropic"}'

user@local $ curl --location 'http://localhost:8080/api/providers/anthropic/keys' \
  --header 'Content-Type: application/json' \
  --data '{"name": "claude-key", "value": "YOUR_ANTHROPIC_API_KEY", "models": ["claude-3-7-sonnet", "claude-4-sonnet"], "weight": 1.0}'

After registration, select the lane with the provider prefix of the model name (the same provider/model format as in step 4):

openai/YOUR_PREMIUM_MODEL_ID → dedicated lane
mittwald-shared/YOUR_MODEL_ID (any model, matched by *) → shared lane
anthropic/claude-3-7-sonnet or anthropic/claude-4-sonnet → Anthropic lane

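Registering providers does not create a fallback by itself. To get dedicated → shared failover, list the shared lane as a fallback for your requests. Bifrost accepts a fallbacks list of provider/model entries in the request body and tries those entries when the primary provider fails. Weight values control how traffic is split across multiple keys of the same provider.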
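Here is a minimal sketch of a dedicated-first request with the shared lane as fallback. It assumes Bifrost's request-level fallbacks field, a list of provider/model strings tried in order when the primary fails. Verify the field against the Bifrost documentation for your version. The helper name chat_body is hypothetical, and it only builds the JSON body for POST http://localhost:8080/v1/chat/completions.

```python
import json
from typing import Optional


def chat_body(model: str, prompt: str, fallbacks: Optional[list[str]] = None) -> str:
    """JSON body for Bifrost's /v1/chat/completions.

    `model` and each fallback use the "provider/model" format, so the prefix
    picks the lane. ASSUMPTION: `fallbacks` is tried in list order.
    """
    body: dict = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if fallbacks:
        body["fallbacks"] = list(fallbacks)
    return json.dumps(body)


# Dedicated first, shared second (swap the order for cost-first routing).
payload = chat_body(
    "openai/YOUR_PREMIUM_MODEL_ID",
    "Hello",
    fallbacks=["mittwald-shared/YOUR_MODEL_ID"],
)
```

Send payload with any HTTP client, for example as the --data argument of the curl request in step 4.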

Production configuration patterns

Provider-level failover and load split

Add multiple keys/providers
Use weight to distribute traffic
Use model-specific key mapping for premium/basic lanes

Self-hosted endpoint hardening

Use explicit model allowlists instead of * where possible
Configure network timeouts in provider network_config
Use internal DNS/FQDN for Kubernetes cross-namespace routing
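A provider payload that applies these three points might look like the sketch below. Several parts are assumptions. The timeout and retry field names (default_request_timeout_in_seconds, max_retries) need checking against the Bifrost documentation for your version. The provider name selfhosted-llm and the Kubernetes service vllm in namespace llm are hypothetical. The in-cluster FQDN follows the standard <service>.<namespace>.svc.cluster.local pattern, and the sketch assumes an OpenAI-compatible API under /v1.

```python
import json


def k8s_base_url(service: str, namespace: str, port: int) -> str:
    """Cluster-internal FQDN, resolvable from any namespace."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}/v1"


def network_config(base_url: str, timeout_s: int = 60, max_retries: int = 2) -> dict:
    # ASSUMPTION: timeout/retry field names; check the Bifrost docs for your version.
    return {
        "base_url": base_url,
        "default_request_timeout_in_seconds": timeout_s,
        "max_retries": max_retries,
    }


def check_allowlist(models: list[str]) -> list[str]:
    """Reject the '*' wildcard so a key serves only the models you listed."""
    if not models or "*" in models:
        raise ValueError("use an explicit, non-empty model allowlist instead of '*'")
    return models


# Hypothetical self-hosted model server: service "vllm" in namespace "llm".
provider_payload = {
    "provider": "selfhosted-llm",
    "network_config": network_config(k8s_base_url("vllm", "llm", 8000)),
    "custom_provider_config": {"base_provider_type": "openai"},
}
key_models = check_allowlist(["YOUR_PREMIUM_MODEL_ID"])
print(json.dumps(provider_payload, indent=2))
```

Use key_models as the models value of the key you attach to this provider.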

Logging and observability

Enable provider-level request/response logging options in Bifrost only if required by your compliance profile.

Combining with LiteLLM (recommended for customer keys)

A robust pattern is:

Customer apps -> LiteLLM virtual keys
LiteLLM upstream -> Bifrost
Bifrost -> your dedicated endpoint

This gives:

Customer key lifecycle and spend controls
Gateway routing/fallback policies
Clear separation of concerns
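For the LiteLLM -> Bifrost hop, LiteLLM's proxy configuration (a model_list of entries with litellm_params) can point an OpenAI-compatible model at Bifrost. The sketch below builds such an entry as a Python dict mirroring the YAML structure. It assumes LiteLLM reaches Bifrost through the same OpenAI-compatible base URL as the SDK examples above. The public model name client-chat is hypothetical.

```python
import json


def litellm_entry(public_name: str, model_id: str,
                  bifrost_base: str = "http://localhost:8080/openai/v1",
                  api_key: str = "dummy") -> dict:
    """One model_list entry for LiteLLM's proxy config that forwards to Bifrost.

    The "openai/" prefix makes LiteLLM treat Bifrost as an OpenAI-compatible
    upstream; customers only see `public_name` and their LiteLLM virtual key.
    """
    return {
        "model_name": public_name,
        "litellm_params": {
            "model": f"openai/{model_id}",
            "api_base": bifrost_base,
            "api_key": api_key,
        },
    }


config = {"model_list": [litellm_entry("client-chat", "YOUR_MODEL_ID")]}
print(json.dumps(config, indent=2))
```

If LiteLLM and Bifrost run in separate containers, replace localhost with the Bifrost service hostname.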
