# Providers
Configure LLM providers for Hadrian Gateway
Providers define how Hadrian connects to LLM services. Each provider has a unique name and specifies which API protocol to use.
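For example, a minimal configuration might declare two providers (names here are illustrative; each table key under `[providers]` becomes the unique name used to address the provider in requests):

```toml
# The table key is the provider's unique name; `type` selects the API protocol.
[providers.openai]
type = "open_ai"
api_key = "${OPENAI_API_KEY}"

[providers.claude]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
```

A request can then pin a model to a provider explicitly, e.g. `claude/claude-sonnet-4-20250514` (see Default Provider below for how bare model names are routed).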
## Provider Types
| Type | Description | Streaming | Embeddings | Tools |
|---|---|---|---|---|
| `open_ai` | OpenAI API and compatible services | Yes | Yes | Yes |
| `anthropic` | Anthropic Claude API | Yes | No | Yes |
| `bedrock` | AWS Bedrock | Yes | Yes (Titan) | Yes |
| `vertex` | Google Vertex AI / Gemini | Yes | Yes | Yes |
| `azure_open_ai` | Azure OpenAI Service | Yes | Yes | Yes |
| `test` | Mock provider for testing | Yes | No | No |
## OpenAI
Works with the native OpenAI API and any OpenAI-compatible endpoint.
```toml
[providers.openai]
type = "open_ai"
api_key = "${OPENAI_API_KEY}"

# Optional settings
organization = "org-xxx"   # OpenAI organization ID
project = "proj-xxx"       # OpenAI project ID
timeout_secs = 300         # Request timeout (default: 300)
```

### OpenAI-Compatible Providers
Use the `open_ai` type with a custom `base_url` for compatible services.

**OpenRouter** (access 100+ models):

```toml
[providers.openrouter]
type = "open_ai"
api_key = "${OPENROUTER_API_KEY}"
base_url = "https://openrouter.ai/api/v1"

# OpenRouter-specific headers
[providers.openrouter.headers]
HTTP-Referer = "https://myapp.example.com"
X-Title = "My Application"
```

**Ollama** (local, no API key needed):

```toml
[providers.ollama]
type = "open_ai"
base_url = "http://localhost:11434/v1"
```

**Together AI**:

```toml
[providers.together]
type = "open_ai"
api_key = "${TOGETHER_API_KEY}"
base_url = "https://api.together.xyz/v1"
```

**Groq**:

```toml
[providers.groq]
type = "open_ai"
api_key = "${GROQ_API_KEY}"
base_url = "https://api.groq.com/openai/v1"
```

**vLLM** (self-hosted):

```toml
[providers.vllm]
type = "open_ai"
base_url = "http://localhost:8000/v1"
```

## Anthropic
Direct access to Anthropic's Claude API.
```toml
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"

# Optional settings
base_url = "https://api.anthropic.com"   # Custom endpoint
timeout_secs = 300
default_model = "claude-sonnet-4-20250514"
default_max_tokens = 4096
```

## AWS Bedrock
Access Claude, Titan, Llama, and other models through AWS Bedrock.
```toml
[providers.bedrock]
type = "bedrock"
region = "us-east-1"
```

### Credential Options
**Default credential chain** (recommended): uses environment variables, `~/.aws/credentials`, an EC2 instance profile, or an ECS task role automatically.
```toml
[providers.bedrock]
type = "bedrock"
region = "us-east-1"
# credentials.type = "default" is implicit
```

**Static credentials**:
```toml
[providers.bedrock]
type = "bedrock"
region = "us-east-1"

[providers.bedrock.credentials]
type = "static"
access_key_id = "${AWS_ACCESS_KEY_ID}"
secret_access_key = "${AWS_SECRET_ACCESS_KEY}"
session_token = "${AWS_SESSION_TOKEN}"   # Optional, for temporary credentials
```

**Assume role**:
```toml
[providers.bedrock]
type = "bedrock"
region = "us-east-1"

[providers.bedrock.credentials]
type = "assume_role"
role_arn = "arn:aws:iam::123456789012:role/BedrockAccess"
external_id = "my-external-id"   # Optional
session_name = "hadrian"         # Optional
```

**Named profile**:
```toml
[providers.bedrock]
type = "bedrock"
region = "us-east-1"

[providers.bedrock.credentials]
type = "profile"
name = "bedrock-profile"
```

### Cross-Region Inference
For multi-region routing with inference profiles:
```toml
[providers.bedrock]
type = "bedrock"
region = "us-east-1"
inference_profile_arn = "arn:aws:bedrock:us-east-1:123456789012:inference-profile/my-profile"
```

## Google Vertex AI
Access Gemini and other models through Google Cloud. Supports two authentication modes.
### API Key Mode (Simple)
Best for getting started with Gemini:
```toml
[providers.gemini]
type = "vertex"
api_key = "${GOOGLE_API_KEY}"
```

### OAuth / ADC Mode (Full Features)
Required for Vertex AI features like Claude on Vertex or custom endpoints:
```toml
[providers.vertex]
type = "vertex"
project = "my-gcp-project"
region = "us-central1"
publisher = "google"   # or "anthropic", "meta"
```

### Credential Options
**Application Default Credentials** (recommended): uses `GOOGLE_APPLICATION_CREDENTIALS`, gcloud CLI credentials, or compute metadata automatically.
```toml
[providers.vertex]
type = "vertex"
project = "my-gcp-project"
region = "us-central1"
# credentials.type = "default" is implicit
```

**Service account key file**:
```toml
[providers.vertex]
type = "vertex"
project = "my-gcp-project"
region = "us-central1"

[providers.vertex.credentials]
type = "service_account"
key_path = "/path/to/service-account.json"
```

**Service account JSON** (from an environment variable):
```toml
[providers.vertex]
type = "vertex"
project = "my-gcp-project"
region = "us-central1"

[providers.vertex.credentials]
type = "service_account_json"
json = "${GCP_SERVICE_ACCOUNT_JSON}"
```

### Claude on Vertex AI
Access Anthropic models through Vertex AI:
```toml
[providers.vertex-claude]
type = "vertex"
project = "my-gcp-project"
region = "us-east5"   # Claude available in specific regions
publisher = "anthropic"
```

## Azure OpenAI
Access OpenAI models through Azure with deployment-based routing.
```toml
[providers.azure]
type = "azure_open_ai"
resource_name = "my-openai-resource"
api_version = "2024-02-01"

[providers.azure.auth]
type = "api_key"
api_key = "${AZURE_OPENAI_API_KEY}"

# Map deployments to model names for routing
[providers.azure.deployments.gpt4-deployment]
model = "gpt-4"

[providers.azure.deployments.gpt35-deployment]
model = "gpt-3.5-turbo"

[providers.azure.deployments.embedding-deployment]
model = "text-embedding-3-small"
```

### Authentication Options
**API key**:

```toml
[providers.azure.auth]
type = "api_key"
api_key = "${AZURE_OPENAI_API_KEY}"
```

**Azure AD / Entra ID**:
```toml
[providers.azure.auth]
type = "azure_ad"
tenant_id = "${AZURE_TENANT_ID}"
client_id = "${AZURE_CLIENT_ID}"
client_secret = "${AZURE_CLIENT_SECRET}"
```

**Managed Identity** (for Azure VMs/containers):

```toml
[providers.azure.auth]
type = "managed_identity"
client_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # Optional, for user-assigned identities
```

## Model Aliases
Create shortcuts for long model names:
```toml
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"

[providers.anthropic.model_aliases]
sonnet = "claude-sonnet-4-20250514"
haiku = "claude-3-5-haiku-20241022"
opus = "claude-opus-4-20250514"
```

Now requests can use the alias:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: gw_live_..." \
  -d '{"model": "anthropic/sonnet", "messages": [...]}'
```

## Fallback Configuration
### Provider Fallbacks
Try alternative providers when the primary fails with 5xx errors, timeouts, or circuit breaker trips:
```toml
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
fallback_providers = ["openai", "bedrock"]

[providers.openai]
type = "open_ai"
api_key = "${OPENAI_API_KEY}"
fallback_providers = ["anthropic"]

[providers.bedrock]
type = "bedrock"
region = "us-east-1"
```

### Model Fallbacks
Define alternative models to try before provider-level fallbacks. Useful for graceful degradation:
```toml
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
fallback_providers = ["openai"]

# Try cheaper models on the same provider first
[[providers.anthropic.model_fallbacks."claude-opus-4-20250514"]]
model = "claude-sonnet-4-20250514"

[[providers.anthropic.model_fallbacks."claude-opus-4-20250514"]]
model = "claude-3-5-haiku-20241022"

# Fall back to a different provider's model
[[providers.anthropic.model_fallbacks."claude-sonnet-4-20250514"]]
model = "gpt-4o"
provider = "openai"
```

The fallback order for `claude-opus-4-20250514` would be:

1. `claude-sonnet-4-20250514` (same provider)
2. `claude-3-5-haiku-20241022` (same provider)
3. `openai/gpt-4o` (provider fallback via `fallback_providers`)
## Allowed Models
Restrict which models can be used through a provider:
```toml
[providers.openai]
type = "open_ai"
api_key = "${OPENAI_API_KEY}"
allowed_models = [
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-4-turbo",
    "text-embedding-3-small",
    "text-embedding-3-large"
]
```

If `allowed_models` is empty (the default), all models are allowed.

## Model Catalog Provider
Override the automatic model catalog provider detection with an explicit `catalog_provider`:
```toml
[providers.custom_llm]
type = "open_ai"
api_key = "${CUSTOM_API_KEY}"
base_url = "https://my-llm-provider.example.com/v1"
catalog_provider = "openai"   # Use OpenAI model metadata from models.dev
```

This field maps your provider to a models.dev provider ID for capability and pricing enrichment. Use it when:
- You have a custom OpenAI-compatible endpoint that serves models from a known provider
- Auto-detection from the base URL doesn't match the correct provider
- You want to use a specific provider's metadata for pricing fallback
Common catalog provider IDs: `openai`, `anthropic`, `google`, `mistral`, `deepseek`, `groq`, `together`, `openrouter`, `fireworks-ai`, `cerebras`, `cohere`, `perplexity`.
## Model Configuration
The `models` field provides per-model configuration for pricing, modalities, supported tasks, and metadata. This is essential for models not in the models.dev catalog (image generation, TTS, transcription) and for overriding catalog data.
### Pricing
Pricing fields are specified directly alongside metadata:
```toml
[providers.openai.models."gpt-4o"]
input_per_1m_tokens = 2500000    # $2.50/1M input tokens (in microcents)
output_per_1m_tokens = 10000000  # $10/1M output tokens
```

### Modalities and Tasks
Modalities describe what a model can accept and produce. Tasks specify which API endpoints the model supports, enabling the Studio UI to categorize models correctly.
| Task | API Endpoint | Studio Panel |
|---|---|---|
| `chat` | `/v1/chat/completions` | Chat |
| `image_generation` | `/v1/images/generations` | Images |
| `tts` | `/v1/audio/speech` | Audio > Speak |
| `transcription` | `/v1/audio/transcriptions` | Audio > Transcribe |
| `translation` | `/v1/audio/translations` | Audio > Translate |
| `embedding` | `/v1/embeddings` | — |
### Image Generation Models
```toml
[providers.openai.models."dall-e-3"]
per_image = 40000   # $0.04/image (in microcents)
modalities = { input = ["text"], output = ["image"] }
tasks = ["image_generation"]
family = "dall-e"

[providers.openai.models."gpt-image-1"]
per_image = 11000
modalities = { input = ["text", "image"], output = ["image"] }
tasks = ["image_generation"]
family = "gpt-image"
```

### Text-to-Speech Models
```toml
[providers.openai.models."tts-1"]
per_1m_characters = 15000000   # $15/1M characters
modalities = { input = ["text"], output = ["audio"] }
tasks = ["tts"]
family = "tts"

[providers.openai.models."gpt-4o-mini-tts"]
input_per_1m_tokens = 600000
output_per_1m_tokens = 12000000
modalities = { input = ["text"], output = ["audio"] }
tasks = ["tts"]
family = "gpt-4o-mini-tts"
```

### Transcription and Translation Models
```toml
[providers.openai.models."whisper-1"]
per_second = 100   # $0.006/min
modalities = { input = ["audio"], output = ["text"] }
tasks = ["transcription", "translation"]
family = "whisper"
```

### Additional Metadata
You can also specify context length, max output tokens, capabilities, and open weights status:
```toml
[providers.openai.models."gpt-4o"]
input_per_1m_tokens = 2500000
output_per_1m_tokens = 10000000
context_length = 128000
max_output_tokens = 16384
family = "gpt-4o"

[providers.openai.models."gpt-4o".capabilities]
vision = true
reasoning = false
tool_call = true
structured_output = true
temperature = true
```

Config metadata overrides catalog data: if the models.dev catalog has data for a model, config values take precedence for any field that is set.
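Because precedence applies per field, you can override just the pieces you care about. For example, to apply custom pricing to a catalog-known model while keeping its catalog capabilities, a sketch (values illustrative):

```toml
# Only pricing is set here; context length, capabilities, and other
# metadata still come from the models.dev catalog entry.
[providers.openai.models."gpt-4o"]
input_per_1m_tokens = 2000000
output_per_1m_tokens = 8000000
```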
## Retry Configuration
Configure automatic retries for transient failures:
```toml
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"

[providers.anthropic.retry]
max_attempts = 3           # Total attempts (default: 3)
initial_delay_ms = 1000    # First retry delay (default: 1000)
max_delay_ms = 30000       # Maximum delay (default: 30000)
backoff_multiplier = 2.0   # Exponential backoff factor (default: 2.0)
```

Retries occur on:
- HTTP 429 (rate limited)
- HTTP 5xx (server errors)
- Connection timeouts
- Network errors
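Since `max_attempts` counts total attempts rather than retries, setting it to 1 should effectively disable retries for a provider; a minimal sketch:

```toml
# A single total attempt means no retries for this provider.
[providers.ollama.retry]
max_attempts = 1
```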
## Circuit Breaker
Automatically disable unhealthy providers:
```toml
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"

[providers.anthropic.circuit_breaker]
enabled = true
failure_threshold = 5   # Failures before opening (default: 5)
success_threshold = 2   # Successes before closing (default: 2)
timeout_secs = 30       # Time in open state (default: 30)
```

States:
- Closed: Normal operation, requests pass through
- Open: Provider disabled, requests fail immediately or use fallback
- Half-Open: Testing if provider recovered
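Because an open breaker either fails fast or falls through to fallbacks, pairing `circuit_breaker` with `fallback_providers` (described above) gives automatic failover while the primary is unhealthy. A sketch combining the two:

```toml
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
fallback_providers = ["openai"]   # serves traffic while the breaker is open

[providers.anthropic.circuit_breaker]
enabled = true
failure_threshold = 5   # open after 5 failures
timeout_secs = 30       # time spent open before half-open probing
```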
## Health Checks
Proactive monitoring of provider availability:
```toml
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"

[providers.anthropic.health_check]
enabled = true
interval_secs = 60   # Check frequency (default: 60)
timeout_secs = 10    # Health check timeout (default: 10)
model = "claude-3-5-haiku-20241022"   # Model to use for checks
```

Health checks complement circuit breakers by detecting issues before user requests fail.
## Default Provider
Set a default provider for requests that don't specify one:
```toml
[providers]
default_provider = "anthropic"

[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"

[providers.openai]
type = "open_ai"
api_key = "${OPENAI_API_KEY}"
```

With this config:

- `{"model": "gpt-4o"}` routes to `openai` (model name implies provider)
- `{"model": "claude-sonnet-4-20250514"}` routes to `anthropic` (default provider)
- `{"model": "openai/gpt-4o"}` routes to `openai` (explicit)
## Complete Example
A production configuration with multiple providers:
```toml
[providers]
default_provider = "anthropic"

# Primary provider with fallbacks
[providers.anthropic]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
timeout_secs = 120
fallback_providers = ["openai", "bedrock"]

[providers.anthropic.model_aliases]
sonnet = "claude-sonnet-4-20250514"
haiku = "claude-3-5-haiku-20241022"

[[providers.anthropic.model_fallbacks."claude-sonnet-4-20250514"]]
model = "claude-3-5-haiku-20241022"

[providers.anthropic.retry]
max_attempts = 3

[providers.anthropic.circuit_breaker]
enabled = true
failure_threshold = 5

[providers.anthropic.health_check]
enabled = true
interval_secs = 60
model = "claude-3-5-haiku-20241022"

# OpenAI as fallback
[providers.openai]
type = "open_ai"
api_key = "${OPENAI_API_KEY}"
timeout_secs = 120

[providers.openai.model_aliases]
gpt4 = "gpt-4o"

[providers.openai.retry]
max_attempts = 3

[providers.openai.circuit_breaker]
enabled = true

# Bedrock as secondary fallback
[providers.bedrock]
type = "bedrock"
region = "us-east-1"

[providers.bedrock.retry]
max_attempts = 2

[providers.bedrock.circuit_breaker]
enabled = true

# Local Ollama for development
[providers.ollama]
type = "open_ai"
base_url = "http://localhost:11434/v1"
timeout_secs = 300
```