MCP Tool (Responses API)
Call remote Model Context Protocol servers from `/v1/responses` — either passthrough to OpenAI/Azure or run the client loop in the gateway via `rmcp`.
Hadrian's /v1/responses accepts OpenAI's mcp tool — {"type": "mcp", server_url, server_label, authorization, ...} — so a model can call tools exposed by any remote Model Context Protocol server (Atlassian, Notion, GitHub, HuggingFace, Vercel, …).
This page describes the server-side mcp tool on /v1/responses. For the browser-based MCP
client in the chat UI, see MCP Integration. For coding agents bridged via
MCP, see Agents via MCP.
Modes
| Mode | Where the MCP loop runs | Provider support |
|---|---|---|
| passthrough_openai | OpenAI / Azure OpenAI servers | OpenAI, Azure OpenAI only |
| hadrian_hosted | Hadrian gateway (via the rmcp crate) | All providers |
Both modes ship; pick one by setting [features.mcp].mode. passthrough_openai is zero-cost forwarding and gives you OpenAI's first-party MCP optimizations. hadrian_hosted makes MCP work behind any provider Hadrian supports (Anthropic, Bedrock, Vertex, Test) and gives the gateway visibility into every call — the tradeoff is one extra network hop per tools/call.
Enabling the feature
Add a [features.mcp] section to hadrian.toml:
[features.mcp]
enabled = true
mode = "passthrough_openai"
# Optional: restrict which remote MCP servers callers may target.
# Omit to accept any URL the caller supplies (the caller already
# controls Authorization, so this is defense-in-depth).
# allowed_server_urls = ["https://mcp.atlassian.com/v1/mcp"]
# Default false. Flip to true only when the upstream is OpenAI/Azure
# AND the connector_id is known to work. Self-hosted gateways can't
# reach OpenAI's first-party connector registry.
allow_connector_ids = false
# Default upper bound (seconds) on a single tools/call under
# hadrian_hosted. rmcp/reqwest apply no timeout of their own, so without
# this an unresponsive MCP server would hang the response. Override per
# tool with the `call_timeout_secs` field on the mcp tool entry.
call_timeout_secs = 300
Wire format
A /v1/responses request declares an mcp tool entry alongside any other tools:
{
"model": "gpt-5.2",
"input": "What's the status of issue ENG-1234?",
"tools": [
{
"type": "mcp",
"server_label": "atlassian",
"server_url": "https://mcp.atlassian.com/v1/mcp",
"authorization": "Bearer ya29...",
"require_approval": "never",
"allowed_tools": ["jira_issue_get", "jira_search"]
}
]
}
The caller obtains the bearer token out-of-band (their own OAuth flow against Atlassian / GitHub / etc.). Hadrian forwards the authorization field verbatim and never persists it, so clients must include it on every request.
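
As a rough illustration of the wire format above, the following sketch builds the request body in Python and posts it with the standard library. The gateway address `GATEWAY_URL` and the use of an `Authorization: Bearer` header for the gateway's own API key are assumptions made for this example, not documented Hadrian behavior; adjust both to your deployment.

```python
import json
import urllib.request

# Hypothetical gateway address; replace with your deployment's URL.
GATEWAY_URL = "http://localhost:8080/v1/responses"


def build_mcp_request(model, prompt, server_label, server_url, token, allowed_tools=None):
    """Build a /v1/responses body carrying a single mcp tool entry."""
    tool = {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        # The gateway never persists this value, so it is sent on every request.
        "authorization": f"Bearer {token}",
        "require_approval": "never",
    }
    if allowed_tools:
        tool["allowed_tools"] = list(allowed_tools)
    return {"model": model, "input": prompt, "tools": [tool]}


def send(body, api_key):
    """POST the body to the gateway. Performs a real network call."""
    request = urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed gateway auth scheme; not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping `build_mcp_request` separate from `send` makes it easy to re-attach a freshly refreshed token on each call without touching the transport code.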
Field reference
| Field | Type | Required | Notes |
|---|---|---|---|
| type | "mcp" | yes | |
| server_label | string | yes | Stable identifier surfaced in mcp_list_tools / mcp_call items. |
| server_url | string | one of | URL of the remote MCP server (Streamable HTTP). Mutually exclusive with connector_id. |
| connector_id | string | one of | OpenAI first-party connector id (e.g. connector_googlecalendar). Requires allow_connector_ids = true. |
| server_description | string | | Human-readable description surfaced to the model. |
| authorization | string | | Bearer or OAuth access token. Caller-supplied, never persisted. |
| headers | `Record<string, string>` | | Extra HTTP headers sent with every JSON-RPC call (region / workspace selectors). |
| require_approval | "always" \| "never" \| object | | Object form mirrors OpenAI's MCPToolApprovalFilter: { "always": { "tool_names": ["x"] }, "never": { "tool_names": ["y"] } }. Tools under always are gated; tools under never are exempt. |
| allowed_tools | string[] or object | | Whitelist of tool names. Object form: { tool_names: ["..."] }. |
| defer_loading | boolean | | Discover this server's tools via tool search rather than loading them all into the prompt. Under hadrian_hosted, Hadrian runs the search locally (works behind any provider). |
| defer_loading_passthrough | boolean | | Hadrian extension. With defer_loading, forward the flag to the upstream's native tool search instead of running Hadrian-side search. OpenAI/Azure only; rejected (400 mcp_defer_loading_passthrough_unsupported) on other providers. |
| call_timeout_secs | integer | | Hadrian extension. Upper bound, in seconds, on a single tools/call round-trip under hadrian_hosted. Overrides the [features.mcp].call_timeout_secs deployment default (300s). On expiry the mcp_call terminates with status="incomplete" and a timeout error. Ignored under passthrough_openai. |
Item types
Under passthrough_openai, OpenAI emits the canonical item lifecycle on the response stream:
- mcp_list_tools: snapshot of tools the model could call against the server. Surfaces error inline when tools/list fails.
- mcp_call: model-initiated invocation. Carries name, arguments JSON string, status, output / error (inlined per the OpenAI spec), and approval_request_id when the call was gated.
- mcp_approval_request: emitted when require_approval gates a call.
- mcp_approval_response: caller-supplied input item that resumes a parked call: { "type": "mcp_approval_response", "approval_request_id": "mcpr_...", "approve": true, "reason": "optional rationale" }.
- tool_search_call / tool_search_output: emitted when tool search runs for a defer_loading server. The tool_search_output carries the tools[] the search surfaced.
Hadrian recognizes all of these and round-trips them through the Responses-API pipeline.
hadrian_hosted mode
When mode = "hadrian_hosted", Hadrian itself runs the MCP client loop using the official rmcp crate. On request entry, the gateway:
- Connects to each MCP server declared on the request (Streamable HTTP, caller-supplied bearer token).
- Calls tools/list and caches the catalog for 60 seconds.
- Rewrites every {"type": "mcp", server_label, ...} entry into N function tools named `mcp_<server_label>__<tool_name>`. The model sees a flat list of function tools.
- When the model calls one of those function tools, Hadrian's McpExecutor intercepts it, looks up the right pooled MCP client, and forwards the tools/call.
- The result is inlined onto the mcp_call item's output (or error) field on the response stream and folded back as a function_call_output item the model reads on the next turn.
The same code path runs for every provider — Anthropic, Bedrock, Vertex, OpenAI, Azure, Test — so any tool-using model can drive it. Connections are pooled per (server_url, auth_hash) so chained calls in one response don't pay the initialize round-trip more than once.
Tool name handling. The server label is sanitized to fit OpenAI's [A-Za-z0-9_-]{1,64} function-name regex (My Co/Linear becomes My_Co_Linear); the tool name is taken verbatim from tools/list so the round-trip back to the MCP server is exact. Tools whose names don't match the regex (my.tool, non-ASCII) are skipped at rewrite time with a warning.
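
The naming rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not Hadrian's actual code: the helper names are hypothetical, replacing each disallowed label character with an underscore is inferred from the My Co/Linear example, and length handling for the combined name is not modeled.

```python
import re

# OpenAI's function-name pattern as quoted above.
FUNCTION_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def sanitize_label(label):
    """Replace every character outside [A-Za-z0-9_-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", label)


def rewrite_tool_names(server_label, tool_names):
    """Map function-tool names to the original MCP tool names.

    Tool names are kept verbatim; names that do not fully match the
    pattern are skipped and returned separately.
    """
    label = sanitize_label(server_label)
    rewritten, skipped = {}, []
    for name in tool_names:
        if FUNCTION_NAME_RE.fullmatch(name):
            rewritten[f"mcp_{label}__{name}"] = name
        else:
            skipped.append(name)
    return rewritten, skipped
```

For example, `rewrite_tool_names("My Co/Linear", ["jira_search", "my.tool"])` yields a single mapping from `mcp_My_Co_Linear__jira_search` to `jira_search` and reports `my.tool` as skipped.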
Bad-gateway errors. If tools/list fails (server unreachable, 5xx, TLS error) the request returns HTTP 502 with error_code = "mcp_list_tools_failed" and the underlying message — clients should retry with backoff. 401 errors from the MCP server are surfaced verbatim; the caller is expected to refresh their token and retry.
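
A client-side retry loop for the 502 case might look like the sketch below. The `send` callable and its `(status, error_code, body)` return shape are hypothetical stand-ins for whatever HTTP client you use; the attempt count and delays are arbitrary example values.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry `send` while it reports HTTP 502 with mcp_list_tools_failed.

    `send` is a hypothetical callable returning (status, error_code, body).
    Any other outcome, including a 401, is returned to the caller at once
    so it can refresh its token.
    """
    for attempt in range(max_attempts):
        status, error_code, body = send()
        retryable = status == 502 and error_code == "mcp_list_tools_failed"
        if not retryable or attempt == max_attempts - 1:
            return status, error_code, body
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the loop testable without real delays.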
Tool search (deferred tools)
OpenAI's defer_loading flag means "discover this tool via tool search rather than loading its definition into the prompt" — useful when a server exposes dozens of tools and dumping every schema into context would be wasteful. Under passthrough_openai the flag is forwarded verbatim and OpenAI runs its native tool search.
Under hadrian_hosted, Hadrian runs the tool search itself, so deferral works behind every provider — including OpenAI-spec-compatible providers that don't implement the native tool_search tool. When a request marks an mcp entry with defer_loading: true:
- Hadrian fetches the catalog (as always) but keeps the per-tool definitions out of the prompt.
- It injects a single tool_search function tool listing the searchable servers.
- When the model calls tool_search with a query, Hadrian ranks the catalog locally, emits tool_search_call / tool_search_output items, and injects the matched tool definitions into the next turn so the model can call them.
{
"model": "claude-sonnet-4-6",
"input": "Find and read issue ENG-1234",
"tools": [
{
"type": "mcp",
"server_label": "atlassian",
"server_url": "https://mcp.atlassian.com/v1/mcp",
"authorization": "Bearer ya29...",
"defer_loading": true
}
]
}
Ranking
The ranking strategy is set by [features.mcp.tool_search] and can be overridden per request via a tool_search tool entry's Hadrian-extension ranker field (request value wins):
| Strategy | Behavior |
|---|---|
| hybrid | Default. Fuses semantic + lexical relevance (Reciprocal Rank Fusion). |
| semantic | Embedding cosine similarity only. |
| lexical | Token/substring overlap. No embedding provider required. |
[features.mcp.tool_search]
ranker = "hybrid" # hybrid | semantic | lexical
max_results = 20 # tools returned per search
score_threshold = 0.0 # minimum relevance score
rrf_k = 60 # RRF smoothing constant (hybrid)
# Embedding config for semantic/hybrid. Falls back to
# [features.file_search.embedding] then the semantic-cache embedding config.
[features.mcp.tool_search.embedding]
provider = "openai"
model = "text-embedding-3-small"
dimensions = 1536
Semantic and hybrid ranking need a resolvable embedding provider. If none resolves, a hybrid default automatically falls back to lexical (logged), so the feature keeps working. An explicit per-request ranker: "semantic" on a deployment with no embedding provider is a hard error: HTTP 400 with error_code = "tool_search_ranker_unavailable".
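
For intuition about what the hybrid strategy and rrf_k do, here is a minimal sketch of textbook Reciprocal Rank Fusion, where each item scores the sum of 1 / (k + rank) over the input rankings. It is not taken from Hadrian's source; the tie-breaking rule and the tool names jira_comment_add and confluence_page_get are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, max_results=20):
    """Fuse ranked lists with RRF: score(item) = sum of 1 / (k + rank).

    Ranks start at 1. Items missing from a list contribute nothing for it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties broken by name so the order is deterministic.
    fused = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in fused[:max_results]]


semantic = ["jira_search", "jira_issue_get", "confluence_page_get"]
lexical = ["jira_issue_get", "jira_comment_add", "jira_search"]
fused = reciprocal_rank_fusion([semantic, lexical])
```

A larger k flattens the difference between adjacent ranks, so agreement between the two rankers matters more than a top position in either one.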
To opt out of Hadrian-side search and use the upstream's native tool search instead, set defer_loading_passthrough: true (OpenAI/Azure only).
Validation errors
The gateway validates the mcp tool entry before dispatching the request. Failures return HTTP 400 with a stable error_code:
| Error code | Cause |
|---|---|
| mcp_disabled | A request includes an mcp tool but [features.mcp].enabled = false (or the section is missing). |
| mcp_invalid_target | Neither server_url nor connector_id is set, or both are. |
| mcp_connector_id_not_allowed | connector_id is used but [features.mcp].allow_connector_ids = false, or mode = hadrian_hosted (which can't reach OpenAI's first-party connector registry). |
| mcp_server_url_not_allowed | server_url is not in [features.mcp].allowed_server_urls. |
| mcp_passthrough_unsupported_provider | mode = passthrough_openai but the resolved provider is not OpenAI/Azure (Anthropic, Bedrock, …). |
| mcp_hadrian_hosted_not_implemented | mode = hadrian_hosted but the gateway was built without the mcp cargo feature (e.g. tiny / minimal profiles). |
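
To make the table easier to reason about, the sketch below mirrors most of these checks client-side. It is a reading of the table, not the gateway's implementation: the check order, the exact-match URL comparison, and the provider names "openai" and "azure_openai" are assumptions, and the build-time mcp_hadrian_hosted_not_implemented case is left out.

```python
def validate_mcp_tool(tool, config, provider):
    """Return the error_code that would apply, or None if the entry passes.

    `config` is a dict mirroring [features.mcp]; `provider` is a
    hypothetical lowercase provider name.
    """
    if not config.get("enabled", False):
        return "mcp_disabled"
    has_url = bool(tool.get("server_url"))
    has_connector = bool(tool.get("connector_id"))
    if has_url == has_connector:  # neither set, or both set
        return "mcp_invalid_target"
    mode = config.get("mode")
    if has_connector and (
        not config.get("allow_connector_ids", False) or mode == "hadrian_hosted"
    ):
        return "mcp_connector_id_not_allowed"
    allowed = config.get("allowed_server_urls")
    # An omitted allowlist accepts any URL the caller supplies.
    if has_url and allowed is not None and tool["server_url"] not in allowed:
        return "mcp_server_url_not_allowed"
    if mode == "passthrough_openai" and provider not in ("openai", "azure_openai"):
        return "mcp_passthrough_unsupported_provider"
    return None
```

Running a check like this before sending a request can save a round trip, but the gateway's response remains the source of truth.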
And the approval-resume errors (HTTP 400 for caller-shape problems, 502 for upstream failures):
| Error code | Status | Cause |
|---|---|---|
| mcp_resume_missing_tool_binding | 400 | An mcp_approval_response with approve: true arrived but the request omits the mcp tool entry for the parked server. |
| mcp_resume_call_failed | 502 | Resumed call to the upstream MCP server failed (network, 5xx, 401). |
| mcp_resume_repo_error | 502 | Approvals-table lookup or delete failed. |
And HTTP 502 from the upstream MCP dependency during the rewrite:
| Error code | Cause |
|---|---|
| mcp_list_tools_failed | hadrian_hosted rewrite couldn't reach the remote MCP server's tools/list endpoint. |
| mcp_duplicate_server_label | Two mcp tool entries on one request share a server_label; exactly one per label is allowed. |
| mcp_missing_server_url | hadrian_hosted requires server_url on every mcp tool entry (connector_id is rejected). |
OpenAI connectors (connector_id)
OpenAI's API exposes a curated set of first-party connectors (Dropbox, Gmail, Google Calendar, Google Drive, Microsoft Teams, Outlook Email, Outlook Calendar, SharePoint). These resolve through OpenAI's internal connector registry — there is no public endpoint a self-hosted gateway can call to enumerate, validate, or invoke them. As a result, Hadrian deliberately does not ship a per-connector allowlist: under passthrough_openai the connector_id is forwarded verbatim to OpenAI/Azure, and under hadrian_hosted it's rejected outright (mcp_connector_id_not_allowed) because the gateway can't reach the registry. Operators get a single coarse switch — allow_connector_ids — to admit or refuse the entire feature.
If you need fine-grained gating, host the relevant service's MCP endpoint yourself (most providers, including the eight above, expose public MCP servers) and use server_url with [features.mcp].allowed_server_urls instead.
Rate limiting
Hadrian's standard request- and token-rate limits apply to /v1/responses and therefore bound MCP traffic transitively. Beyond that, there is currently no per-MCP-server call cap — once a request is admitted, the model can chain tools/call invocations up to the global [features.server_tools].max_iterations ceiling (default 30 iterations). The agent loop is the hard backstop; runaway calls terminate when the iteration budget is exhausted.
This matches OpenAI's documented behavior — the spec does not define a per-tool or per-server call cap on the Responses API side. If you need tighter bounds (e.g. "no more than 5 tools/call per response against atlassian"), the recommended approach today is:
- Lower [features.server_tools].max_iterations for the deployment.
- Use require_approval = "always" on sensitive servers so each call goes through the approval gate.
- Track call volume out-of-band via the persisted mcp_call items on the response store.
A dedicated per-server cap is a candidate enhancement; until OpenAI publishes a matching field on the mcp tool, it would be a Hadrian-only extension.
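
Out-of-band tracking can be as simple as counting mcp_call items per server_label after each response. The sketch below assumes you already have the response's item list as Python dicts with type and server_label keys; the budget dictionary and the function names are hypothetical.

```python
from collections import Counter


def count_mcp_calls(output_items):
    """Count mcp_call items per server_label in a response's item list."""
    return Counter(
        item.get("server_label", "<unknown>")
        for item in output_items
        if item.get("type") == "mcp_call"
    )


def over_budget(output_items, budgets):
    """Return the labels whose call count exceeds a caller-defined budget."""
    counts = count_mcp_calls(output_items)
    return sorted(label for label, limit in budgets.items() if counts[label] > limit)
```

This only reports after the fact; it does not stop calls in flight, which is why lowering max_iterations or gating with require_approval remains the enforcement path.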
Authentication
Hadrian does not run the OAuth dance for the remote MCP server. The caller is responsible for:
- Registering an OAuth client with the MCP provider (Atlassian developer console, etc.).
- Completing the authorization-code flow to obtain an access token.
- Refreshing the token before expiry and re-sending it on each request.
This mirrors OpenAI's own contract: the authorization field is opaque from the API's perspective. The gateway adds nothing on top (no operator-pinned tokens, no gateway-side refresh).
Approval flow
require_approval defaults to "always". Matching OpenAI's spec, an mcp tool entry with no
require_approval field gates every call. Under hadrian_hosted the approval gate fails
closed: if the gateway can't park the call it refuses to run it and returns a failed mcp_call
instead. Parking requires all of:
- a configured database (mcp_pending_approvals lives in Postgres/SQLite; the tiny profile has no DB),
- store: true on the request (a parked call must be persisted so it can be resumed), and
- an authenticated request with an organization scope (anonymous requests have nowhere to park).
So a "just add an mcp tool" request with none of the above will see every call fail with an
explanatory error. For unattended / non-sensitive servers, set require_approval: "never" (or list
the safe tools under the object form's never). Only opt into gating when you have a DB, send
store: true, and have a UI ready to collect the mcp_approval_response.
When require_approval matches a call, the upstream emits an mcp_approval_request item. The next /v1/responses request must carry an mcp_approval_response input item with the matching approval_request_id:
{
"input": [
{
"type": "mcp_approval_response",
"approval_request_id": "mcpr_abc123",
"approve": true
}
],
"previous_response_id": "resp_xyz",
"tools": [{ "type": "mcp", "server_label": "atlassian", "server_url": "..." }]
}
Approval persistence
require_approval is honored under both modes:
- passthrough_openai: OpenAI / Azure runs the approval loop itself.
- hadrian_hosted: Hadrian parks gated calls in the mcp_pending_approvals table (Postgres or SQLite, mirrored). Persistence survives replica restarts and lets a user click "approve" minutes after the gateway emitted the request. The caller resumes by sending {"type": "mcp_approval_response", "approval_request_id": "mcpr_...", "approve": true|false} as an input item on a follow-up request (typically with previous_response_id chained back); Hadrian runs the call (on approve) or surfaces a refusal (on deny) and folds the result back as a function_call_output the model sees on its next turn.
Resuming an approved call
The resume request must include the matching mcp tool entry with the authorization field set. The gateway never persists OAuth tokens, so it reads the bearer token from the live request's tools[] block at resume time. Concretely:
{
"previous_response_id": "resp_xyz",
"tools": [
{
"type": "mcp",
"server_label": "atlassian",
"server_url": "https://mcp.atlassian.com/v1/mcp",
"authorization": "Bearer ya29..."
}
],
"input": [
{ "type": "mcp_approval_response", "approval_request_id": "mcpr_a1b2c3", "approve": true }
]
}
If the mcp tool entry for the parked call's server_label is missing on the resume request, the gateway returns HTTP 400 with error_code = "mcp_resume_missing_tool_binding" and a message naming the server. Refusals (approve: false) don't require the tool entry; they short-circuit without hitting the upstream.
When the gateway runs without a database, gated calls under hadrian_hosted cannot be parked. As described under Approval flow, the gate fails closed: the call is not run and a failed mcp_call is returned with an explanatory error. Enable a database to gate approvals, or set require_approval: "never" for servers that don't need gating.
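
Putting the resume rules together, a client could build the follow-up request as sketched below. Two assumptions follow OpenAI's item shapes rather than anything stated on this page: the mcp_approval_request item's id is the value to send as approval_request_id, and the item carries a server_label. The function names and the decisions / tool_entries arguments are hypothetical.

```python
def pending_approvals(response):
    """Collect mcp_approval_request items from a response's output list."""
    return [
        item
        for item in response.get("output", [])
        if item.get("type") == "mcp_approval_request"
    ]


def build_resume_request(response, decisions, tool_entries):
    """Build the follow-up body answering every pending approval.

    decisions: approval_request_id -> bool; ids not listed are denied.
    tool_entries: server_label -> full mcp tool entry, including authorization.
    """
    inputs, tools, seen = [], [], set()
    for item in pending_approvals(response):
        approve = bool(decisions.get(item["id"], False))
        inputs.append({
            "type": "mcp_approval_response",
            "approval_request_id": item["id"],
            "approve": approve,
        })
        label = item["server_label"]
        # Approved calls need the tool entry so the gateway can read the token;
        # refusals short-circuit and do not need it.
        if approve and label not in seen:
            tools.append(tool_entries[label])
            seen.add(label)
    return {"previous_response_id": response["id"], "input": inputs, "tools": tools}
```

Defaulting unlisted ids to a refusal keeps the client conservative: a call only runs when something explicitly approved it.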