# MCP: Using Streamable HTTP for Real-Time AI Tool Use
If you’ve been building MCP integrations, you probably noticed the transport layer changed underneath you. The March 2025 spec update—version 2025-03-26—deprecated the HTTP+SSE transport in favor of Streamable HTTP. The change sounds minor. Swapping one HTTP-based approach for another. But the implications ripple through everything from auth middleware to deployment topology.
Let me walk through what actually changed, why it matters, and what you need to know if you’re building or maintaining MCP servers.
## What MCP Is (Quick Context)
For anyone coming in cold: MCP stands for Model Context Protocol. Anthropic introduced it as a standardized way for AI models to interact with external tools and data sources. The analogy that stuck is “USB-C for AI tools”—a single interface that connects any AI system to any external service, regardless of vendor.
Before MCP, every AI tool integration was bespoke. Want your LLM to query a database? Write a custom integration. Want it to call a REST API? Another custom integration. Want it to read files from cloud storage? Yet another one. MCP standardizes the protocol so tools expose capabilities in a uniform way, and AI clients consume them through a uniform interface.
The protocol defines three core primitives: tools (functions the AI can call), resources (data the AI can read), and prompts (templates for specific interactions). An MCP server exposes some combination of these, and an MCP client connects to them.
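The three primitives are easier to hold in your head as concrete shapes. Here is a simplified TypeScript sketch; the field names follow the protocol's general structure, but these are illustrative types, not the SDK's actual (richer) definitions:

```typescript
// Simplified, illustrative shapes for MCP's three primitives.
interface Tool {
  name: string;
  description: string;
  inputSchema: object; // JSON Schema describing the tool's arguments
}

interface Resource {
  uri: string; // e.g. "file:///schema.sql"
  name: string;
  mimeType?: string;
}

interface Prompt {
  name: string;
  arguments?: { name: string; required?: boolean }[];
}

// A hypothetical server might expose one of each:
const tool: Tool = {
  name: "query",
  description: "Run a read-only SQL query",
  inputSchema: { type: "object", properties: { sql: { type: "string" } } },
};

const resource: Resource = {
  uri: "file:///schema.sql",
  name: "Database schema",
  mimeType: "text/plain",
};

const prompt: Prompt = {
  name: "summarize-results",
  arguments: [{ name: "rows", required: true }],
};
```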
The transport layer—how the client and server actually communicate—is where the March update made its mark.
## The Old Way: HTTP+SSE
The original MCP transport used a combination of regular HTTP requests and Server-Sent Events (SSE). The client sent requests via HTTP POST, and the server could push updates back via an SSE connection.
This worked, but it had problems.
SSE is inherently one-directional: server to client. The client can send new HTTP requests, but the SSE channel only flows one way. That means the server couldn’t request information from the client mid-operation. If a tool needed additional context—“which branch do you want me to deploy to?”—the server had to either fail the request or guess.
SSE also doesn’t play well with existing infrastructure. CORS policies, authentication middleware, reverse proxies—all of these assume standard HTTP request-response patterns. SSE connections are long-lived, which means proxy timeouts, load balancer connection limits, and auth token refresh all become complications. Anyone who’s deployed SSE behind a corporate proxy knows the pain.
Then there’s the connection management overhead. SSE requires maintaining a persistent connection alongside regular HTTP requests. Two channels for one protocol. Reconnection logic, heartbeat management, dealing with dropped connections—it’s a lot of machinery for what should be a straightforward tool invocation.
## The New Way: Streamable HTTP
Streamable HTTP collapses everything into standard HTTP methods. POST for sending requests and receiving responses. GET for server-initiated communication. No separate SSE channel. No persistent connections required—though the server can optionally upgrade a response to a stream when needed.
The elegance is in how it handles the streaming case. When a client sends a POST request, the server can respond in one of two ways:
- **Standard response.** A regular HTTP response with the result, used when the tool returns immediately.
- **Streaming response.** The server holds the connection open and sends incremental updates as the operation progresses, used for long-running operations like build logs, database queries that return large result sets, or code execution that produces output over time.
The client doesn’t need to know in advance which type of response it’ll get. It sends the same request either way. The server decides based on the nature of the operation.
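In practice, the client can tell the two apart from the response's `Content-Type` header: the spec has the server return `application/json` for a single complete result and `text/event-stream` when it upgrades to a stream. A minimal dispatch sketch:

```typescript
// Sketch: a Streamable HTTP client branching on the server's choice.
// The same POST can yield either a complete JSON body or an SSE stream.
type ResponseKind = "json" | "stream";

function classifyResponse(contentType: string | null): ResponseKind {
  // text/event-stream signals an SSE-formatted streaming response
  if (contentType !== null && contentType.includes("text/event-stream")) {
    return "stream";
  }
  // otherwise expect application/json: one complete result
  return "json";
}
```

The client code that issues the POST stays identical either way; only the response handling branches.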
Here’s the part that fixes the SSE limitation: Streamable HTTP supports true bidirectional communication. During a streaming response, the server can send a request back to the client—asking for additional input, requesting confirmation, or prompting for missing parameters. The client responds on the same connection. No secondary channel needed.
## Why This Matters for Developers
### Existing Infrastructure Just Works
The biggest practical win is compatibility. Streamable HTTP uses standard HTTP methods. Your existing CORS configuration handles it. Your auth middleware—OAuth, API keys, JWT verification—works without modification. Your reverse proxy and load balancer don’t need special SSE configuration. Your CDN can cache appropriate responses.
This matters more than it sounds. I’ve talked to teams that spent weeks debugging SSE connections through their infrastructure—corporate proxies that silently drop SSE connections after 30 seconds, load balancers that don’t understand SSE connection semantics, auth middleware that can’t refresh tokens on a persistent connection. Streamable HTTP sidesteps all of that because it’s just HTTP.
### Real-Time Without the Overhead
Consider the use cases where streaming matters for AI tool use.
- **Live build logs.** An AI agent triggers a build through an MCP tool. Instead of waiting for the build to complete—which could take minutes—the server streams log output as it happens. The AI can monitor progress, detect failures early, and potentially take corrective action before the build finishes.
- **Incremental database results.** A query returns 10,000 rows. Rather than buffering the entire result set before responding, the server streams rows as they’re fetched. The AI can start analyzing early results while the query continues. If it finds what it needs in the first 500 rows, it can cancel the rest.
- **Code execution output.** An AI agent runs a script through an MCP tool. The script produces output over time—print statements, progress indicators, intermediate results. Streaming lets the AI observe execution in real time rather than waiting for the script to finish.
All of these worked with SSE, technically. But they required maintaining a separate SSE channel, handling reconnection, and dealing with the infrastructure compatibility issues I described. Streamable HTTP makes them straightforward.
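On the wire, a streamed response is just SSE-formatted chunks written to the POST response body. A sketch of the framing, streaming build-log lines as JSON-RPC notifications (the method name and payload shape here are illustrative, not lifted from the spec):

```typescript
// Sketch: formatting one incremental update as an SSE event, the wire
// format a Streamable HTTP server uses after upgrading a POST response
// to a stream.
function formatSseEvent(data: object, id?: string): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`); // lets clients resume
  lines.push(`data: ${JSON.stringify(data)}`);
  return lines.join("\n") + "\n\n"; // a blank line terminates the event
}

// One build-log line as a JSON-RPC notification (illustrative method name):
const chunk = formatSseEvent(
  { jsonrpc: "2.0", method: "notifications/message", params: { text: "compiling main.c" } },
  "1",
);
```

Each chunk the server writes is a self-contained event; the client parses them as they arrive rather than waiting for the body to end.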
### Bidirectional Communication Unlocks New Patterns
This is the capability that SSE fundamentally couldn’t provide. With Streamable HTTP, servers can initiate requests to clients during an ongoing operation.
Why does this matter? Consider an MCP server that provides deployment capabilities. The AI agent calls a “deploy” tool. Midway through, the server discovers a configuration conflict and needs the AI to make a decision: override the existing config or abort? With SSE, the server would have to fail the operation and let the client retry with additional parameters. With Streamable HTTP, the server sends a request back to the client asking for the decision, gets the response, and continues the deployment. One roundtrip instead of a full retry cycle.
Or consider a database tool that encounters an ambiguous query. “Did you mean the ‘users’ table in the production schema or the staging schema?” The server can ask, get the answer, and proceed—all within the same connection.
This pattern—server-initiated requests during tool execution—opens up interactive tool use that wasn’t possible with the one-directional SSE approach.
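At the JSON-RPC level, the exchange looks like this: mid-stream, the server emits a *request* (not just results or notifications), and the client answers it with an ordinary HTTP POST, correlated by `id`. The method and parameter names below are hypothetical, invented for the deploy example; they are not the spec's exact API:

```typescript
// Sketch of a server-initiated request during a streaming tool call.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
}

// 1. Mid-deploy, the server asks the client to resolve a config conflict
//    (hypothetical method name):
const serverQuestion: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 42,
  method: "deploy/resolveConflict",
  params: { options: ["override", "abort"] },
};

// 2. The client POSTs the answer back on the same logical connection,
//    matched to the question by id, and the deployment continues:
const clientAnswer: JsonRpcResponse = {
  jsonrpc: "2.0",
  id: 42,
  result: { choice: "override" },
};
```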
## Migration Path
If you’re running MCP servers with the old HTTP+SSE transport, the migration story is reasonable.
The MCP TypeScript SDK maintains backward compatibility. Existing SSE-based servers continue to work; clients will negotiate the transport automatically. You don’t have to migrate everything at once. New servers can use Streamable HTTP while old ones stay on SSE until you’re ready to update them.
For new MCP servers, use Streamable HTTP from the start. The TypeScript SDK makes this straightforward:
```typescript
import { randomUUID } from "node:crypto";
import { z } from "zod";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

const server = new McpServer({ name: "my-tool", version: "1.0.0" });

// Define your tools, resources, prompts...
server.tool("query", { sql: z.string() }, async ({ sql }) => {
  // Tool implementation: runQuery stands in for your own query helper
  const result = await runQuery(sql);
  return { content: [{ type: "text", text: result }] };
});

// Use Streamable HTTP transport
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: () => randomUUID(),
});
await server.connect(transport);
// Route incoming HTTP requests from your web framework to
// transport.handleRequest(req, res, body).
```

The key difference from the SSE transport: you instantiate `StreamableHTTPServerTransport` instead of `SSEServerTransport`. The server API is identical. Your tool definitions, resource handlers, and prompt templates don’t change.
For Java/Kotlin developers, Spring AI’s MCP Boot Starters support the new transport. Cloudflare’s Agents platform also supports Streamable HTTP for MCP servers deployed at the edge.
## What to Watch
A few things I’m keeping an eye on.
**Session management.** Streamable HTTP supports optional session IDs for maintaining state across requests. The spec is flexible about whether sessions are required, which means different server implementations handle statefulness differently. If you’re building clients that connect to multiple MCP servers, be prepared for inconsistency here.
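The mechanics are simple: a server that wants sessions returns an `Mcp-Session-Id` header on the initialization response, and the client echoes it on subsequent requests. A minimal client-side sketch (the header name is from the spec; the helper is mine):

```typescript
// Sketch: attaching the optional MCP session ID to outgoing requests.
// If the server never issued a session ID, the headers pass through unchanged.
function withSession(
  headers: Record<string, string>,
  sessionId?: string,
): Record<string, string> {
  return sessionId ? { ...headers, "Mcp-Session-Id": sessionId } : headers;
}

const base = { "Content-Type": "application/json" };
const headers = withSession(base, "abc-123"); // session-aware server
const statelessHeaders = withSession(base);   // stateless server
```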
**Timeout semantics.** Streaming responses can be long-lived. How long should a client wait for the next chunk before assuming the connection is dead? The spec doesn’t prescribe specific timeouts, so you’ll need to define your own based on the tools you’re serving. A build log might stream for ten minutes; a database query should respond within seconds.
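One workable approach is a per-chunk inactivity deadline rather than a whole-stream timeout: each arriving chunk resets the clock, and only a long gap between chunks kills the connection. A sketch, with illustrative budgets:

```typescript
// Sketch: per-chunk inactivity deadlines for a streaming response.
// The deadline resets whenever a chunk arrives; if the gap between
// chunks exceeds the budget, the client treats the stream as dead.
function nextDeadline(nowMs: number, chunkTimeoutMs: number): number {
  return nowMs + chunkTimeoutMs;
}

function isStreamDead(nowMs: number, deadlineMs: number): boolean {
  return nowMs > deadlineMs;
}

// A build-log stream tolerates long gaps between chunks; a query should not.
const buildDeadline = nextDeadline(0, 10 * 60 * 1000); // 10-minute budget
const queryDeadline = nextDeadline(0, 5_000);          // 5-second budget
```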
**Security.** The spec update doesn’t add new auth mechanisms—it relies on standard HTTP auth (Bearer tokens, API keys, mTLS). For production deployments, you still need to think about who can call your MCP server, what tools they can access, and how you rotate credentials. The good news is that your existing HTTP security tooling applies directly. The less-good news is that MCP doesn’t enforce any particular security posture out of the box.
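Because the transport is plain HTTP, the gate is an ordinary header check in front of your endpoint. A deliberately minimal sketch of a Bearer-token check (real deployments should validate tokens properly, e.g. verify a JWT, rather than compare against a static set):

```typescript
// Sketch: a plain HTTP Bearer-token check that applies unchanged to a
// Streamable HTTP endpoint. The token store here is illustrative.
function isAuthorized(
  authHeader: string | undefined,
  validTokens: Set<string>,
): boolean {
  if (authHeader === undefined || !authHeader.startsWith("Bearer ")) {
    return false;
  }
  return validTokens.has(authHeader.slice("Bearer ".length));
}
```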
## The Bigger Picture
MCP’s transport evolution reflects a broader pattern in developer tooling: stop inventing new protocols when existing ones will do. SSE was a clever hack that solved the “server needs to push updates” problem, but it created a bunch of secondary problems around infrastructure compatibility and bidirectionality. Streamable HTTP says: let’s just use HTTP properly.
For AI tool ecosystems, this is a maturity signal. The protocol is moving from “make it work” to “make it work with everything else.” That’s exactly what you want from a standard that aspires to be the universal connector between AI systems and the rest of your stack.