Building MCP servers
Server lifecycle and error handling
Capability negotiation and graceful failures.
What happens before, during, and after
The two previous lessons treated MCP servers as if they spring into existence ready to serve. Real servers have a lifecycle: they start up, negotiate with clients, handle a stream of requests under various conditions, and eventually shut down. Each phase has a few specific things you need to get right or your server will misbehave in subtle ways.
This lesson covers the lifecycle in order: capability negotiation, request handling, error semantics, and graceful shutdown. The patterns are universal across SDKs.
Capability negotiation
When a client connects, the very first round of messages is the initialize request. The client says "here are my supported features and protocol version"; the server responds with its own. Both sides intersect their capabilities and remember what the other supports.
A typical exchange:
client -> server:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "roots": {"listChanged": true},
      "sampling": {}
    },
    "clientInfo": {"name": "claude-code", "version": "1.0.0"}
  }
}

server -> client:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "tools": {"listChanged": true},
      "resources": {"subscribe": true, "listChanged": true}
    },
    "serverInfo": {"name": "weather-server", "version": "0.2.0"}
  }
}

The client now knows the server supports tools and resources (with notifications). The server now knows the client supports roots and sampling.
After the handshake, the client sends notifications/initialized and normal traffic begins. This handshake happens every time a connection is established (which is every host-server connect, not every request).
What the server should declare
Be honest. Declaring capabilities you don't actually support causes confusing failures: clients call methods you can't handle. FastMCP declares only the capabilities corresponding to decorators you've actually used; the explicit-protocol SDKs let you set them manually.
Protocol versions
The MCP protocol version is a date-based string (e.g. "2024-11-05"). Servers should accept any version they understand: if the client's requested version is supported, echo it back; otherwise respond with the latest version the server supports. Don't fail outright on an unknown version unless it's truly incompatible; warn and proceed.
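One way to implement that rule, as a sketch (the supported-version list is illustrative):

```python
# Hedged sketch of protocol-version negotiation: echo the client's version
# if we support it, otherwise offer our latest and let the client decide
# whether to proceed or disconnect.

SUPPORTED_VERSIONS = ["2024-11-05", "2025-03-26"]  # illustrative list

def negotiate_version(client_version: str) -> str:
    if client_version in SUPPORTED_VERSIONS:
        return client_version
    # Unknown version: don't fail outright -- respond with our newest.
    # Date-based strings sort lexically, so max() picks the latest.
    return max(SUPPORTED_VERSIONS)

negotiate_version("2024-11-05")  # -> "2024-11-05"
negotiate_version("2026-01-01")  # -> "2025-03-26"
```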
Request handling and IDs
After initialize, the server handles a stream of requests. Each request has an id (correlation ID) that the response must echo. Requests can come in any order; the server should not assume serial processing unless its transport guarantees it.
For stdio servers, requests arrive line-by-line and are usually processed in order. For HTTP servers, multiple in-flight requests can arrive concurrently; the server's framework typically handles this.
async def handle_request(req):
    if req["method"] == "tools/call":
        result = await dispatch_tool(req["params"])
        return {"jsonrpc": "2.0", "id": req["id"], "result": result}

Echoing the id is what lets the client match responses to requests. Drop or mangle it and the client gets confused.
Errors that aren't crashes
When a tool call fails for a handled reason (bad arguments, downstream API rejected it, permission denied), return a structured error or a structured "this didn't work" result. Don't raise an exception that crashes the server.
The MCP error envelope:
{
  "jsonrpc": "2.0",
  "id": 42,
  "error": {
    "code": -32602,
    "message": "Invalid params: 'city' must be a non-empty string",
    "data": {"field": "city"}
  }
}

Standard JSON-RPC error codes (-32700 to -32603) cover protocol-level failures. Application errors should use the data field for context.
For tool-level failures the agent should see and reason about, return a result with an error structure rather than a JSON-RPC error:
@mcp.tool()
def deploy(env: str, sha: str) -> dict:
    if env == "production":
        return {"status": "denied", "reason": "this server can only deploy to staging"}
    ...
    return {"status": "ok", "deploy_id": "..."}

The agent reads the structured response and decides what to do. A JSON-RPC error stops the call; a structured response lets the agent recover.
Crashes and recovery
Things go wrong:
- Unhandled exception in a tool. Wrap your tool dispatch in try/except. Log the exception. Return a JSON-RPC error to the client.
- Tool that hangs forever. Add a timeout to anything that touches the network. A hung tool blocks the server.
- Server process dies. Stdio servers are respawned by the host; HTTP servers are restarted by a process supervisor. Either way, in-flight requests are lost; the client should retry idempotent calls.
import asyncio
import logging

log = logging.getLogger("mcp-server")

async def safe_dispatch(method, params):
    try:
        async with asyncio.timeout(30):  # Python 3.11+; use asyncio.wait_for on older versions
            return await dispatch(method, params)
    except TimeoutError:
        return {"error": {"code": -32603, "message": "tool timed out"}}
    except Exception as e:
        log.exception("dispatch failed")
        return {"error": {"code": -32603, "message": f"internal error: {e}"}}

The two layers (timeout, broad except) keep the server alive even when a single tool misbehaves.
Notifications and listChanged
Some servers tell clients about changes mid-session. The capability is listChanged: when the server's list of tools or resources changes, it sends a notification.
# server's tool list changes (e.g. an integration was reconfigured)
mcp.send_notification("notifications/tools/list_changed", {})

Clients that declared listChanged support handle the notification by calling tools/list again. Clients that didn't declare it ignore the notification.
This is useful for servers whose capabilities are dynamic. For static servers, you don't need it; the initial tools/list is enough.
Graceful shutdown
When the server is asked to shut down (signal, host disconnect, etc), the right behavior is:
- Stop accepting new requests.
- Finish in-flight requests up to a deadline (5-30 seconds).
- Clean up resources (close DB connections, flush logs, etc).
- Exit.
For stdio servers, an EOF on stdin is the shutdown signal. The server should drain pending work and exit cleanly. For HTTP servers, the framework's lifecycle hooks (FastAPI's shutdown event, Express's close event) handle this.
A common bug: leaving background tasks running after the server announces shutdown. Make sure your tools don't spawn work that outlives the request.
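The stop-drain-cancel sequence can be sketched with asyncio. The class and method names here are illustrative, not an SDK API; a real server would wire `shutdown` to SIGTERM or to EOF on stdin:

```python
import asyncio

# Hedged sketch of graceful shutdown: stop accepting work, give in-flight
# requests a deadline, then cancel whatever is left and clean up.

class Server:
    def __init__(self):
        self.accepting = True
        self.in_flight: set[asyncio.Task] = set()

    def track(self, coro) -> asyncio.Task:
        """Register a request handler so shutdown can wait for it."""
        task = asyncio.create_task(coro)
        self.in_flight.add(task)
        task.add_done_callback(self.in_flight.discard)
        return task

    async def shutdown(self, deadline: float = 5.0):
        self.accepting = False                      # 1. stop accepting new requests
        if self.in_flight:
            # 2. finish in-flight requests up to the deadline
            await asyncio.wait(self.in_flight, timeout=deadline)
        for task in list(self.in_flight):           # 3. cancel the stragglers
            task.cancel()
        await asyncio.gather(*list(self.in_flight), return_exceptions=True)
        # 4. close DB connections, flush logs, then return and exit

async def main():
    server = Server()
    server.track(asyncio.sleep(0.01))       # fast request: finishes in the grace period
    slow = server.track(asyncio.sleep(60))  # hung request: cancelled at the deadline
    await server.shutdown(deadline=0.1)
    return server.accepting, slow.cancelled()

accepting, cancelled = asyncio.run(main())
# accepting is False; the hung task was cancelled rather than blocking exit
```

The deadline is the design choice worth noticing: without it, one hung request holds the whole process open; without the drain, every restart drops work that was about to succeed.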
Versioning the server
Treat the server as a versioned API. The serverInfo.version field is for clients to log and report. Bump it when you change the public surface (tool names, schemas, behavior). For breaking changes, consider keeping both old and new tools live for a deprecation window.
@mcp.tool()
def weather(city: str) -> str:
    """[DEPRECATED] Use weather_v2. Get a fake forecast for the given city."""
    return weather_v2_impl(city, units="F")

@mcp.tool()
def weather_v2(city: str, units: str = "F") -> dict:
    """Get a structured forecast for the given city."""
    ...

Old clients keep working; new clients use the new tool. After a sufficient deprecation window, remove the old one.
Don't surprise hosts with permanent state changes mid-session
A subtle MCP rule: a server's capabilities and tool list should be stable within a session unless it sends a listChanged notification. Adding a tool without notifying, or changing a schema mid-session, will break clients that cached the initial listing. Either keep the surface stable or use the notification.
What this builds toward
You can now write a server that handles the full lifecycle: capability declaration, structured errors, graceful failure, and clean shutdown. Together with the next module (clients), you have everything needed to bring up an end-to-end MCP integration.
Module 4 covers what changes when this server runs in production: auth, security, multi-server orchestration, observability. The lifecycle pieces in this lesson are the foundation those production patterns rest on.
Key takeaway
MCP servers have a clear lifecycle: capability negotiation on connect, request handling with correlation IDs, structured error responses for handled failures, JSON-RPC errors for protocol problems, and graceful shutdown on disconnect. Wrap dispatch in try/except plus timeout. Be honest about capabilities. Treat the server's tool surface as a versioned public API. The next module switches to the client side: how to actually connect to and use these servers from your agent.