Building MCP clients
Managing multiple server connections
Connecting to many MCP servers at once.
One agent, many servers
Almost any nontrivial agent will end up connecting to more than one MCP server. The filesystem server gives it local files. The GitHub server gives it pull requests. The internal SQL server gives it user data. The vendor's product server gives it domain-specific tools. Five servers, five clients, all running inside the same host.
This works as long as the host knows how to manage them coherently. This lesson covers the patterns: namespacing, parallel connections, partial failure, and routing.
Namespacing tools
Each server names its tools without knowing about the others. Two servers can both expose a search tool. The host has to disambiguate. The standard pattern is to namespace by server name when presenting tools to the agent:
filesystem.search
github.search
sql.query
slack.post

The agent's tool registry contains the namespaced names. When the agent calls github.search(...), the host strips the namespace and routes to the GitHub client.
Some hosts use a separator other than dot (__, :, -). The exact choice doesn't matter as long as it's unambiguous and consistent.
class Host:
    def __init__(self, clients):
        self.clients = clients  # {server_name: client}
        self.tools = {}         # {namespaced_name: (server_name, tool_name)}

    async def initialize(self):
        for name, client in self.clients.items():
            await client.initialize()
            for tool in await client.list_tools():
                self.tools[f"{name}.{tool.name}"] = (name, tool.name)

    async def call_tool(self, namespaced_name, args):
        server, tool = self.tools[namespaced_name]
        return await self.clients[server].call_tool(tool, args)

This is a small piece of code that solves a real problem. Without it, you have collisions and routing bugs.
Parallel connections
Five servers connecting in series is slow if any of them takes a few seconds to start. Connect them in parallel:
async def connect_all(configs):
    clients = {name: make_client(cfg) for name, cfg in configs.items()}
    await asyncio.gather(*(c.initialize() for c in clients.values()))
    return clients

For stdio servers, the connect step is "spawn subprocess + handshake," which is fast. For HTTP servers, it's "open TCP + handshake." Either way, connecting in parallel means startup takes as long as the slowest server rather than the sum of all of them.
Tool aggregation in practice
Once initialized, the host aggregates each client's tools into one combined registry. There's a temptation to also aggregate resources and prompts. Don't, until you need to:
- Tools are the most-used primitive and benefit from aggregation. The agent picks from one big tool list.
- Resources have URIs that already encode the server (github://..., file://...). Aggregation buys little; just dispatch by URI scheme.
- Prompts are usually presented as a UI catalog, often grouped by server. No flattening needed.
Tools get aggregated; the others stay scoped per server.
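Dispatching resources by URI scheme is simple enough that it barely needs a registry. A minimal sketch, assuming the host keeps a mapping from scheme to client (the mapping shape and the placeholder client values are illustrative, not part of the MCP spec):

```python
from urllib.parse import urlparse

def dispatch_resource(clients_by_scheme, uri):
    """Route a resource read to the client whose scheme matches the URI.

    `clients_by_scheme` maps a URI scheme ("file", "github", ...) to the
    client connected to the server that owns that scheme.
    """
    scheme = urlparse(uri).scheme
    if scheme not in clients_by_scheme:
        raise ValueError(f"no server handles scheme {scheme!r}")
    return clients_by_scheme[scheme]

# Placeholder client objects stand in for real MCP clients:
clients = {"file": "filesystem-client", "github": "github-client"}
print(dispatch_resource(clients, "github://owner/repo/pulls"))  # github-client
```

No flattened registry, no namespace stripping: the URI already tells you where to go.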
Partial failure
What if one of your five servers fails to start? Three policies:
Fail-closed (strict)
If any required server fails, the host doesn't start. Useful for systems where every server is essential and partial functionality is worse than none.
Fail-open (degraded)
The host starts with however many servers came up. Tools from failed servers aren't in the registry. The agent has reduced capability but works.
Hybrid
Mark each server as "required" or "optional." Required servers are fail-closed; optional servers are fail-open.
async def connect_all(configs):
    clients = {}
    for name, cfg in configs.items():
        try:
            client = await connect(cfg)
            clients[name] = client
        except Exception as e:
            if cfg.required:
                raise RuntimeError(f"required server {name} failed to start: {e}")
            log.warning("optional server %s unavailable: %s", name, e)
    return clients

Hybrid is the right default for most agents. The user wants to keep working with the filesystem even if the GitHub server is down.
Per-server timeouts and budgets
A slow server shouldn't slow your whole agent. Each tool call should have:
- A timeout per call, configurable per server or even per tool. Slow tools (e.g., long DB queries) get more time.
- A budget per session: a counter of how many calls a server has made. Once it's exhausted, reject further calls or fall back.
async def call_with_budget(host, namespaced_name, args, budget):
    server, tool = host.tools[namespaced_name]
    if budget.exceeded(server):
        return {"status": "denied", "reason": f"server {server} budget exceeded"}
    async with timeout(host.timeout_for(server)):
        result = await host.call_tool(namespaced_name, args)
    budget.consume(server)
    return result

This is the production hardening that keeps a single misbehaving server from taking down the agent.
Hot-reloading server lists
Some hosts let users add or remove MCP servers without restarting the agent. This means:
- Keeping a watch on the config file (or admin API).
- Connecting new servers and disconnecting removed ones.
- Updating the aggregated tool registry.
- Notifying the agent (or invalidating its cached view) so it picks up the new tool list on the next turn.
async def reload_servers(host, new_configs):
    added = set(new_configs) - set(host.clients)
    removed = set(host.clients) - set(new_configs)
    for name in removed:
        await host.clients[name].disconnect()
        del host.clients[name]
    for name in added:
        client = await connect(new_configs[name])
        host.clients[name] = client
    await host.refresh_tools()

For interactive agents, hot-reload is a real UX feature. Users discover new MCP servers in a directory, install them, and start using them in the same session.
Per-server policy
Different servers may need different defaults:
| Setting | Filesystem (stdio) | GitHub (HTTPS) | SQL (HTTPS) |
|---|---|---|---|
| Timeout | 5s | 30s | 60s |
| Concurrent calls | 4 | 8 | 2 |
| Budget per session | 1000 | 100 | 50 |
| Required at startup | yes | no | no |
The host should let you configure these per server. Most agent hosts make this part of the same config that lists the servers.
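One way to carry the table above into code is a small per-server policy record alongside each server's connection config. A sketch (the class and field names are illustrative, not a standard schema):

```python
from dataclasses import dataclass

@dataclass
class ServerPolicy:
    """Per-server defaults mirroring the table above."""
    timeout_s: float      # per-call timeout
    max_concurrent: int   # concurrent in-flight calls
    session_budget: int   # max calls per session
    required: bool        # fail-closed at startup?

POLICIES = {
    "filesystem": ServerPolicy(timeout_s=5,  max_concurrent=4, session_budget=1000, required=True),
    "github":     ServerPolicy(timeout_s=30, max_concurrent=8, session_budget=100,  required=False),
    "sql":        ServerPolicy(timeout_s=60, max_concurrent=2, session_budget=50,   required=False),
}

print(POLICIES["sql"].timeout_s)  # 60
```

The host's `timeout_for(server)` and budget setup then read straight out of this table.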
Discoverability for the agent
The agent reasons over the aggregated tool list. Two patterns help it choose well:
Group tools by server in the prompt
You have access to tools from these servers:
[filesystem]
- filesystem.read_file: read a file by path
- filesystem.search: search file contents
[github]
- github.list_prs: list pull requests
- github.create_pr: open a new pull request
[sql]
- sql.query: run a SQL query (read-only)

Grouping makes it easier for the model to pick by domain.
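The grouped listing can be rendered mechanically from the aggregated registry. A sketch, assuming the host keeps tool descriptions per server (the mapping shape is illustrative):

```python
def render_tool_catalog(tools_by_server):
    """Render a server-grouped tool listing for the system prompt.

    `tools_by_server` maps server name to a list of (tool_name, description).
    """
    lines = ["You have access to tools from these servers:"]
    for server, tools in tools_by_server.items():
        lines.append(f"[{server}]")
        for name, desc in tools:
            lines.append(f"- {server}.{name}: {desc}")
    return "\n".join(lines)

catalog = render_tool_catalog({
    "filesystem": [("read_file", "read a file by path"),
                   ("search", "search file contents")],
    "sql": [("query", "run a SQL query (read-only)")],
})
print(catalog)
```

Because the rendering is derived from the registry, it stays correct when servers are hot-reloaded.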
Filter tools per request
If the agent is doing a code task, you might exclude SQL tools entirely (cheaper context, cleaner choice). This is the prompt routing pattern from Track 2 Module 1, applied to MCP tools.
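Filtering falls out of the registry shape naturally. A sketch against the {namespaced_name: (server, tool)} registry from the Host class above:

```python
def filter_tools(registry, allowed_servers):
    """Keep only tools whose server is in `allowed_servers`."""
    return {ns: (s, t) for ns, (s, t) in registry.items() if s in allowed_servers}

registry = {
    "filesystem.search": ("filesystem", "search"),
    "github.list_prs":   ("github", "list_prs"),
    "sql.query":         ("sql", "query"),
}

# For a code task, drop the SQL tools:
code_task_tools = filter_tools(registry, {"filesystem", "github"})
print(sorted(code_task_tools))  # ['filesystem.search', 'github.list_prs']
```

The agent sees only the filtered view for that request; the full registry stays intact in the host.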
More servers is not a feature
A common mistake at this stage is to install every MCP server you can find. The agent now has 80 tools spread across 12 servers. Tool selection accuracy plummets (Track 2 Module 1 covered this), and agent startup latency grows with it. Each server has to earn its slot. If a server provides 6 tools you never use, drop it.
Key takeaway
Multi-server hosts aggregate clients into one tool registry, namespace tools by server, and connect in parallel. Adopt fail-open with optional/required flags so one bad server doesn't kill the agent. Add per-server timeouts and budgets to contain misbehavior. Hot-reload makes interactive agents feel alive. The next lesson covers the dynamic side: tools and capabilities that change after the connection is up.