How to Deploy MCPs: Complete Guide
Deploy MCP servers in production using self-hosting, AWS Lambda, Cloud Run, Railway, or MCP-native platforms. Includes code examples, Dockerfiles, security best practices, and a comparison of all approaches.

Quick answer: Deploy an MCP server by choosing one of five approaches: self-host on your own infrastructure for maximum control, use serverless platforms like AWS Lambda or Google Cloud Run for scale-to-zero pricing, deploy to Railway/Render for simplicity, or use an MCP-native platform like agnexus for the fastest setup. Each has trade-offs between setup time, maintenance burden, and cost—this guide covers all options with real code examples.
The Model Context Protocol (MCP) is becoming the standard way for AI agents to connect to external tools and data. Anthropic released it in late 2024, OpenAI adopted it in early 2025, and it's now supported across Claude, ChatGPT, Cursor, and dozens of other clients.
Building an MCP server is straightforward—most are just Python or TypeScript services with a few hundred lines of code. The challenge is getting them running reliably in production where your agents can actually use them. This guide covers every practical option.
How Does MCP Deployment Work?
Before diving into deployment options, it helps to understand what you're deploying. An MCP server exposes:
- Tools — Functions your agent can call (e.g. `create_github_issue`, `send_slack_message`)
- Resources — Data your agent can read (files, database records, API responses)
- Prompts — Reusable templates for common tasks (optional)
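Under the hood, these primitives are exposed over JSON-RPC 2.0: a client discovers tools with a `tools/list` request and invokes one with `tools/call`. A rough sketch of the wire messages (method names per the MCP spec; the tool name and arguments are illustrative):

```python
import json

# A client first discovers what the server offers...
list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# ...then invokes a specific tool with named arguments.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "create_github_issue",
        "arguments": {"repo": "owner/repo", "title": "Bug report"},
    },
}

print(json.dumps(call_request, indent=2))
```

Everything else in this guide is about getting a process that answers these messages running somewhere reliable.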
What Transport Should I Use?
MCP servers communicate using different transport mechanisms. Your choice affects deployment:
stdio (Standard I/O)
The original transport. The MCP client spawns your server as a subprocess and communicates via stdin/stdout. This is what Claude Desktop uses when running MCPs locally. For remote deployment, this doesn't work since there's no subprocess to spawn.
HTTP + SSE (Server-Sent Events)
The remote-friendly transport. Your MCP runs as an HTTP service, and clients connect over the network. SSE handles the streaming responses MCP needs. This is what you'll use for deployed MCPs.
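To see why SSE fits, here is roughly what a streamed response looks like on the wire: each JSON-RPC message is framed as a plain-text SSE event. A minimal framing sketch, not tied to any particular SDK:

```python
import json

def to_sse_event(message: dict, event: str = "message") -> str:
    """Frame a JSON-RPC message as a Server-Sent Event.

    SSE is plain text: an optional "event:" line, a "data:" line with
    the payload, and a blank line terminating the event.
    """
    payload = json.dumps(message)
    return f"event: {event}\ndata: {payload}\n\n"

frame = to_sse_event({"jsonrpc": "2.0", "id": 1, "result": {"ok": True}})
print(frame)
```

Because each event is self-delimiting, the server can push results incrementally over one long-lived HTTP response, which is exactly what streaming tool output needs.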
What Are My Deployment Options?
There's no single right answer. The best approach depends on your constraints: how much control you need, your team's expertise, your budget, and how quickly you want to move.
| Approach | Setup Time | Maintenance | Best For |
|---|---|---|---|
| Self-hosted (VPS/K8s) | Hours to days | High | Full control, compliance |
| AWS Lambda | 1-2 hours | Medium | Sporadic usage, AWS ecosystem |
| Google Cloud Run | 30 min - 1 hour | Low-Medium | Containers, scale to zero |
| Railway / Render | 10-30 min | Low | Quick deploys, simple apps |
| MCP-native (agnexus) | 5-10 min | Minimal | MCP-specific, fast iteration |
Option 1: Self-Hosted Infrastructure
Running your MCP server on infrastructure you control—a VPS, EC2 instance, or Kubernetes cluster. Maximum flexibility but requires the most work.
When to Choose Self-Hosted
- You have strict compliance requirements (data residency, air-gapped networks)
- You're integrating with internal systems not accessible from public internet
- You already have a mature DevOps setup and the incremental cost is low
- You need complete control over the runtime environment
Basic VPS Setup
The simplest self-hosted approach is a single VM running Docker:
```yaml
version: '3.8'
services:
  mcp-server:
    build: .
    ports:
      - "8080:8080"
    environment:
      - API_KEY=${API_KEY}
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

You'll also need to handle:
- TLS termination — Caddy, nginx, or Traefik as reverse proxy with automatic HTTPS
- Process management — Ensure server restarts on crashes (Docker restart policy helps)
- Monitoring — Prometheus + Grafana, Datadog, or similar
- Log aggregation — Centralized logging for debugging
- Updates — A deployment pipeline for new versions
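The healthcheck above assumes your server exposes a /health route. If your framework doesn't provide one, a standalone sketch using only the standard library shows the idea (real servers would register the route in whatever framework they already use):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json
import threading

class HealthHandler(BaseHTTPRequestHandler):
    """Minimal /health endpoint for container health checks."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Keep health-check probes out of the request log
        pass

def start_health_server(port: int = 0) -> HTTPServer:
    """Serve /health on a background thread; port=0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Docker's healthcheck then just curls this route and restarts the container after repeated failures.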
Kubernetes for Scale
For production workloads requiring auto-scaling and self-healing:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          image: your-registry/mcp-server:latest
          ports:
            - containerPort: 8080
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: api-key
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
```

Self-Hosted Cost Reality

The instances themselves are cheap; the real cost of self-hosting is the ongoing engineering time spent on patching, monitoring, and upgrades, which is why the comparison table rates its maintenance burden as high.
Option 2: AWS Lambda
Lambda is attractive for MCPs with sporadic usage—you only pay when your agent actually calls the MCP. But there are important constraints.
Cold Start Latency
Lambda functions that haven't been invoked recently need to "warm up." For Python MCPs, expect 1-3 seconds on first request. Node.js is faster, usually under a second. This delay happens after periods of inactivity (typically 5-15 minutes).
Mitigation strategies:
- Provisioned concurrency — Pre-warm instances (adds cost)
- Keep-alive pings — CloudWatch event that invokes the function periodically
- Optimize package size — Smaller deployments warm up faster
Timeout Limits
Lambda has a hard 15-minute limit per invocation. Most MCP tool calls complete in seconds, but if your MCP processes large datasets or makes many sequential API calls, you might hit this.
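Most MCP frameworks won't enforce this for you. One defensive pattern (a generic sketch, not a Lambda-specific API) is to cap each tool call with a deadline slightly below the platform timeout, so the agent gets a clean error instead of a killed invocation:

```python
import asyncio

async def run_with_deadline(coro, seconds: float):
    """Fail fast instead of letting the platform kill the invocation.

    Wraps any awaitable with a deadline below the platform's hard
    timeout, returning a structured error the agent can react to.
    """
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return {"error": f"tool call exceeded {seconds}s deadline"}

async def slow_tool():
    # Stand-in for a long-running tool call
    await asyncio.sleep(10)
    return {"done": True}

result = asyncio.run(run_with_deadline(slow_tool(), seconds=0.05))
print(result)
```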
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  McpFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: server.handler
      Runtime: python3.11
      Timeout: 30
      MemorySize: 512
      Environment:
        Variables:
          API_KEY: !Ref ApiKeyParameter
      Events:
        McpApi:
          Type: HttpApi
          Properties:
            Path: /{proxy+}
            Method: ANY
```

SSE on Lambda

One caveat: Lambda's request/response model buffers output by default, so long-lived SSE streams need extra work, such as Lambda response streaming or an adapter in front of the function. This is the "SSE complexity" to weigh before choosing Lambda.
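For completeness, the `server.handler` the template points at is a plain Lambda entry point. A minimal sketch of the request/response plumbing (event shape per API Gateway's HTTP API; the routing logic is illustrative, and a real deployment would delegate to an MCP framework adapter):

```python
import json

def handler(event, context):
    """Lambda entry point sketch for an MCP-style JSON-RPC POST.

    Parses the JSON-RPC body, routes one illustrative method, and
    returns the response shape API Gateway expects.
    """
    body = json.loads(event.get("body") or "{}")
    if body.get("method") == "tools/list":
        result = {"tools": []}  # a real server would enumerate its tools here
    else:
        result = {"error": "unsupported method"}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(
            {"jsonrpc": "2.0", "id": body.get("id"), "result": result}
        ),
    }
```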
Option 3: Google Cloud Run
Cloud Run sits between Lambda and full container orchestration. Deploy a Docker container, Google handles scaling, load balancing, and HTTPS.
Why Cloud Run Works Well for MCPs
- No timeout limits — Requests can run up to 60 minutes
- Native SSE support — Streaming responses work out of the box
- Fast cold starts — Typically under 2 seconds for optimized containers
- Scale to zero — No cost when idle, or set minimum instances
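Cloud Run injects the listening port through the PORT environment variable, so the server should read it rather than hard-code 8080. A small sketch (the helper name is our own):

```python
import os

def resolve_port(default: int = 8080) -> int:
    """Cloud Run (and most PaaS platforms) pass the port via $PORT."""
    return int(os.environ.get("PORT", default))

# At startup, e.g.: mcp.run(transport="sse", port=resolve_port())
print(resolve_port())
```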
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV PORT=8080
EXPOSE 8080
CMD ["python", "server.py"]
```

Deploy with a single command:

```shell
gcloud run deploy mcp-server \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars "API_KEY=$API_KEY"
```

Option 4: Railway, Render, Fly.io
These platforms abstract away most infrastructure concerns. Connect a GitHub repo and they handle building, deploying, and running your container.
```json
{
  "build": {
    "builder": "DOCKERFILE"
  },
  "deploy": {
    "startCommand": "python server.py",
    "healthcheckPath": "/health",
    "healthcheckTimeout": 30
  }
}
```

The trade-off is less control over the runtime environment and networking. These platforms work well for straightforward MCPs but can be limiting for complex requirements.
Option 5: MCP-Native Platforms (agnexus)
Platforms built specifically for MCP deployment understand MCP semantics and optimize for the specific requirements of MCP servers.
What MCP-Native Means
- Transport handling — Automatic HTTP+SSE configuration without manual setup
- MCP-aware analytics — See which tools are being called, not just generic HTTP metrics
- Marketplace discovery — Deploy pre-built MCPs or publish your own
- GitHub integration — Connect a repo, deploy automatically on push
How Do I Deploy on agnexus?
If you want to skip infrastructure setup entirely, here's how to deploy an MCP on agnexus in about 5 minutes.
Deploy a Pre-Built MCP from the Marketplace
The fastest path if you need a common integration (Notion, GitHub, Slack, etc.):
1. Create an account — Sign up at agnexus.ai (free tier available)
2. Browse the marketplace — Go to Marketplace and find the MCP you need
3. Click Deploy — Add your API credentials as environment variables
4. Copy the MCP URL — You'll get a URL like your-mcp.agnexus.ai
5. Add to your AI client — Configure Claude Desktop, Cursor, or ChatGPT to use the URL
Deploy Your Own Custom MCP
If you're building something custom:
1. Prepare your code — Make sure your MCP uses HTTP+SSE transport and listens on port 8080
2. Push to GitHub — agnexus deploys from your repository
3. Connect your repo — In the dashboard, click "Deploy from GitHub" and authorize access
4. Add environment variables — Configure API keys and secrets
5. Deploy — agnexus builds and deploys your container automatically
After deployment, you get a dedicated subdomain and can enable auto-deploy on every push to your main branch.
How Do I Build an MCP Server?
Regardless of where you deploy, you need an MCP server. The two main frameworks are FastMCP (Python) and the official TypeScript SDK.
FastMCP (Python)
The quickest way to build an MCP server in Python. Handles protocol details so you focus on your tools.
```python
from fastmcp import FastMCP
import httpx
import os

mcp = FastMCP("GitHub MCP")

@mcp.tool()
async def create_issue(repo: str, title: str, body: str) -> dict:
    """Create a GitHub issue in the specified repository."""
    token = os.environ["GITHUB_TOKEN"]
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"https://api.github.com/repos/{repo}/issues",
            headers={
                "Authorization": f"token {token}",
                "Accept": "application/vnd.github.v3+json"
            },
            json={"title": title, "body": body}
        )
        response.raise_for_status()
        return response.json()

@mcp.tool()
async def list_issues(repo: str, state: str = "open") -> list:
    """List issues from a GitHub repository."""
    token = os.environ["GITHUB_TOKEN"]
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://api.github.com/repos/{repo}/issues",
            headers={
                "Authorization": f"token {token}",
                "Accept": "application/vnd.github.v3+json"
            },
            params={"state": state}
        )
        response.raise_for_status()
        return response.json()

if __name__ == "__main__":
    mcp.run(transport="sse", port=8080)
```

The matching requirements.txt:

```
fastmcp>=0.1.0
httpx>=0.25.0
```

Dockerizing Your MCP
Most deployment platforms expect a Docker container. Here's a production-ready Dockerfile:
```dockerfile
FROM python:3.11-slim AS builder
WORKDIR /app
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.11-slim
RUN useradd --create-home --shell /bin/bash app
USER app
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY --chown=app:app . .
# python:slim ships without curl, so the health check uses the stdlib instead
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')" || exit 1
EXPOSE 8080
CMD ["python", "server.py"]
```

How Do I Secure My MCP?
MCP servers often handle sensitive credentials. Security matters.
Secret Management
- Never commit secrets — Use environment variables or secret managers
- Rotate regularly — Set reminders to rotate API keys quarterly
- Least privilege — If you only need read access, don't use a write token
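A small habit that reinforces the first point: validate required secrets at startup, so a missing key fails loudly at deploy time rather than mid-conversation. A sketch (the helper and variable names are our own):

```python
import os

def require_env(*names: str) -> dict:
    """Fail fast at startup if any required secret is missing."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}

# At server startup, e.g.:
# creds = require_env("GITHUB_TOKEN", "API_KEY")
```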
Authentication
Your MCP endpoint shouldn't be open to the world:
- API keys — Simple but effective, pass in headers
- OAuth — For multi-tenant or user-specific access
- Network-level — Private VPCs, IP allowlists, VPN
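The API-key option can be as simple as comparing a bearer token from the request headers against the expected key, using a constant-time comparison to avoid timing leaks. A framework-agnostic sketch (how you wire it into middleware depends on your server):

```python
import hmac

def is_authorized(headers: dict, expected_key: str) -> bool:
    """Check a bearer token in constant time."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    presented = auth[len("Bearer "):]
    # hmac.compare_digest avoids early-exit timing differences
    return hmac.compare_digest(presented, expected_key)

print(is_authorized({"Authorization": "Bearer secret123"}, "secret123"))
```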
Rate Limiting
Your MCP calls external APIs. Those APIs have rate limits. Protect yourself:
```python
import asyncio

import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def call_external_api(url: str, **kwargs):
    async with httpx.AsyncClient() as client:
        response = await client.get(url, **kwargs)
        if response.status_code == 429:
            # Honor the server's Retry-After hint, then raise so tenacity retries
            retry_after = int(response.headers.get("Retry-After", 5))
            await asyncio.sleep(retry_after)
            raise Exception("Rate limited, retrying...")
        response.raise_for_status()
        return response.json()
```

How Do I Connect My Agent?
Once deployed, configure your AI client to use the MCP.
Claude Desktop
Edit the config file at ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
```json
{
  "mcpServers": {
    "github": {
      "url": "mcp+https://your-mcp.example.com",
      "headers": {
        "Authorization": "Bearer your-api-key"
      }
    }
  }
}
```

Custom Agents
Use the MCP client SDK:
```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    async with sse_client("https://your-mcp.example.com") as client:
        async with ClientSession(*client) as session:
            await session.initialize()

            # List available tools
            tools = await session.list_tools()
            print(f"Available: {[t.name for t in tools.tools]}")

            # Call a tool
            result = await session.call_tool(
                "create_issue",
                {"repo": "owner/repo", "title": "Test issue"}
            )
            print(result)

if __name__ == "__main__":
    asyncio.run(main())
```

Which Approach Should I Choose?
Decision framework:
- Just exploring? → Railway, Render, or agnexus marketplace
- Need it running today? → Cloud Run or agnexus
- Sporadic usage, cost-sensitive? → Lambda (if you can handle SSE complexity)
- Already have Kubernetes? → Self-hosted, it's just another service
- Strict compliance? → Self-hosted with your security controls
- Building multiple MCPs? → agnexus or similar MCP-native platform
Ready to deploy?
Get your MCP running in production. Start with the marketplace or deploy your own.