Guide · By Pedro Fonseca · 12 min read

How to Deploy MCPs: Complete Guide

Deploy MCP servers in production using self-hosting, AWS Lambda, Cloud Run, Railway, or MCP-native platforms. Includes code examples, Dockerfiles, security best practices, and a comparison of all approaches.


Quick answer: Deploy an MCP server by choosing one of five approaches: self-host on your own infrastructure for maximum control, use serverless platforms like AWS Lambda or Google Cloud Run for scale-to-zero pricing, deploy to Railway/Render for simplicity, or use an MCP-native platform like agnexus for the fastest setup. Each has trade-offs between setup time, maintenance burden, and cost—this guide covers all options with real code examples.

The Model Context Protocol (MCP) is becoming the standard way for AI agents to connect to external tools and data. Anthropic released it in late 2024, OpenAI adopted it in early 2025, and it's now supported across Claude, ChatGPT, Cursor, and dozens of other clients.

Building an MCP server is straightforward—most are just Python or TypeScript services with a few hundred lines of code. The challenge is getting them running reliably in production where your agents can actually use them. This guide covers every practical option.

How Does MCP Deployment Work?

Before diving into deployment options, it helps to understand what you're deploying. An MCP server exposes:

  • Tools — Functions your agent can call (create_github_issue, send_slack_message)
  • Resources — Data your agent can read (files, database records, API responses)
  • Prompts — Reusable templates for common tasks (optional)

What Transport Should I Use?

MCP servers communicate using different transport mechanisms. Your choice affects deployment:

stdio (Standard I/O)

The original transport. The MCP client spawns your server as a subprocess and communicates via stdin/stdout. This is what Claude Desktop uses when running MCPs locally. For remote deployment, this doesn't work since there's no subprocess to spawn.
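For contrast, a local stdio server is configured by telling the client what command to spawn. A sketch of a Claude Desktop entry in claude_desktop_config.json (server name and path here are placeholders):

```json
{
  "mcpServers": {
    "my-local-mcp": {
      "command": "python",
      "args": ["/path/to/server.py"],
      "env": {
        "API_KEY": "your-api-key"
      }
    }
  }
}
```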

HTTP + SSE (Server-Sent Events)

The remote-friendly transport. Your MCP runs as an HTTP service, and clients connect over the network. SSE handles the streaming responses MCP needs. This is what you'll use for deployed MCPs. (Newer revisions of the MCP spec fold SSE into a "Streamable HTTP" transport, but the deployment picture is the same: your server runs as an HTTP service.)

Framework Support

Most MCP frameworks (FastMCP, the official Python and TypeScript SDKs) support both transports. When deploying remotely, you configure the server to use HTTP+SSE. Some frameworks detect this automatically based on how the server starts.

What Are My Deployment Options?

There's no single right answer. The best approach depends on your constraints: how much control you need, your team's expertise, your budget, and how quickly you want to move.

| Approach | Setup Time | Maintenance | Best For |
| --- | --- | --- | --- |
| Self-hosted (VPS/K8s) | Hours to days | High | Full control, compliance |
| AWS Lambda | 1-2 hours | Medium | Sporadic usage, AWS ecosystem |
| Google Cloud Run | 30 min - 1 hour | Low-Medium | Containers, scale to zero |
| Railway / Render | 10-30 min | Low | Quick deploys, simple apps |
| MCP-native (agnexus) | 5-10 min | Minimal | MCP-specific, fast iteration |

Option 1: Self-Hosted Infrastructure

Running your MCP server on infrastructure you control—a VPS, EC2 instance, or Kubernetes cluster. Maximum flexibility but requires the most work.

When to Choose Self-Hosted

  • You have strict compliance requirements (data residency, air-gapped networks)
  • You're integrating with internal systems not accessible from public internet
  • You already have a mature DevOps setup and the incremental cost is low
  • You need complete control over the runtime environment

Basic VPS Setup

The simplest self-hosted approach is a single VM running Docker:

docker-compose.yml
version: '3.8'
services:
  mcp-server:
    build: .
    ports:
      - "8080:8080"
    environment:
      - API_KEY=${API_KEY}
    restart: unless-stopped
    healthcheck:
      # slim Python base images don't ship curl, so probe with the stdlib
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]
      interval: 30s
      timeout: 10s
      retries: 3

You'll also need to handle:

  • TLS termination — Caddy, nginx, or Traefik as reverse proxy with automatic HTTPS
  • Process management — Ensure server restarts on crashes (Docker restart policy helps)
  • Monitoring — Prometheus + Grafana, Datadog, or similar
  • Log aggregation — Centralized logging for debugging
  • Updates — A deployment pipeline for new versions
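For the TLS piece, a minimal Caddyfile that terminates HTTPS and proxies to the container is often enough (the domain is a placeholder; Caddy provisions and renews certificates automatically):

```caddyfile
mcp.example.com {
    reverse_proxy localhost:8080
}
```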

Kubernetes for Scale

For production workloads requiring auto-scaling and self-healing:

mcp-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
      - name: mcp-server
        image: your-registry/mcp-server:latest
        ports:
        - containerPort: 8080
        env:
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: mcp-secrets
              key: api-key
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10

Self-Hosted Cost Reality

Factor in engineer time, not just infrastructure costs. A simple VPS might be $20/month, but if setup takes 10 hours and maintenance takes 5 hours/month at $100/hour, your true cost is $1,500 in the first month. This math changes if you already have the expertise and infrastructure in place.
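The arithmetic behind that estimate, using the illustrative figures above:

```python
# Total-cost-of-ownership sketch using the example rates from the text
hourly_rate = 100        # engineer cost, $/hour
setup_hours = 10         # one-time setup
maintenance_hours = 5    # per month
infra_monthly = 20       # VPS, $/month

first_month_labor = hourly_rate * (setup_hours + maintenance_hours)
first_month_total = first_month_labor + infra_monthly
print(first_month_labor, first_month_total)  # 1500 1520
```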

Option 2: AWS Lambda

Lambda is attractive for MCPs with sporadic usage—you only pay when your agent actually calls the MCP. But there are important constraints.

Cold Start Latency

Lambda functions that haven't been invoked recently need to "warm up." For Python MCPs, expect 1-3 seconds on first request. Node.js is faster, usually under a second. This delay happens after periods of inactivity (typically 5-15 minutes).

Mitigation strategies:

  • Provisioned concurrency — Pre-warm instances (adds cost)
  • Keep-alive pings — CloudWatch event that invokes the function periodically
  • Optimize package size — Smaller deployments warm up faster
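The keep-alive approach can be expressed as a scheduled event in the SAM template — a sketch, to be added under the function's Events alongside the HTTP event (the event name and payload are illustrative):

```yaml
# Scheduled warm-up invocation every 5 minutes
Events:
  KeepWarm:
    Type: Schedule
    Properties:
      Schedule: rate(5 minutes)
      Input: '{"warmup": true}'
```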

Timeout Limits

Lambda has a hard 15-minute limit per invocation. Most MCP tool calls complete in seconds, but if your MCP processes large datasets or makes many sequential API calls, you might hit this.

template.yaml (AWS SAM)
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  McpFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: server.handler
      Runtime: python3.11
      Timeout: 30
      MemorySize: 512
      Environment:
        Variables:
          API_KEY: !Ref ApiKeyParameter
      Events:
        McpApi:
          Type: HttpApi
          Properties:
            Path: /{proxy+}
            Method: ANY

SSE on Lambda

Lambda's response streaming (required for SSE) needs specific configuration. You'll need to use response streaming mode and configure your API Gateway or Function URL correctly. This adds complexity compared to a traditional server.

Option 3: Google Cloud Run

Cloud Run sits between Lambda and full container orchestration. Deploy a Docker container, Google handles scaling, load balancing, and HTTPS.

Why Cloud Run Works Well for MCPs

  • No timeout limits — Requests can run up to 60 minutes
  • Native SSE support — Streaming responses work out of the box
  • Fast cold starts — Typically under 2 seconds for optimized containers
  • Scale to zero — No cost when idle, or set minimum instances

Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

ENV PORT=8080
EXPOSE 8080

CMD ["python", "server.py"]

Deploy command
gcloud run deploy mcp-server \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars "API_KEY=$API_KEY"
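If cold starts matter more than idle cost, you can keep one instance warm with the gcloud CLI's minimum-instances setting (this forfeits scale-to-zero pricing):

```shell
gcloud run services update mcp-server \
  --region us-central1 \
  --min-instances 1
```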

Option 4: Railway, Render, Fly.io

These platforms abstract away most infrastructure concerns. Connect a GitHub repo and they handle building, deploying, and running your container.

railway.json
{
  "build": {
    "builder": "DOCKERFILE"
  },
  "deploy": {
    "startCommand": "python server.py",
    "healthcheckPath": "/health",
    "healthcheckTimeout": 30
  }
}

The trade-off is less control over the runtime environment and networking. These work well for straightforward MCPs but can be limiting for complex requirements.

Option 5: MCP-Native Platforms (agnexus)

Platforms built specifically for MCP deployment understand MCP semantics and optimize for the specific requirements of MCP servers.

What MCP-Native Means

  • Transport handling — Automatic HTTP+SSE configuration without manual setup
  • MCP-aware analytics — See which tools are being called, not just generic HTTP metrics
  • Marketplace discovery — Deploy pre-built MCPs or publish your own
  • GitHub integration — Connect a repo, deploy automatically on push

How Do I Deploy on agnexus?

If you want to skip infrastructure setup entirely, here's how to deploy an MCP on agnexus in about 5 minutes.

Deploy a Pre-Built MCP from the Marketplace

The fastest path if you need a common integration (Notion, GitHub, Slack, etc.):

  1. Create an account — Sign up at agnexus.ai (free tier available)
  2. Browse the marketplace — Go to Marketplace and find the MCP you need
  3. Click Deploy — Add your API credentials as environment variables
  4. Copy the MCP URL — You'll get a URL like your-mcp.agnexus.ai
  5. Add to your AI client — Configure Claude Desktop, Cursor, or ChatGPT to use the URL

Deploy Your Own Custom MCP

If you're building something custom:

  1. Prepare your code — Make sure your MCP uses HTTP+SSE transport and listens on port 8080
  2. Push to GitHub — agnexus deploys from your repository
  3. Connect your repo — In the dashboard, click "Deploy from GitHub" and authorize access
  4. Add environment variables — Configure API keys and secrets
  5. Deploy — agnexus builds and deploys your container automatically

After deployment, you get a dedicated subdomain and can enable auto-deploy on every push to your main branch.

agnexus Pricing

Free tier: 1 MCP, 3,000 credits. Starter (€29/month): 3 MCPs, 300k credits. Growth (€119/month): 15 MCPs, 3M credits. See pricing page for full details.

How Do I Build an MCP Server?

Regardless of where you deploy, you need an MCP server. The two main frameworks are FastMCP (Python) and the official TypeScript SDK.

FastMCP (Python)

The quickest way to build an MCP server in Python. Handles protocol details so you focus on your tools.

server.py
from fastmcp import FastMCP
import httpx
import os

mcp = FastMCP("GitHub MCP")

@mcp.tool()
async def create_issue(repo: str, title: str, body: str) -> dict:
    """Create a GitHub issue in the specified repository."""
    token = os.environ["GITHUB_TOKEN"]
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"https://api.github.com/repos/{repo}/issues",
            headers={
                "Authorization": f"token {token}",
                "Accept": "application/vnd.github.v3+json"
            },
            json={"title": title, "body": body}
        )
        response.raise_for_status()
        return response.json()

@mcp.tool()
async def list_issues(repo: str, state: str = "open") -> list:
    """List issues from a GitHub repository."""
    token = os.environ["GITHUB_TOKEN"]
    
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://api.github.com/repos/{repo}/issues",
            headers={
                "Authorization": f"token {token}",
                "Accept": "application/vnd.github.v3+json"
            },
            params={"state": state}
        )
        response.raise_for_status()
        return response.json()

if __name__ == "__main__":
    mcp.run(transport="sse", port=8080)

requirements.txt
fastmcp>=0.1.0
httpx>=0.25.0

Dockerizing Your MCP

Most deployment platforms expect a Docker container. Here's a production-ready Dockerfile:

Dockerfile
FROM python:3.11-slim as builder

WORKDIR /app

RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.11-slim

RUN useradd --create-home --shell /bin/bash app
USER app

WORKDIR /app

COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY --chown=app:app . .

# python:3.11-slim does not include curl, so probe with the Python stdlib instead
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')" || exit 1

EXPOSE 8080

CMD ["python", "server.py"]

How Do I Secure My MCP?

MCP servers often handle sensitive credentials. Security matters.

Secret Management

  • Never commit secrets — Use environment variables or secret managers
  • Rotate regularly — Set reminders to rotate API keys quarterly
  • Least privilege — If you only need read access, don't use a write token
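A practical pattern for the environment-variable approach is to validate secrets at startup rather than discovering a missing key mid-request. A minimal sketch (the variable name MCP_API_KEY and the setdefault line are for illustration only):

```python
import os

def require_env(name: str) -> str:
    """Fetch a required secret from the environment, failing fast at startup."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Validate every secret the server needs before it accepts traffic
os.environ.setdefault("MCP_API_KEY", "example-token")  # for illustration only
API_KEY = require_env("MCP_API_KEY")
```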

Authentication

Your MCP endpoint shouldn't be open to the world:

  • API keys — Simple but effective, pass in headers
  • OAuth — For multi-tenant or user-specific access
  • Network-level — Private VPCs, IP allowlists, VPN
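For the API-key option, compare keys in constant time to avoid timing side channels. A framework-agnostic sketch (the key value is a placeholder; in practice load it from a secret manager):

```python
import hmac

# Placeholder — in production, load from the environment or a secret manager
EXPECTED_API_KEY = "example-secret-key"

def is_authorized(headers: dict) -> bool:
    """Check a Bearer token using a constant-time comparison."""
    presented = headers.get("Authorization", "")
    prefix = "Bearer "
    if not presented.startswith(prefix):
        return False
    return hmac.compare_digest(presented[len(prefix):], EXPECTED_API_KEY)

print(is_authorized({"Authorization": "Bearer example-secret-key"}))  # True
print(is_authorized({"Authorization": "Bearer wrong-key"}))           # False
```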

Platform-Level Access Keys

On agnexus, authentication is handled at the platform level through Access Keys. You can generate keys in the dashboard and require them for all requests to your MCP—no need to implement authentication logic in your MCP code. This is available on Starter and Growth plans.

Rate Limiting

Your MCP calls external APIs. Those APIs have rate limits. Protect yourself:

Rate limiting with backoff
import asyncio

import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def call_external_api(url: str, **kwargs):
    async with httpx.AsyncClient() as client:
        response = await client.get(url, **kwargs)
        if response.status_code == 429:
            # Honor the server's Retry-After hint, then raise so tenacity retries
            retry_after = int(response.headers.get("Retry-After", 5))
            await asyncio.sleep(retry_after)
            raise Exception("Rate limited, retrying...")
        response.raise_for_status()
        return response.json()

How Do I Connect My Agent?

Once deployed, configure your AI client to use the MCP.

Claude Desktop

Edit the config file at ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

claude_desktop_config.json
{
  "mcpServers": {
    "github": {
      "url": "mcp+https://your-mcp.example.com",
      "headers": {
        "Authorization": "Bearer your-api-key"
      }
    }
  }
}

Custom Agents

Use the MCP client SDK:

Python client
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    async with sse_client("https://your-mcp.example.com") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List available tools
            tools = await session.list_tools()
            print(f"Available: {[t.name for t in tools.tools]}")

            # Call a tool
            result = await session.call_tool(
                "create_issue",
                {"repo": "owner/repo", "title": "Test issue"}
            )
            print(result)

asyncio.run(main())

Which Approach Should I Choose?

Decision framework:

  • Just exploring? → Railway, Render, or agnexus marketplace
  • Need it running today? → Cloud Run or agnexus
  • Sporadic usage, cost-sensitive? → Lambda (if you can handle SSE complexity)
  • Already have Kubernetes? → Self-hosted, it's just another service
  • Strict compliance? → Self-hosted with your security controls
  • Building multiple MCPs? → agnexus or similar MCP-native platform

Ready to deploy?

Get your MCP running in production. Start with the marketplace or deploy your own.

Frequently Asked Questions

Do I need a Dockerfile for my MCP?

For most deployment platforms (Cloud Run, Railway, agnexus), yes. Some platforms can auto-detect Python/Node.js projects, but a Dockerfile gives you more control over the build process and dependencies.

Can I deploy MCPs that connect to internal systems?

Yes. For self-hosted deployments, you have full network access. For managed platforms, you'll need to expose the internal service or use a VPN/tunnel. agnexus MCPs can be configured as private so only you can access them.

What happens if my MCP crashes?

On managed platforms (Cloud Run, agnexus), the service automatically restarts. For self-hosted, configure your process manager (Docker restart policy, systemd, K8s probes) to handle restarts.

How do I debug MCP connection issues?

Common issues: wrong transport configuration (stdio vs HTTP), firewall blocking connections, incorrect URL format in client config, SSL certificate problems. Check your server logs first—most MCP frameworks log connection attempts.

What's the difference between MCP and a regular API?

MCP is a standardized protocol specifically for AI agent tool use. Unlike custom APIs, MCP servers are automatically discoverable by compatible clients—your agent can see what tools are available without you hardcoding them.
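Under the hood, discovery is a JSON-RPC 2.0 exchange: the client sends a tools/list request and the server responds with each tool's name, description, and input schema. A sketch of the wire format (the response shown is abridged and hypothetical):

```python
import json

# The discovery request an MCP client sends (JSON-RPC 2.0)
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# An abridged, hypothetical response advertising one tool and its schema
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "create_issue",
            "description": "Create a GitHub issue in the specified repository.",
            "inputSchema": {
                "type": "object",
                "properties": {"repo": {"type": "string"}, "title": {"type": "string"}},
                "required": ["repo", "title"],
            },
        }]
    },
}

# The agent learns what it can call at runtime, without hardcoding
tool_names = [tool["name"] for tool in response["result"]["tools"]]
print(json.dumps(request), tool_names)
```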

Can I use the same MCP with Claude and ChatGPT?

Yes. If your MCP uses HTTP+SSE transport, it works with any MCP-compatible client. The protocol is the same across platforms.