Your MCP Setup Is Burning 362,000 Tokens Per Session — Here's What That's Actually Costing You
MCP tool bloat is the fastest-growing hidden cost in AI development. We break down the real numbers — dollars, watts, and water — behind the tokens you're wasting before your agent writes a single line of code.
The protocol that connects your AI agent to everything is silently draining your budget, your context window, and the planet.
Every tool you connect to your AI agent makes it more capable. It also makes it more expensive — in ways you're probably not tracking. MCP (Model Context Protocol) has become the standard for wiring agents to external services, with thousands of community-built servers and adoption across Claude, Cursor, Codex, and more. But as developers connect 10, 30, or 100+ tools, a quiet crisis is building: massive token waste that inflates API bills, degrades output quality, and compounds into a real environmental footprint.
The Problem: Your Agent Is Reading the Manual on Every Turn
Here's how MCP works by default: every tool schema — its name, description, parameters, types — gets injected into your LLM's context window on every single turn. Whether the model uses that tool or not.
A single GitHub MCP server exposes 93 tool definitions. That's roughly 55,000 tokens loaded before you even ask a question. Add Notion, Slack, a database, and a docs server, and you're looking at 75,000+ tokens consumed at conversation start — a third of Claude Sonnet's 200K context window, gone.
One developer measured their setup and found 66,000 tokens eaten on the first message. Not by their code. Not by their prompt. By tool descriptions sitting idle in the context window.
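Want a gut check on your own setup? A rough chars-to-tokens heuristic gets you in the ballpark. The sketch below assumes ~4 characters per token, which varies by tokenizer, and the sample tool schema is a hypothetical illustration, not a real GitHub MCP definition:

```python
import json

# Rough heuristic: ~4 characters per token for English/JSON text.
# The exact count depends on the model's tokenizer; treat this as an estimate.
CHARS_PER_TOKEN = 4

def estimate_schema_tokens(tools: list[dict]) -> int:
    """Estimate how many context tokens a list of MCP tool schemas consumes."""
    raw = json.dumps(tools)
    return len(raw) // CHARS_PER_TOKEN

# Hypothetical tool definition, shaped like a typical MCP tool schema.
tools = [
    {
        "name": "create_issue",
        "description": "Create a new issue in a GitHub repository.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "owner": {"type": "string"},
                "repo": {"type": "string"},
                "title": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["owner", "repo", "title"],
        },
    },
]

overhead = estimate_schema_tokens(tools)
print(f"~{overhead} tokens per turn for {len(tools)} tool(s)")
```

Run that over every server in your config and multiply by your average turns per session. The result is what you pay before the model does any work.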
This isn't a theoretical problem. Cursor already enforces a hard limit of 40 tools after finding that anything more causes visible quality degradation. Past 50 tools, models start chasing tangents and referencing tools instead of answering your actual question. Your AI gets dumber the more tools you give it.
The Numbers: 6 Ways MCP Bloat Is Costing You
I. The Per-Session Token Tax
With 30 connected tools, approximately 3,600 tokens are burned per turn just on schema injection. Scale that to 120 tools over a 25-turn conversation, and you're looking at 362,000 wasted tokens in a single session. That's tokens consumed doing absolutely nothing — the model never touched those tools.
At Claude Opus 4.6 rates ($5 per million input tokens), that's $1.81 wasted per session. Run 50 agent sessions a day across a small team, and you're bleeding over $2,700 per month on phantom token overhead.
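The arithmetic is simple enough to check yourself. This sketch uses the article's round numbers (120 tokens of schema per tool, $5/M input), which lands at 360,000 tokens, a hair under the 362,000 measured figure:

```python
# Reproduce the per-session waste math. Rates and per-tool token counts
# are the article's figures, not measured values.
TOKENS_PER_TOOL = 3_600 / 30      # ~120 schema tokens per tool, per turn
TOOLS = 120
TURNS = 25
INPUT_PRICE_PER_M = 5.00          # USD per million input tokens (Opus-tier)

wasted_tokens = TOKENS_PER_TOOL * TOOLS * TURNS
cost_per_session = wasted_tokens / 1_000_000 * INPUT_PRICE_PER_M

sessions_per_day, days_per_month = 50, 30
monthly_waste = cost_per_session * sessions_per_day * days_per_month

print(f"${cost_per_session:.2f} per session, ${monthly_waste:,.0f}/month")
```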
Takeaway: Audit how many MCP tools are loaded per session. If you can't name what each one does off the top of your head, your agent can't either.
II. The Output Quality Penalty
Token waste isn't just a billing problem — it's a performance problem. Context windows are zero-sum: every token spent on an unused tool description is a token unavailable for actual reasoning, code, or your conversation history.
Developers report that AI output quality visibly degrades after 50+ tools are connected. The model starts hallucinating tool references, giving generic answers to specific questions, and "forgetting" context you explicitly provided three messages ago. As one engineering team put it: "most of us are now drowning in the context we used to beg for."
Takeaway: More tools ≠ better agents. A focused toolset with 10-15 well-chosen integrations will consistently outperform a bloated setup of 100+.
III. The Team Scaling Multiplier
Solo developers can optimize their own MCP configs. The real cost explosion happens when teams scale. Developer one has 20 servers with custom settings and credentials stored locally. Developer two joins and has a completely different setup. Now multiply that by 5, 10, or 50 engineers.
A DevOps team of 5, each with 75,000 tokens of MCP overhead, running 10 AI conversations per day at Claude Opus rates: that's $375/month in pure overhead — before anyone writes a productive prompt. And that's a conservative estimate using a single model tier.
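Here's the math behind that figure, spelled out. The 20 working days and the flat Opus-tier input rate are assumptions chosen to match the article's total:

```python
# The team-overhead estimate: schema tokens loaded per conversation,
# multiplied across developers, conversations, and working days.
DEVS = 5
OVERHEAD_TOKENS = 75_000          # MCP schema tokens loaded per conversation
CONVOS_PER_DAY = 10
INPUT_PRICE_PER_M = 5.00          # USD per million input tokens
WORKING_DAYS = 20

daily_tokens = DEVS * OVERHEAD_TOKENS * CONVOS_PER_DAY
monthly_cost = daily_tokens / 1_000_000 * INPUT_PRICE_PER_M * WORKING_DAYS
print(f"{daily_tokens:,} overhead tokens/day → ${monthly_cost:.0f}/month")
```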
Takeaway: Standardize MCP configurations across your team. Treat your tool registry like a dependency manifest — review what's included and why.
IV. The Model Pricing Asymmetry
Here's a detail most developers overlook: output tokens cost 3-5x more than input tokens across every major provider. Claude Opus 4.6 charges $5 per million input tokens but $25 per million output tokens. GPT-5.2 charges $1.75 input versus $14 output.
When your context is bloated with tool schemas, the model generates longer outputs: it references more tools, hedges more, and produces verbose responses trying to sort through irrelevant context. That extra length lands on the expensive side of the meter. At a 5x premium, every extra output token costs as much as five input tokens, so on output-heavy workloads a modest rise in verbosity inflates total spend far faster than the raw token count suggests.
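To see the asymmetry concretely, compare what the same 10,000 tokens of bloat costs on each side of the meter, at the Opus-tier rates quoted above (the token count itself is an illustrative assumption):

```python
# Why output bloat hurts more than input bloat, at the article's rates.
INPUT_PRICE = 5.00    # USD per million input tokens
OUTPUT_PRICE = 25.00  # USD per million output tokens: a 5x premium

bloat_tokens = 10_000
input_side_cost = bloat_tokens / 1e6 * INPUT_PRICE    # as wasted input
output_side_cost = bloat_tokens / 1e6 * OUTPUT_PRICE  # as wasted output

print(f"10K tokens of bloat: ${input_side_cost:.2f} as input, "
      f"${output_side_cost:.2f} as output "
      f"({output_side_cost / input_side_cost:.0f}x)")
```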
Takeaway: Track your input-to-output token ratio. If output tokens are consistently 3x+ your input, context bloat may be inflating your responses.
V. The Opportunity Cost of Context
Current frontier models offer context windows between 200K and 2M tokens. That sounds enormous — until you realize a production agent might consume a third of it on tool definitions alone.
With 66,000 tokens burned on MCP overhead, you're effectively running a 134K-token agent, not a 200K-token agent. That's the difference between analyzing a full codebase and analyzing two-thirds of one. Between maintaining a 20-turn conversation with full context and losing the thread at turn 14.
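The calculation is trivial, which is exactly why it's worth wiring into your tooling as a standing check rather than doing it once in your head:

```python
# Effective context after MCP overhead, using the article's example numbers.
ADVERTISED_CONTEXT = 200_000
MCP_OVERHEAD = 66_000

effective = ADVERTISED_CONTEXT - MCP_OVERHEAD
utilization = effective / ADVERTISED_CONTEXT

print(f"{effective:,} usable tokens ({utilization:.0%} of advertised)")
```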
Cloudflare recently solved this for their own 2,500+ API endpoints by reducing their entire MCP surface to just two tools — search() and execute() — consuming roughly 1,000 tokens total. That's a 98% reduction in context overhead while maintaining full API coverage.
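In sketch form, the pattern looks something like this. The registry, endpoint names, and wiring here are hypothetical illustrations of the idea, not Cloudflare's actual implementation:

```python
# A minimal sketch of the two-tool pattern: instead of injecting every
# endpoint's schema into context, expose search() + execute() and resolve
# endpoints at call time. Registry contents are hypothetical.
from typing import Any, Callable

REGISTRY: dict[str, Callable[..., Any]] = {
    "dns.create_record": lambda zone, name, value: f"created {name} in {zone}",
    "dns.delete_record": lambda zone, name: f"deleted {name} from {zone}",
    "kv.put": lambda key, value: f"stored {key}",
}

def search(query: str) -> list[str]:
    """Return endpoint names matching the query; the model only sees these."""
    return [name for name in REGISTRY if query.lower() in name.lower()]

def execute(endpoint: str, **kwargs: Any) -> Any:
    """Invoke a registered endpoint by name with keyword arguments."""
    if endpoint not in REGISTRY:
        raise KeyError(f"unknown endpoint: {endpoint}")
    return REGISTRY[endpoint](**kwargs)

# The agent discovers first, then calls: no schemas preloaded into context.
matches = search("dns")
result = execute(matches[0], zone="example.com", name="www", value="1.2.3.4")
```

The design trade-off: the model pays one extra round trip to discover an endpoint, but the context window stays clean no matter how many endpoints sit behind the registry.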
Takeaway: Calculate your effective context window: total context minus MCP overhead. If it's less than 70% of the advertised limit, you need lazy loading or dynamic discovery.
VI. The Compounding Cost Over Time
Token waste compounds. As your team grows, as you add more MCP servers, as conversations get longer, and as you migrate to more capable (and expensive) models, the overhead multiplies at every level.
A startup processing 10,000 agent sessions per month with moderate MCP bloat (100K wasted tokens per session) burns through 1 billion unnecessary tokens monthly. At Sonnet 4.6 rates, that's $3,000/month in pure waste. At Opus rates, it's $5,000. And that's before factoring in the quality degradation that causes reruns, retries, and longer conversations — each of which burns more tokens.
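You can reproduce that scenario in a few lines. The rates mirror the article's figures; nothing here is measured:

```python
# The startup scenario: 10,000 sessions/month, 100K wasted tokens each,
# priced at two of the article's input-rate tiers.
SESSIONS = 10_000
WASTED_PER_SESSION = 100_000
RATES = {"Sonnet-tier": 3.00, "Opus-tier": 5.00}  # USD per million input tokens

wasted_tokens = SESSIONS * WASTED_PER_SESSION     # 1 billion tokens/month
for tier, rate in RATES.items():
    print(f"{tier}: ${wasted_tokens / 1_000_000 * rate:,.0f}/month wasted")
```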
Takeaway: Build token monitoring into your stack from day one. The teams that track waste early save exponentially more than those who discover it at scale.
The Bigger Picture: Every Wasted Token Has a Carbon Footprint
This isn't just about your AWS bill. Every token processed requires compute, and every compute cycle draws electricity and water.
Recent research estimates that a single AI query consumes 0.3 to 1 watt-hour of energy, while a more intensive reasoning query can draw 33+ watt-hours. The IEA projects that AI data centers may consume ten times more electricity in 2026 than they did in 2023. Cornell researchers found that at current growth rates, AI infrastructure could emit 24 to 44 million metric tons of CO₂ a year by 2030, equivalent to adding 5 to 10 million cars to U.S. roads.
Water is the other hidden cost. Data centers use millions of gallons daily for cooling. Researchers at UC Riverside estimated that 20-50 AI queries can consume 500 milliliters of water. Google's own transparency report measured a median Gemini text prompt at 0.26 milliliters of water and 0.24 watt-hours of energy.
Now consider this: if 362,000 tokens are wasted per session, and millions of developers are running MCP-heavy workflows daily, the aggregate waste is staggering. Those aren't productive computations advancing science or building products. They're tool descriptions being loaded and ignored, thousands of times per second, across data centers worldwide.
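A back-of-envelope conversion makes the per-session footprint tangible. To be loud about the assumptions: the per-prompt medians below are Google's published Gemini figures, but the choice of ~1,000 tokens as a median prompt size and the assumption that footprint scales linearly with tokens are ours, not Google's. Treat the result as an order of magnitude, nothing more:

```python
# Rough conversion of wasted tokens to energy and water, under loudly
# stated assumptions (see the lead-in above; this is not a measurement).
WH_PER_PROMPT = 0.24             # Google's median energy per Gemini prompt
ML_PER_PROMPT = 0.26             # Google's median water per Gemini prompt
TOKENS_PER_PROMPT = 1_000        # ASSUMED median prompt size

wasted_tokens = 362_000
prompt_equivalents = wasted_tokens / TOKENS_PER_PROMPT

energy_wh = prompt_equivalents * WH_PER_PROMPT
water_ml = prompt_equivalents * ML_PER_PROMPT
print(f"~{energy_wh:.0f} Wh and ~{water_ml:.0f} mL per wasted session")
```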
Reducing token waste isn't just cost optimization. It's environmental responsibility. Every token you don't waste is electricity that doesn't get consumed, water that doesn't evaporate from a cooling tower, and CO₂ that doesn't enter the atmosphere.
Start Tracking What You're Actually Spending
The MCP token bloat problem is a microcosm of a larger truth in AI development: most teams have no visibility into where their API dollars actually go. They see a monthly bill. They don't see the 40% that was tool schema overhead, the 15% that was retries from degraded context, or the 10% that was the wrong model handling simple tasks.
That's why we're building Costly. A two-line SDK integration that tracks every API call, flags waste patterns, and shows you exactly where your money is going — and where it's being burned. Because you can't optimize what you can't see.
Your agent doesn't need 100 tools in context. Your budget doesn't need the phantom overhead. And the planet doesn't need the wasted compute.
Start seeing your real costs at getcostly.dev.

Your AI is Costly.
Let's fix that.
One install. 7 waste detectors. Every wasted dollar, found.