A common pattern in MCP deployments is to register all available tools at session start and let the model decide which ones to invoke. This works fine at 10 tools, but at 150+ tools (typical for a production enterprise deployment integrating Salesforce, Jira, Confluence, GitHub, Slack, and a dozen internal APIs), the tool schema list alone can consume 40,000+ tokens per request. The model's attention degrades, tool selection accuracy drops, and per-request cost spikes. Dynamic tool calling solves all three problems.
What Is Context-Aware Tool Injection?
Rather than loading all tools at session start, the gateway analyzes the incoming user message, the current agent task state, and the recent conversation history to predict which tools are likely to be needed for the current turn. The prediction computes a fast embedding-based relevance score against a pre-indexed tool catalog. Only the top-K tools (typically 8–15) are injected into the model context for that turn, with K adjusted dynamically based on query complexity signals.
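The selection step above can be sketched as a small scoring loop. This is a minimal illustration, not a real gateway API: the catalog layout, the `complexity` signal, and the function names are assumptions for the example.

```python
# Hypothetical sketch of per-turn top-K tool selection against a
# pre-indexed catalog. Names and data shapes are illustrative.
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(query_embedding, catalog, base_k=8, max_k=15, complexity=0.0):
    """Score every indexed tool against the query embedding and keep the
    top-K, widening K for more complex queries (complexity in [0, 1])."""
    k = min(max_k, base_k + round((max_k - base_k) * complexity))
    scored = sorted(
        catalog,  # iterable of (tool_name, schema, embedding) tuples
        key=lambda t: cosine(query_embedding, t[2]),
        reverse=True,
    )
    return [(name, schema) for name, schema, _ in scored[:k]]
```

Only the schemas of the selected tools are then serialized into the request, which is where the token savings come from.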
How Does Relevance Scoring Work?
Tool relevance scoring combines semantic similarity between the user query and each tool's description, a usage frequency prior weighted by recency, and a task-type classifier that groups tools into functional families (read vs. write, internal vs. external, structured vs. unstructured). The scoring model runs in under 3ms on CPU, adding negligible latency to the request path. When the model invokes a tool that was not in the initial injection set, the gateway intercepts the call, appends the missing tool schema, and re-invokes the model, preventing tool-not-found errors while preserving the context efficiency of dynamic loading.
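One way the three signals could be blended is a simple weighted sum. The weights, the exponential recency decay, and the half-life below are assumptions chosen for illustration; the article does not specify the actual scoring formula.

```python
# Hypothetical composite relevance score: semantic similarity +
# recency-weighted usage prior + task-family match. All weights and the
# decay half-life are illustrative assumptions.
import math
import time

def relevance_score(semantic_sim, call_timestamps, query_family, tool_family,
                    now=None, half_life_s=86_400.0,
                    w_sem=0.6, w_freq=0.3, w_family=0.1):
    """Blend the three signals into a single score in roughly [0, 1]."""
    now = time.time() if now is None else now
    # Usage prior: each past call contributes less as it ages
    # (exponential decay with a one-day half-life by default).
    prior = sum(0.5 ** ((now - t) / half_life_s) for t in call_timestamps)
    prior = 1.0 - math.exp(-prior)  # squash unbounded call count into [0, 1)
    family_match = 1.0 if query_family == tool_family else 0.0
    return w_sem * semantic_sim + w_freq * prior + w_family * family_match
```

A linear blend like this is cheap enough to run per request over a 160-tool catalog, consistent with the sub-3ms budget described above.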
What Are the Results: 2× Tool Usage, Half the Context?
Across a production deployment handling 15,000 daily requests against 160 registered MCP tools, dynamic injection reduced average tool schema token overhead from 42,000 to 9,800 tokens per request (a 77% reduction) while tool invocation frequency more than doubled, from an average of 1.8 tool calls per multi-step task to 3.7. The model selects tools more accurately when the relevant options are not buried in a 160-item list.
- 150+ MCP tools in context consume 40,000+ tokens and degrade model attention
- Dynamic injection reduces tool context to 8–15 relevant tools per turn
- Relevance scoring combines semantic similarity, usage prior, and task-type classification
- 77% reduction in tool schema token overhead in production deployments
- 2× increase in tool invocation frequency with no prompt changes
QuilrAI
How QuilrAI addresses this: The QuilrAI MCP Gateway implements dynamic tool injection as a transparent middleware layer. Tool catalog indexing happens at registration time; relevance scoring happens per-request with under 3ms overhead. No changes to the MCP server or the agent application are required.
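The missing-tool fallback described earlier can be sketched as a generic middleware wrapper. This is not QuilrAI's actual implementation; `call_model` and the response shape are assumed hooks standing in for a real gateway's model client.

```python
# Generic sketch of the missing-tool fallback: call the model with the
# injected subset, and if it requests a tool outside that subset, append
# the missing schema and retry once. Names are illustrative.
def run_turn(call_model, messages, injected, full_catalog):
    """call_model(messages, tool_schemas) -> response dict; `injected` maps
    tool name -> schema for this turn; `full_catalog` holds every tool."""
    tools = dict(injected)
    response = call_model(messages, list(tools.values()))
    requested = response.get("tool_name")
    if requested and requested not in tools:
        if requested not in full_catalog:
            raise ValueError(f"unknown tool: {requested}")
        tools[requested] = full_catalog[requested]  # inject missing schema
        response = call_model(messages, list(tools.values()))  # re-invoke
    return response
```

Because the retry happens inside the gateway, neither the MCP server nor the agent application observes a tool-not-found error.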