QuilrAI
Engineering

Dynamic Tool Calling: How We 2× MCP Tool Usage

Instead of loading all 150+ MCP tools upfront, QuilrAI injects only the tools relevant to each request. The result: more calls, lower context bloat.

7 min read
March 2026

A common pattern in MCP deployments is to register all available tools at session start and let the model decide which ones to invoke. This works fine at 10 tools. At 150+ tools (typical for a production enterprise deployment integrating Salesforce, Jira, Confluence, GitHub, Slack, and a dozen internal APIs), the tool schema list alone can consume 40,000+ tokens per request. The model's attention degrades, tool selection accuracy drops, and per-request cost spikes. Dynamic tool calling addresses all three problems.

What Is Context-Aware Tool Injection?

Rather than loading all tools at session start, the gateway analyzes the incoming user message, the current agent task state, and the recent conversation history to predict which tools are likely to be needed for this specific turn. The prediction computes a fast embedding-based relevance score against a pre-indexed tool catalog. Only the top-K tools (typically 8–15) are injected into the model context for this turn, with K adjusted dynamically based on query complexity signals.
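The per-turn selection step can be sketched as follows. This is a minimal illustration, not QuilrAI's implementation: the bag-of-words `embed` stands in for a real sentence-embedding model, and the tool names and descriptions are invented examples.

```python
import math
from collections import Counter

def embed(text):
    # Placeholder embedding: bag-of-words term counts. A production
    # gateway would use a real sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Pre-indexed catalog: tool name -> embedded description. Built once at
# registration time, so only the query is embedded per request.
CATALOG = {
    "jira_create_issue":  embed("create a new issue ticket in jira"),
    "jira_search":        embed("search jira issues by text query"),
    "github_open_pr":     embed("open a pull request on github"),
    "slack_post_message": embed("post a message to a slack channel"),
}

def select_tools(query, k):
    """Return the top-k catalog tools by relevance to this turn's query."""
    q = embed(query)
    ranked = sorted(CATALOG, key=lambda name: cosine(q, CATALOG[name]),
                    reverse=True)
    return ranked[:k]
```

Only the schemas for the returned names would then be injected into the model context for this turn; the other 150+ schemas never touch the prompt.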

How Does Relevance Scoring Work?

Tool relevance scoring combines semantic similarity between the user query and each tool's description, a usage frequency prior weighted by recency, and a task-type classifier that groups tools into functional families (read vs. write, internal vs. external, structured vs. unstructured). The scoring model runs in under 3ms on CPU, adding negligible latency to the request path. When the model invokes a tool that was not in the initial injection set, the gateway intercepts the call, appends the missing tool schema, and re-invokes the model. This prevents tool-not-found errors while preserving the context efficiency of dynamic loading.
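The three signals can be combined as a weighted sum, sketched below. The weights, the exponential-decay half-life, and the squashing of the frequency prior are illustrative assumptions, not QuilrAI's production values.

```python
import math
import time

def score_tool(semantic_sim, call_timestamps, family_match, now=None,
               w_sem=0.6, w_freq=0.25, w_family=0.15, half_life_days=7.0):
    """Blend semantic similarity, a recency-weighted usage prior, and a
    task-family match into one relevance score (weights are illustrative)."""
    now = time.time() if now is None else now
    decay = math.log(2) / (half_life_days * 86400)
    # Recency-weighted usage prior: each past call contributes a weight
    # that halves every `half_life_days`.
    raw = sum(math.exp(-decay * (now - t)) for t in call_timestamps)
    freq_prior = raw / (1.0 + raw)  # squash into [0, 1)
    return (w_sem * semantic_sim
            + w_freq * freq_prior
            + w_family * float(family_match))
```

Because `semantic_sim`, the squashed prior, and `family_match` each live in [0, 1], the final score is directly comparable across tools and a single sort yields the top-K injection set.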

What Were the Results? 2× Tool Usage, 77% Less Context

Across a production deployment handling 15,000 daily requests against 160 registered MCP tools, dynamic injection reduced average tool schema token overhead from 42,000 to 9,800 tokens per request (a 77% reduction) while tool invocation frequency doubled, from an average of 1.8 tool calls per multi-step task to 3.7. The model selects tools more accurately when the relevant options are not buried in a 160-item list.


How QuilrAI addresses this: The QuilrAI MCP Gateway implements dynamic tool injection as a transparent middleware layer. Tool catalog indexing happens at registration time; relevance scoring happens per-request with under 3ms overhead. No changes to the MCP server or the agent application are required.
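The interception-and-retry behavior described above can be sketched as a small middleware loop. This is a hypothetical shape, not QuilrAI's API: `model_call` stands in for one model turn and returns the name of the tool the model tried to invoke (or `None` if it answered without a tool).

```python
def call_with_fallback(model_call, injected, full_catalog, max_retries=2):
    """If the model requests a tool missing from the injected set,
    append its schema from the full catalog and retry the turn."""
    tools = dict(injected)  # start from the dynamically injected subset
    requested = None
    for _ in range(max_retries + 1):
        requested = model_call(tools)
        if requested is None or requested in tools:
            return requested, tools  # normal path: tool was available
        if requested not in full_catalog:
            raise KeyError(f"unknown tool: {requested}")
        # Registered but not injected this turn: add its schema and retry.
        tools[requested] = full_catalog[requested]
    return requested, tools
```

The agent application never sees the retry; from its side, the tool call simply succeeds, which is what makes the middleware transparent.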

