The gateway between
your apps and AI.
One base_url. 200+ LLMs. 150+ MCPs. Identity, guardrails, cost controls, and routing — production-ready with sub-40ms overhead.
200+
LLMs supported
150+
MCP servers
43%
token savings
<40ms
p99 overhead
# One line. Same SDK.
from openai import OpenAI
client = OpenAI(
base_url="https://guardrails.quilr.ai/openai_compatible/",
api_key="sk-quilr-..."
)
# Every call now passes through:
# Identity → Rate Limits → Guardrails →
# Intents → Prompts → Token Saving → Routing
Identity
JWT/JWKS via Auth0, Okta, Google. Per-user, per-team usage. Domain allowlists.
Guardrails
PII/PHI/PCI detection. Prompt injection. Custom intents. Block, redact, monitor.
Cost
TOON compression cuts inputs 43%. Routing groups split traffic across providers.
Routing
Weighted groups, regional endpoints, automatic failover. One model parameter.
Architecture
One gateway. Two pipelines.
Every LLM call and every MCP tool call routes through QuilrAI — identity, guardrails, cost controls, and routing applied consistently in both directions.
7 stages · sequential · both directions
Identity & Auth
JWT/JWKS · Auth0/Okta/Google · per-user usage · domain allowlists
Rate Limits
Req per min/hr/day · token budgets · per-team and per-model
Security Guardrails
PII/PHI/PCI detection · prompt injection · block, redact, monitor
Custom Intents
Train your own classifier · block, monitor, or redact matches
Prompt Store
Versioned prompts · {{variable}} templates · enforce mode
Token Saving
JSON→TOON · HTML/Markdown stripping · responses untouched
Request Routing
Weighted routing groups · group-name-as-model · auto-failover
↩ responses scanned on the return path
6 stages · multiplexed · 1 URL → 150+ servers
Auth & Identity
Bearer token · OAuth DCR · per-agent identity verification
Agent Access Control
Cursor / Claude / OpenAI auto-detected by User-Agent · per-agent toggles
Dynamic Tool Calling
10–20K tokens → ~200 per call · 2× usage · LLM picks the right tool
Security Guardrails
Same PII / PHI / PCI guardrails applied to every tool call
Web Search Policy
URL filtering via Zscaler ZIA · Prisma Access · FortiGate · Cisco
Auth Mediation
Gateway brokers OAuth tokens · agents never see raw credentials
↻ 1 URL routes all MCP tool calls (multiplexed)
LLM Gateway · 7-stage pipeline
What each stage does for you.
Each stage solves a real engineering problem. They run sequentially on the request, and again on the response.
Know exactly who's using AI and how much
JWT/JWKS · Auth0/Okta/Google · per-user usage · domain allowlists
Prevent runaway costs and noisy-neighbor problems
Req per min/hr/day · token budgets · per-team and per-model
Stop sensitive data leaks and prompt attacks, both ways
PII/PHI/PCI detection · prompt injection · block, redact, monitor
Block topics unique to your business
Train your own classifier · block, monitor, or redact matches
Update prompts across all apps instantly, no deploys
Versioned prompts · {{variable}} templates · enforce mode
Cut input costs 43% automatically, no code changes
JSON→TOON · HTML/Markdown stripping · responses untouched
Split traffic across providers and auto-failover
Weighted routing groups · group-name-as-model · auto-failover
Responses are scanned on the way back — same guardrails, both directions.
Red Team Testing validates configs before production
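The markup-stripping half of the Token Saving stage can be pictured with a toy sketch. This is an illustration only: the gateway's real transform, including JSON→TOON conversion, is richer than a few regexes.

```python
import re

def strip_markup(text: str) -> str:
    """Toy sketch of markup stripping: drop HTML tags and Markdown
    punctuation so fewer input tokens reach the provider."""
    text = re.sub(r"<[^>]+>", "", text)       # HTML tags
    text = re.sub(r"[*_`#>]+", "", text)      # Markdown punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

before = "<div># **Status:** _green_ `ok`</div>"
after = strip_markup(before)  # "Status: green ok"
```

The same idea applies only on the request path; responses pass through untouched, as noted above.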
MCP Gateway · 1 URL · 150+ servers
Multiplexed MCP. 2× usage on the same tokens.
One URL, every MCP. Dynamic Tool Calling cuts tool selection context from 10–20K tokens to ~200 — doubling productive usage with no config changes.
Dynamic Tool Calling
Featured · Without QuilrAI, every MCP request sends ALL tool descriptions to the LLM — 10–20K tokens just to pick a tool. Dynamic Tool Calling presents only the relevant tools: ~200 tokens.
Before · all tool descriptions
15,000
tokens per tool call
After · relevant tools only
200
tokens per tool call
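The filtering idea can be sketched in a few lines. Tool names, descriptions, and the word-overlap heuristic below are all illustrative; the gateway's actual selection is smarter.

```python
def relevant_tools(query: str, tools: dict[str, str]) -> dict[str, str]:
    """Toy sketch of Dynamic Tool Calling's filtering step: instead of
    sending every tool description, forward only tools whose description
    overlaps the query in at least two words."""
    words = set(query.lower().split())
    return {name: desc for name, desc in tools.items()
            if len(words & set(desc.lower().split())) >= 2}

# Hypothetical tool catalog for illustration
catalog = {
    "jira_create_issue": "create a jira issue in a project",
    "slack_post_message": "post a message to a slack channel",
    "gdrive_search": "search files in google drive",
}
picked = relevant_tools("create an issue in jira", catalog)
# → only jira_create_issue reaches the LLM, not the whole catalog
```

The LLM then picks among a handful of candidates instead of parsing thousands of tokens of descriptions.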
MCP Multiplexing
One URL routes to all 150+ MCPs. No separate URLs per server. Zero agent config changes.
Web Search MCP
Built-in web search with enterprise URL filtering via Zscaler ZIA, FortiGate, Cisco.
150+ MCP Library
Productivity, Dev Tools, Communication, Data, Cloud. One-click install or custom transport URL.
Auth Mediation
Gateway brokers OAuth tokens. Agents never see raw credentials. OAuth→Token, Token→Token, No Auth→OAuth.
Tool Risk Categorization
Read (low) → Write (medium, review first) → Destructive (high, disabled by default). Auto-hidden when off.
Agent Auto-Detection
Cursor, Claude, OpenAI, Gemini auto-detected by User-Agent. Per-agent MCP toggles.
MCP / AI Portal
Self-service portal — end users browse MCPs, connect their accounts via OAuth, start using tools.
Custom MCPs
Bring your own MCP via transport URL. Same identity, guardrails, and audit pipeline applies.
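Agent Auto-Detection above amounts to matching the User-Agent header. The substrings below are illustrative guesses, not the gateway's real matchers.

```python
def detect_agent(user_agent: str) -> str:
    """Sketch of per-agent detection: map a User-Agent header to a
    known agent so per-agent MCP toggles can apply."""
    ua = user_agent.lower()
    for needle, agent in [("cursor", "Cursor"), ("claude", "Claude"),
                          ("openai", "OpenAI"), ("gemini", "Gemini")]:
        if needle in ua:
            return agent
    return "unknown"

detect_agent("Cursor/1.2 (darwin)")  # → "Cursor"
```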
Integration
One config change. Any SDK.
Change base_url. Every OpenAI-compatible SDK works out of the box.
from openai import OpenAI
client = OpenAI(
base_url="https://guardrails.quilr.ai/openai_compatible/",
api_key="sk-quilr-..."
)
SDK Mode
/sdk/v1/check · Standalone content scanning without a proxy. Validate inputs before the LLM, scan responses, analyze uploads. LiteLLM plugin: pre_call · during_call · post_call.
Identity Aware
JWT/JWKS · Auth0 · Okta · Per-user tracking via JWT, X-User-Email header, or static PEM. Domain restrictions. JWT claims enforcement. Identity enforcement mode blocks unauthenticated calls.
Regional Endpoints
Auto · USA · India · guardrails.quilr.ai → guardrails-usa-1 → guardrails-india-1. Auto-failover chain. Sub-40ms overhead in all regions.
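Per-user identity can ride along on any request via the X-User-Email header (from the Identity Aware card above). A minimal stdlib sketch, assuming the standard OpenAI-compatible /chat/completions path; the model name, email, and API key are placeholders.

```python
import json
import urllib.request

payload = json.dumps({
    "model": "gpt-4o-mini",  # placeholder: any model (or routing group) behind the gateway
    "messages": [{"role": "user", "content": "hello"}],
}).encode()

req = urllib.request.Request(
    "https://guardrails.quilr.ai/openai_compatible/chat/completions",
    data=payload,
    headers={
        "Authorization": "Bearer sk-quilr-...",
        "Content-Type": "application/json",
        "X-User-Email": "dev@example.com",  # per-user attribution
    },
)
# urllib.request.urlopen(req) would send it; omitted here.
```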
Cost & Routing
Compounding cost reduction.
Dynamic Tool Calling (2× usage) + Token Saving (43%) + Smart Routing. Massive cost reduction without changing your code.
Token Saving · 43% average
Input tokens compressed automatically. Responses untouched.
Routing Group · "production"
Use the group name as your model parameter. Gateway distributes by weight, fails over automatically.
model="production"
→ gateway routes by weight → auto-failover on outage
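What the gateway does with that group name can be sketched as a weighted choice. Provider names and weights here are hypothetical; the real routing also handles failover and regional affinity.

```python
import random

# A routing group maps one model-parameter name to weighted providers.
# These entries are illustrative, not a real QuilrAI config.
GROUPS = {
    "production": [("openai/gpt-4o", 0.7), ("anthropic/claude-sonnet", 0.3)],
}

def pick_model(group: str, rng: random.Random) -> str:
    """Weighted choice among the group's providers."""
    models, weights = zip(*GROUPS[group])
    return rng.choices(models, weights=weights, k=1)[0]

model = pick_model("production", random.Random(0))
```

Client code never changes: it keeps sending model="production" while the weights, providers, and failover order evolve server-side.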
<40ms
Gateway overhead
All 7 stages, p99
99.6%
Uptime SLA
Across all regions
2–5%
Faster responses
Connection pooling
3
Regional endpoints
Auto · USA · India
Observability
Logs, traces, and cost dashboards — built in.
Every request, every guardrail decision, every tool call — logged, attributed, and replayable. No separate tracing setup.
Per-user analytics
Token usage, cost, latency, and error rates broken down by user, team, model, and route.
Full request logs
Every prompt, response, tool call, and guardrail decision. Searchable. Exportable.
Trace replay
Reproduce any request end-to-end. See which stage triggered. Debug in seconds.
Real-time alerts
Webhook on guardrail block, budget breach, latency spike, or provider outage.
Requests
1.42M
+18% vs prev 24h
Tokens used
847.3M
+12% vs prev 24h
Avg p50
612ms
−4% vs prev 24h
Guardrail blocks
412
+9% vs prev 24h
Security gets a lot for free
Your CISO says yes. Automatically.
Every pipeline stage generates governance, audit trails, and compliance coverage — no separate security review.
See the security view · Get started
Start with one team.
Scale to the whole org.
Free sandbox. Works with your existing SDKs. 150+ MCPs ready to go. Live in minutes, not months.
Free sandbox · Works with your stack · No vendor lock-in