The gateway between
your apps and AI.
One base_url. 200+ LLMs. 150+ MCPs. Identity, guardrails, cost controls, and routing — production-ready with sub-40ms overhead.
200+
LLMs supported
150+
MCP servers
43%
token savings
<40ms
p99 overhead
# One line. Same SDK.
from openai import OpenAI
client = OpenAI(
base_url="https://guardrails.quilr.ai/openai_compatible/",
api_key="sk-quilr-..."
)
# Every call now passes through:
# Identity → Rate Limits → Guardrails →
# Intents → Prompts → Token Saving → Routing
Identity
JWT/JWKS via Auth0, Okta, Google. Per-user, per-team usage. Domain allowlists.
Guardrails
PII/PHI/PCI detection. Prompt injection. Custom intents. Block, redact, monitor.
Cost
TOON compression cuts inputs 43%. Routing groups split traffic across providers.
Routing
Weighted groups, regional endpoints, automatic failover. One model parameter.
Architecture
One gateway. Two pipelines.
Every LLM call and every MCP tool call routes through QuilrAI — identity, guardrails, cost controls, and routing applied consistently in both directions.
7 stages · sequential · both directions
Identity & Auth
JWT/JWKS · Auth0/Okta/Google · per-user usage · domain allowlists
Rate Limits
Req per min/hr/day · token budgets · per-team and per-model
Security Guardrails
PII/PHI/PCI detection · prompt injection · block, redact, monitor
Custom Intents
Train your own classifier · block, monitor, or redact matches
Prompt Store
Versioned prompts · {{variable}} templates · enforce mode
Token Saving
JSON→TOON · HTML/Markdown stripping · responses untouched
Request Routing
Weighted routing groups · group-name-as-model · auto-failover
↩ responses scanned on the return path
6 stages · multiplexed · 1 URL → 150+ servers
Auth & Identity
Bearer token · OAuth DCR · per-agent identity verification
Agent Access Control
Cursor / Claude / OpenAI auto-detected by User-Agent · per-agent toggles
Dynamic Tool Calling
10–20K tokens → ~200 per call · 2× usage · LLM picks the right tool
Security Guardrails
Same PII / PHI / PCI guardrails applied to every tool call
Web Search Policy
URL filtering via Zscaler ZIA · Prisma Access · FortiGate · Cisco
Auth Mediation
Gateway brokers OAuth tokens · agents never see raw credentials
↻ 1 URL routes all MCP tool calls (multiplexed)
LLM Gateway · 7-stage pipeline
What each stage does for you.
Each stage solves a real engineering problem. They run sequentially on the request, and again on the response.
Know exactly who's using AI and how much
JWT/JWKS · Auth0/Okta/Google · per-user usage · domain allowlists
Prevent runaway costs and noisy-neighbor problems
Req per min/hr/day · token budgets · per-team and per-model
Stop sensitive data leaks and prompt attacks, both ways
PII/PHI/PCI detection · prompt injection · block, redact, monitor
Block topics unique to your business
Train your own classifier · block, monitor, or redact matches
Update prompts across all apps instantly, no deploys
Versioned prompts · {{variable}} templates · enforce mode
Cut input costs 43% automatically, no code changes
JSON→TOON · HTML/Markdown stripping · responses untouched
Split traffic across providers and auto-failover
Weighted routing groups · group-name-as-model · auto-failover
Responses are scanned on the way back — same guardrails, both directions.
Red Team Testing validates configs before production
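The markup-stripping half of the Token Saving stage can be pictured with a toy sketch. This is an illustration only: the gateway's real transform, including JSON→TOON conversion, is richer than a few regexes.

```python
import re

def strip_markup(text: str) -> str:
    """Toy sketch of markup stripping: drop HTML tags and Markdown
    punctuation so fewer input tokens reach the provider."""
    text = re.sub(r"<[^>]+>", "", text)       # HTML tags
    text = re.sub(r"[*_`#>]+", "", text)      # Markdown punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

before = "<div># **Status:** _green_ `ok`</div>"
after = strip_markup(before)  # "Status: green ok"
```

The same idea applies only on the request path; responses pass through untouched, as noted above.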
MCP Gateway · 1 URL · 150+ servers
Multiplexed MCP. 2× usage on the same tokens.
One URL, every MCP. Dynamic Tool Calling cuts tool selection context from 10–20K tokens to ~200 — doubling productive usage with no config changes.
Dynamic Tool Calling
Featured · Without QuilrAI, every MCP request sends ALL tool descriptions to the LLM — 10–20K tokens just to pick a tool. Dynamic Tool Calling presents only the relevant tools: ~200 tokens.
Before · all tool descriptions
15,000
tokens per tool call
After · relevant tools only
200
tokens per tool call
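The filtering idea can be sketched in a few lines. Tool names, descriptions, and the word-overlap heuristic below are all illustrative; the gateway's actual selection is smarter.

```python
def relevant_tools(query: str, tools: dict[str, str]) -> dict[str, str]:
    """Toy sketch of Dynamic Tool Calling's filtering step: instead of
    sending every tool description, forward only tools whose description
    overlaps the query in at least two words."""
    words = set(query.lower().split())
    return {name: desc for name, desc in tools.items()
            if len(words & set(desc.lower().split())) >= 2}

# Hypothetical tool catalog for illustration
catalog = {
    "jira_create_issue": "create a jira issue in a project",
    "slack_post_message": "post a message to a slack channel",
    "gdrive_search": "search files in google drive",
}
picked = relevant_tools("create an issue in jira", catalog)
# → only jira_create_issue reaches the LLM, not the whole catalog
```

The LLM then picks among a handful of candidates instead of parsing thousands of tokens of descriptions.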
MCP Multiplexing
One URL routes to all 150+ MCPs. No separate URLs per server. Zero agent config changes.
Web Search MCP
Built-in web search with enterprise URL filtering via Zscaler ZIA, FortiGate, Cisco.
150+ MCP Library
Productivity, Dev Tools, Communication, Data, Cloud. One-click install or custom transport URL.
Auth Mediation
Gateway brokers OAuth tokens. Agents never see raw credentials. OAuth→Token, Token→Token, No Auth→OAuth.
Tool Risk Categorization
Read (low) → Write (medium, review first) → Destructive (high, disabled by default). Auto-hidden when off.
Agent Auto-Detection
Cursor, Claude, OpenAI, Gemini auto-detected by User-Agent. Per-agent MCP toggles.
MCP / AI Portal
Self-service portal — end users browse MCPs, connect their accounts via OAuth, start using tools.
Custom MCPs
Bring your own MCP via transport URL. Same identity, guardrails, and audit pipeline applies.
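Agent Auto-Detection above amounts to matching the User-Agent header. The substrings below are illustrative guesses, not the gateway's real matchers.

```python
def detect_agent(user_agent: str) -> str:
    """Sketch of per-agent detection: map a User-Agent header to a
    known agent so per-agent MCP toggles can apply."""
    ua = user_agent.lower()
    for needle, agent in [("cursor", "Cursor"), ("claude", "Claude"),
                          ("openai", "OpenAI"), ("gemini", "Gemini")]:
        if needle in ua:
            return agent
    return "unknown"

detect_agent("Cursor/1.2 (darwin)")  # → "Cursor"
```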
Integration
One config change. Any SDK.
Change base_url. Every OpenAI-compatible SDK works out of the box.
from openai import OpenAI
client = OpenAI(
base_url="https://guardrails.quilr.ai/openai_compatible/",
api_key="sk-quilr-..."
)
SDK Mode
/sdk/v1/check · Standalone content scanning without a proxy. Validate inputs before the LLM, scan responses, analyze uploads. LiteLLM plugin: pre_call · during_call · post_call.
Identity Aware
JWT/JWKS · Auth0 · Okta · Per-user tracking via JWT, X-User-Email header, or static PEM. Domain restrictions. JWT claims enforcement. Identity enforcement mode blocks unauthenticated calls.
Regional Endpoints
Auto · USA · India · guardrails.quilr.ai → guardrails-usa-1 → guardrails-india-1. Auto-failover chain. Sub-40ms overhead in all regions.
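Per-user identity can ride along on any request via the X-User-Email header (from the Identity Aware card above). A minimal stdlib sketch, assuming the standard OpenAI-compatible /chat/completions path; the model name, email, and API key are placeholders.

```python
import json
import urllib.request

payload = json.dumps({
    "model": "gpt-4o-mini",  # placeholder: any model (or routing group) behind the gateway
    "messages": [{"role": "user", "content": "hello"}],
}).encode()

req = urllib.request.Request(
    "https://guardrails.quilr.ai/openai_compatible/chat/completions",
    data=payload,
    headers={
        "Authorization": "Bearer sk-quilr-...",
        "Content-Type": "application/json",
        "X-User-Email": "dev@example.com",  # per-user attribution
    },
)
# urllib.request.urlopen(req) would send it; omitted here.
```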
Cost & Routing
Compounding cost reduction.
Dynamic Tool Calling (2× usage) + Token Saving (43%) + Smart Routing. Massive cost reduction without changing your code.
Token Saving · 43% average
Input tokens compressed automatically. Responses untouched.
Routing Group · "production"
Use the group name as your model parameter. Gateway distributes by weight, fails over automatically.
model="production"
→ gateway routes by weight → auto-failover on outage
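What the gateway does with that group name can be sketched as a weighted choice. Provider names and weights here are hypothetical; the real routing also handles failover and regional affinity.

```python
import random

# A routing group maps one model-parameter name to weighted providers.
# These entries are illustrative, not a real QuilrAI config.
GROUPS = {
    "production": [("openai/gpt-4o", 0.7), ("anthropic/claude-sonnet", 0.3)],
}

def pick_model(group: str, rng: random.Random) -> str:
    """Weighted choice among the group's providers."""
    models, weights = zip(*GROUPS[group])
    return rng.choices(models, weights=weights, k=1)[0]

model = pick_model("production", random.Random(0))
```

Client code never changes: it keeps sending model="production" while the weights, providers, and failover order evolve server-side.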
<40ms
Gateway overhead
All 7 stages, p99
99.6%
Uptime SLA
Across all regions
2–5%
Faster responses
Connection pooling
3
Regional endpoints
Auto · USA · India
Observability
Logs, traces, and cost dashboards — built in.
Every request, every guardrail decision, every tool call — logged, attributed, and replayable. No separate tracing setup.
Per-user analytics
Token usage, cost, latency, and error rates broken down by user, team, model, and route.
Full request logs
Every prompt, response, tool call, and guardrail decision. Searchable. Exportable.
Trace replay
Reproduce any request end-to-end. See which stage triggered. Debug in seconds.
Real-time alerts
Webhook on guardrail block, budget breach, latency spike, or provider outage.
Requests
1.42M
+18% vs prev 24h
Tokens used
847.3M
+12% vs prev 24h
Avg p50
612ms
−4% vs prev 24h
Guardrail blocks
412
+9% vs prev 24h
Security gets a lot for free
Your CISO says yes. Automatically.
Every pipeline stage generates governance, audit trails, and compliance coverage — no separate security review.
See the security view · Get started
Start with one team.
Scale to the whole org.
Free sandbox. Works with your existing SDKs. 150+ MCPs ready to go. Live in minutes, not months.
Free sandbox · Works with your stack · No vendor lock-in