QuilrAI
Platform · For Engineers

The gateway between
your apps and AI.

One base_url. 200+ LLMs. 150+ MCPs. Identity, guardrails, cost controls, and routing — production-ready with sub-40ms overhead.

200+

LLMs supported

150+

MCP servers

43%

token savings

<40ms

p99 overhead

Identity

JWT/JWKS via Auth0, Okta, Google. Per-user, per-team usage. Domain allowlists.

Guardrails

PII/PHI/PCI detection. Prompt injection. Custom intents. Block, redact, monitor.

Cost

TOON compression cuts inputs 43%. Routing groups split traffic across providers.

Routing

Weighted groups, regional endpoints, automatic failover. One model parameter.

Architecture

One gateway. Two pipelines.

Every LLM call and every MCP tool call routes through QuilrAI — identity, guardrails, cost controls, and routing applied consistently in both directions.

LLM Gateway
guardrails.quilr.ai

7 stages · sequential · both directions

01

Identity & Auth

JWT/JWKS · Auth0/Okta/Google · per-user usage · domain allowlists

02

Rate Limits

Req per min/hr/day · token budgets · per-team and per-model

03

Security Guardrails

PII/PHI/PCI detection · prompt injection · block, redact, monitor

04

Custom Intents

Train your own classifier · block, monitor, or redact matches

05

Prompt Store

Versioned prompts · {{variable}} templates · enforce mode

06

Token Saving

JSON→TOON · HTML/Markdown stripping · responses untouched

07

Request Routing

Weighted routing groups · group-name-as-model · auto-failover

↩ responses scanned on the return path

OpenAI · Anthropic · Azure OpenAI · AWS Bedrock · Vertex AI · vLLM / Custom
MCP Gateway
mcp.quilr.ai

6 stages · multiplexed · 1 URL → 150+ servers

01

Auth & Identity

Bearer token · OAuth DCR · per-agent identity verification

02

Agent Access Control

Cursor / Claude / OpenAI auto-detected by User-Agent · per-agent toggles

03

Dynamic Tool Calling

10K → 200

10–20K tokens → ~200 per call · 2× usage · LLM picks the right tool

04

Security Guardrails

Same PII / PHI / PCI guardrails applied to every tool call

05

Web Search Policy

URL filtering via Zscaler ZIA · Prisma Access · FortiGate · Cisco

06

Auth Mediation

Gateway brokers OAuth tokens · agents never see raw credentials

↻ 1 URL routes all MCP tool calls (multiplexed)

Slack · GitHub · Jira · Google Drive · Snowflake · AWS · +144 more

LLM Gateway · 7-stage pipeline

What each stage does for you.

Each stage solves a real engineering problem. They run sequentially on the request, and again on the response.

01 · Identity & Auth

Know exactly who's using AI and how much

JWT/JWKS · Auth0/Okta/Google · per-user usage · domain allowlists

02 · Rate Limits

Prevent runaway costs and noisy-neighbor problems

Req per min/hr/day · token budgets · per-team and per-model
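A per-team token budget like stage 02 enforces can be sketched as a rolling-window counter. This is an illustrative model only — the class name, limits, and rejection behavior here are assumptions, not QuilrAI's API.

```python
import time
from collections import defaultdict, deque

class TokenBudget:
    """Illustrative per-key budget: at most max_tokens within a rolling window."""
    def __init__(self, max_tokens, window_seconds):
        self.max_tokens = max_tokens
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> deque of (timestamp, tokens)

    def allow(self, key, tokens, now=None):
        now = time.monotonic() if now is None else now
        q = self.events[key]
        # Drop spend that has aged out of the rolling window.
        while q and now - q[0][0] >= self.window:
            q.popleft()
        used = sum(t for _, t in q)
        if used + tokens > self.max_tokens:
            return False  # over budget: reject (a gateway would answer 429)
        q.append((now, tokens))
        return True

budget = TokenBudget(max_tokens=1000, window_seconds=60)
assert budget.allow("team-a", 800, now=0.0)
assert not budget.allow("team-a", 300, now=1.0)   # would exceed the 1000 budget
assert budget.allow("team-a", 300, now=61.0)      # window rolled over, allowed
```

The same structure extends to per-minute/hour/day request counts by counting 1 per event instead of tokens.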

03 · Security Guardrails

Stop sensitive data leaks and prompt attacks, both ways

PII/PHI/PCI detection · prompt injection · block, redact, monitor

04 · Custom Intents

Block topics unique to your business

Train your own classifier · block, monitor, or redact matches

05 · Prompt Store

Update prompts across all apps instantly, no deploys

Versioned prompts · {{variable}} templates · enforce mode
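The `{{variable}}` templating above can be sketched in a few lines. This is a minimal stand-in, not QuilrAI's renderer — a failing lookup here raises rather than silently leaving a placeholder behind.

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    """Substitute {{name}} placeholders; fail loudly on any left unfilled."""
    def sub(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing prompt variable: {name}")
        return str(variables[name])
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

template = "You are a {{tone}} support agent for {{product}}."
print(render_prompt(template, {"tone": "concise", "product": "Acme"}))
# → You are a concise support agent for Acme.
```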

06 · Token Saving

Cut input costs 43% automatically, no code changes

JSON→TOON · HTML/Markdown stripping · responses untouched

07 · Request Routing

Split traffic across providers and auto-failover

Weighted routing groups · group-name-as-model · auto-failover

Return Path

Responses are scanned on the way back — same guardrails, both directions.

Red Team Testing validates configs before production

MCP Gateway · 1 URL · 150+ servers

Multiplexed MCP. 2× usage on the same tokens.

One URL, every MCP. Dynamic Tool Calling cuts tool selection context from 10–20K tokens to ~200 — doubling productive usage with no config changes.

Dynamic Tool Calling

Featured

Without QuilrAI, every MCP request sends ALL tool descriptions to the LLM — 10–20K tokens just to pick a tool. Dynamic Tool Calling presents only the relevant tools: ~200 tokens.

2× actual work completed from same token budget
Higher accuracy — LLM picks the right tool without noise
Automatic — zero configuration across all MCPs
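The core idea can be sketched as ranking tool descriptions against the request and forwarding only the best matches. The scoring below is a toy keyword overlap for illustration; QuilrAI's actual selector is not shown here.

```python
def select_tools(request: str, tools: dict, top_k: int = 3):
    """Score each tool description by word overlap with the request and
    keep only the top_k, instead of sending every description to the LLM."""
    req_words = set(request.lower().split())
    scored = [
        (len(req_words & set(desc.lower().split())), name)
        for name, desc in tools.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

tools = {
    "jira_create_issue": "create a new issue ticket in a jira project",
    "slack_post_message": "post a message to a slack channel",
    "github_open_pr": "open a pull request in a github repository",
}
print(select_tools("create a jira ticket for the login bug", tools, top_k=1))
# → ['jira_create_issue']
```

With 150+ servers, the full description set is what drives the 10–20K-token overhead; shortlisting before the LLM sees anything is what collapses it.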

Before · all tool descriptions

15,000

tokens per tool call

After · relevant tools only

200

tokens per tool call

2× productive usage

MCP Multiplexing

One URL routes to all 150+ MCPs. No separate URLs per server. Zero agent config changes.

Web Search MCP

Built-in web search with enterprise URL filtering via Zscaler ZIA, FortiGate, Cisco.

150+ MCP Library

Productivity, Dev Tools, Communication, Data, Cloud. One-click install or custom transport URL.

Auth Mediation

Gateway brokers OAuth tokens. Agents never see raw credentials. OAuth→Token, Token→Token, No Auth→OAuth.

Tool Risk Categorization

Read (low) → Write (medium, review first) → Destructive (high, disabled by default). Auto-hidden when off.
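The tier policy described above can be sketched as a simple gate: any tool whose tier is toggled off is hidden from the agent. Names and the policy dict are illustrative; only the three tiers and the destructive-off default come from the description.

```python
from enum import Enum

class Risk(Enum):
    READ = "low"
    WRITE = "medium"
    DESTRUCTIVE = "high"

# Illustrative defaults mirroring the tiers above: destructive starts disabled.
DEFAULT_ENABLED = {Risk.READ: True, Risk.WRITE: True, Risk.DESTRUCTIVE: False}

def visible_tools(tools: dict, enabled=DEFAULT_ENABLED):
    """Auto-hide any tool whose risk tier is toggled off."""
    return [name for name, risk in tools.items() if enabled[risk]]

tools = {
    "list_channels": Risk.READ,
    "post_message": Risk.WRITE,
    "delete_channel": Risk.DESTRUCTIVE,
}
print(visible_tools(tools))
# → ['list_channels', 'post_message']
```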

Agent Auto-Detection

Cursor, Claude, OpenAI, Gemini auto-detected by User-Agent. Per-agent MCP toggles.

MCP / AI Portal

Self-service portal — end users browse MCPs, connect their accounts via OAuth, start using tools.

Custom MCPs

Bring your own MCP via transport URL. Same identity, guardrails, and audit pipeline applies.

Integration

One config change. Any SDK.

Change base_url. Every OpenAI-compatible SDK works out of the box.

from openai import OpenAI

client = OpenAI(
    base_url="https://guardrails.quilr.ai/openai_compatible/",
    api_key="sk-quilr-..."
)

/openai_compatible/ · /anthropic_messages/ · /vertex_ai/ · /sdk/v1/check

SDK Mode

/sdk/v1/check

Standalone content scanning without proxy. Validate inputs before LLM, scan responses, analyze uploads. LiteLLM plugin: pre_call · during_call · post_call.
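Calling the standalone endpoint can be sketched like this. The `/sdk/v1/check` path comes from the docs above, but the payload field names (`content`) and auth scheme shown here are assumptions for illustration, not a documented schema.

```python
import json
import urllib.request

def build_check_request(text: str, api_key: str):
    """Build a POST to the standalone scan endpoint. Payload shape is
    a hypothetical sketch, not QuilrAI's published request schema."""
    body = json.dumps({"content": text}).encode()
    return urllib.request.Request(
        "https://guardrails.quilr.ai/sdk/v1/check",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_check_request("My SSN is 123-45-6789", "sk-quilr-...")
print(req.full_url)
# urllib.request.urlopen(req) would send it; the response shape is not shown here.
```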

Identity Aware

JWT/JWKS · Auth0 · Okta

Per-user tracking via JWT, X-User-Email header, or static PEM. Domain restrictions. JWT claims enforcement. Identity enforcement mode blocks unauth calls.
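Per-user attribution from a JWT boils down to reading a claim from the token's payload segment. The sketch below builds an unsigned example token and extracts the email claim; signature verification against the issuer's JWKS is deliberately omitted here and is exactly what the gateway adds in production.

```python
import base64
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def claim_from_jwt(token: str, claim: str):
    """Decode the JWT payload segment and read one claim.
    NOTE: no signature check here — a real gateway verifies the token
    against the issuer's JWKS before trusting any claim."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return payload[claim]

# Handcrafted unsigned token, for illustration only.
header = b64url(json.dumps({"alg": "none"}).encode())
payload = b64url(json.dumps({"email": "dev@example.com"}).encode())
token = f"{header}.{payload}."

print(claim_from_jwt(token, "email"))
# → dev@example.com
```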

Regional Endpoints

Auto · USA · India

guardrails.quilr.ai → guardrails-usa-1 → guardrails-india-1. Auto-failover chain. Sub-40ms overhead in all regions.

Compatible with: OpenAI SDK · Anthropic SDK · LangChain · CrewAI · AutoGen · LiteLLM · Cursor · Claude Code · Vercel AI SDK

Cost & Routing

Compounding cost reduction.

Dynamic Tool Calling (2× usage), Token Saving (43%), and Smart Routing stack on top of each other, cutting costs without changing your code.

Token Saving · 43% average

Input tokens compressed automatically. Responses untouched.

Before · JSON — 2.4M tokens · $312
{"name": "John", "age": 30, "city": "NYC"}
After · TOON — 1.4M tokens · $178
name:John|age:30|city:NYC
43% fewer tokens · responses untouched
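The flavor of the transformation can be sketched for flat objects like the example above: keys and values survive, JSON's quotes, braces, and separators don't. The real TOON encoder also handles nesting and arrays; this sketch (`toonish` is a made-up name) covers only the flat case.

```python
import json

def toonish(json_text: str) -> str:
    """Re-encode a flat JSON object as key:value pairs joined by '|',
    mirroring the before/after example above. Flat objects only."""
    obj = json.loads(json_text)
    return "|".join(f"{k}:{v}" for k, v in obj.items())

before = '{"name": "John", "age": 30, "city": "NYC"}'
after = toonish(before)
print(after)                          # name:John|age:30|city:NYC
print(len(before), "->", len(after), "chars")
```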

Routing Group · "production"

Use the group name as your model parameter. Gateway distributes by weight, fails over automatically.

OpenAI · gpt-4o — 40%
Anthropic · claude-sonnet-4 — 35%
Azure OpenAI · gpt-4o — 25%

model="production"

→ gateway routes by weight → auto-failover on outage

Regional US · Regional India · Bedrock · Vertex AI · vLLM
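The behavior of a routing group can be sketched as a weighted choice over healthy targets, with unhealthy ones excluded so the remaining weights absorb the traffic. Weights are taken from the "production" table above; the selection and failover logic here is illustrative, not QuilrAI's implementation.

```python
import random

# The "production" group from the table above (weights sum to 100).
GROUP = [
    ("openai/gpt-4o", 40),
    ("anthropic/claude-sonnet-4", 35),
    ("azure/gpt-4o", 25),
]

def pick_target(group, healthy, rng=random):
    """Weighted choice over healthy targets; when a provider is down,
    the remaining targets absorb its share (auto-failover)."""
    candidates = [(t, w) for t, w in group if healthy.get(t, True)]
    if not candidates:
        raise RuntimeError("all targets in routing group are down")
    targets, weights = zip(*candidates)
    return rng.choices(targets, weights=weights, k=1)[0]

# Normal operation: picks follow the 40/35/25 weights.
print(pick_target(GROUP, healthy={}))
# OpenAI outage: traffic fails over to the remaining two targets.
print(pick_target(GROUP, healthy={"openai/gpt-4o": False}))
```

From the caller's side all of this hides behind `model="production"`: the group name is the only thing your code ever names.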

<40ms

Gateway overhead

All 7 stages, p99

99.6%

Uptime SLA

Across all regions

2–5%

Faster responses

Connection pooling

3

Regional endpoints

Auto · USA · India

Observability

Logs, traces, and cost dashboards — built in.

Every request, every guardrail decision, every tool call — logged, attributed, and replayable. No separate tracing setup.

Per-user analytics

Token usage, cost, latency, and error rates broken down by user, team, model, and route.

Full request logs

Every prompt, response, tool call, and guardrail decision. Searchable. Exportable.

Trace replay

Reproduce any request end-to-end. See which stage triggered. Debug in seconds.

Real-time alerts

Webhook on guardrail block, budget breach, latency spike, or provider outage.

Live · Last 24h
app.quilr.ai/observability

Requests

1.42M

+18% vs prev 24h

Tokens used

847.3M

+12% vs prev 24h

Avg p50

612ms

−4% vs prev 24h

Guardrail blocks

412

+9% vs prev 24h

Security gets a lot for free

Your CISO says yes. Automatically.

Every pipeline stage generates governance, audit trails, and compliance coverage — no separate security review.

See the security view
PII / PHI / PCI / financial — block, redact, anonymize, or monitor
Prompt injection & jailbreak detection on every call
Custom intents — train your own content classifiers
Endpoint Agent — TLS inspection + DLP on macOS & Windows
Claude.ai compliance API — sync orgs, users, chats, DLP
Red Team Testing — validate configs against adversarial prompts

Get started

Start with one team.
Scale to the whole org.

Free sandbox. Works with your existing SDKs. 150+ MCPs ready to go. Live in minutes, not months.

Free sandbox · Works with your stack · No vendor lock-in