QuilrAI
For Engineering & IT

One line of code.
Every AI call governed.

Change your base URL. Identity, guardrails, rate limits, token saving, and routing — all enforced automatically. Works with every provider.

integration.py
# Before
base_url = "https://api.openai.com/v1"
# After — one line, full governance
base_url = "https://guardrails.quilr.ai/openai_compatible/"
/openai_compatible/ · OpenAI Compatible
/anthropic_messages/ · Anthropic Messages
/vertex_ai/ · Vertex AI
/sdk/v1/check · SDK Mode
~40ms overhead · 99.6% SLA · 43% token savings · 150+ MCP servers · 5 providers · 1 line to integrate

Three Ways to Integrate

OpenAI-compatible proxy, native Anthropic support, or SDK mode for any provider.

/openai_compatible/

OpenAI Compatible

Drop-in replacement for any OpenAI SDK call. Chat completions, embeddings, assistants — all governed.

Supported Models
GPT-4o · GPT-4o-mini · GPT-4 Turbo · o1 · o1-mini
example.py
from openai import OpenAI

client = OpenAI(
    base_url="https://guardrails.quilr.ai/openai_compatible/",
    api_key="<your-quilr-token>",  # Quilr token is passed via the auth header
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
)
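
The same client covers the other OpenAI-compatible surfaces. A minimal embeddings sketch, reusing the governed client above; the embedding model name is illustrative, not taken from the supported-model list:

embeddings_example.py
# Reuses the governed client from example.py; embeddings pass through the
# same gateway. The model name here is illustrative.
embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Summarize the Q3 report",
)
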
Regional Endpoints
Auto · guardrails.quilr.ai · Automatic routing
US · us.guardrails.quilr.ai · US East (Virginia)
India · in.guardrails.quilr.ai · Asia South (Mumbai)
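
The /anthropic_messages/ endpoint follows the same pattern for native Anthropic SDK traffic. A minimal sketch, assuming the same base-URL swap and Quilr token auth as the OpenAI example; the model name is illustrative:

anthropic_example.py
from anthropic import Anthropic

# Same pattern as above: point the SDK at the Quilr gateway and authenticate
# with your Quilr token. The endpoint path is from the list at the top of the page.
client = Anthropic(
    base_url="https://guardrails.quilr.ai/anthropic_messages/",
    api_key="<your-quilr-token>",
)

response = client.messages.create(
    model="claude-sonnet-4",  # illustrative model name
    max_tokens=1024,
    messages=[{"role": "user", "content": "..."}],
)
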
Architecture

One Gateway. Every Connection Governed.

AI systems on the left. Tools and providers on the right. QuilrAI sits in the middle: every LLM call and MCP tool invocation passes through the Decision Engine.

── AI Systems ──

OpenAI · GPT-4o, o1
Anthropic · Claude Sonnet/Opus
Google · Gemini Pro/Ultra
Self-Hosted · Llama, Mistral, vLLM
Cursor
Claude Code
OpenAI Agents
Custom Agents
QuilrAI

LLM Gateway + MCP Gateway

Identity & Auth
Security Guardrails
Guardian Agent
Decision Engine
~40ms overhead · 150+ MCP servers

── LLM Providers ──

OpenAI · Anthropic · Azure · Bedrock · Vertex AI · vLLM

── Tools & MCP Servers ──

GitHub
Slack
Jira
PostgreSQL
Google Drive
AWS
Brave Search
Salesforce
MongoDB

+ 140 more servers

1. AI System connects

Any model or agent connects via one base_url change. OpenAI-compatible, Anthropic, Vertex AI, or MCP.

2. QuilrAI governs

Every call passes through Identity, Guardrails, Guardian Agent, and the Decision Engine. ~40ms overhead.

3. Tools & providers execute

Approved calls route to 5+ LLM providers or 150+ MCP servers. Automatic failover. Token optimization.

Pipeline Architecture

Every request passes through a multi-stage pipeline.

1. Identity & Auth
Know exactly who’s using AI and how much
JWT/JWKS (Auth0, Okta, Google) · per-user usage tracking · domain allowlists

2. Rate Limits
Prevent runaway costs and noisy-neighbor problems
Requests per min/hr/day · token budgets · per-team and per-model limits

3. Security Guardrails
Stop sensitive data leaks and prompt attacks — both directions
PII/PHI/PCI detection · prompt injection blocking · block, redact, anonymize, or monitor

4. Custom Intents
Block topics unique to your business (e.g. competitor mentions, legal risk)
Train your own classifier with examples · block, monitor, or redact matches

5. Prompt Store
Update prompts across all apps instantly — no code deploys
Centralized versioned prompts · {{variable}} templates · enforce mode rejects freeform

6. Token Saving
Cut input costs 43% automatically — no code changes
JSON→TOON compression · HTML/Markdown stripping · responses untouched

7. Request Routing
Split traffic across providers and auto-failover when one goes down
Weighted routing groups · group-name-as-model-parameter · automatic failover

MCP Gateway

150+ managed MCP servers via one URL. Dynamic Tool Calling. Auto-detected agents.

MCP Multiplexing

A single URL for all MCPs. Agents connect to one endpoint — the gateway routes to the right server based on the tool call.

mcp.quilr.ai/mcp/<slug>/
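
A minimal connection sketch using the MCP Python SDK's streamable-HTTP client. The "github" slug and the SDK import paths are assumptions, not taken from this page:

mcp_connect.py
import asyncio

# Assumes the official MCP Python SDK (the `mcp` package) with streamable-HTTP
# transport; the "github" slug below is illustrative.
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("https://mcp.quilr.ai/mcp/github/") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
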

Dynamic Tool Calling

Reduces tool selection context from 10-20K tokens to ~200 tokens. Higher accuracy — LLMs pick the right tool without noise.

2x the productive usage from the same tokens

Web Search MCP

Built-in web search with enterprise security gateway integration. URL filtering enforced through your existing security stack.

Zscaler ZIAPrisma AccessFortiGateCisco Umbrella

MCP Library — 150+ Servers

Developer Tools
GitHub
GitLab
Jira
Linear
Sentry
Communication
Slack
Discord
Email
Teams
Databases
PostgreSQL
MongoDB
Redis
Supabase
Cloud
AWS
GCP
Azure
Cloudflare
Productivity
Google Drive
Notion
Confluence
Asana
Data & Analytics
BigQuery
Snowflake
Tableau
Web Search
Brave Search
Google Search
Bing
Security
Zscaler ZIA
Prisma Access
FortiGate
File Systems
Local FS
S3
GCS
Dropbox

Tool Risk Categorization

Read Only · Safe operations — no state changes
get_file · list_repos · search_issues · read_channel
Write · Creates or modifies resources
create_issue · send_message · update_record · push_commit
Destructive · Irreversible operations — requires approval
delete_repo · drop_table · remove_user · purge_cache

Auto-Detected Agents

Cursor
User-Agent: cursor/*
Claude Code
User-Agent: claude-code/*
OpenAI Agents
User-Agent: openai-agents/*
Gemini
User-Agent: gemini-cli/*

Agents are automatically identified via User-Agent header matching. Per-agent policies, rate limits, and tool access controls apply instantly. Add custom agents with your own keywords.
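
For an agent that is not on the auto-detected list, the User-Agent can be set explicitly so per-agent policies still apply. A minimal sketch with the OpenAI SDK; the header value "acme-agent/1.0" is an illustrative keyword, not a documented one:

custom_agent.py
from openai import OpenAI

# Send a custom User-Agent so the gateway can match it against your own
# agent keywords; "acme-agent/1.0" is an illustrative value.
client = OpenAI(
    base_url="https://guardrails.quilr.ai/openai_compatible/",
    api_key="<your-quilr-token>",
    default_headers={"User-Agent": "acme-agent/1.0"},
)
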

MCP/AI Portal

Self-service portal for end users to browse available MCPs, connect their accounts via OAuth, and start using tools. Not admin-only — engineers can self-serve.

Auth Mediation

The gateway brokers OAuth tokens — agents never see raw credentials. Modes: OAuth→Token, Token→Token, No Auth→OAuth. Auth options: Bearer token + mcpuser header, OAuth DCR, OAuth Manual.

Built for Production

Cost control, intelligent routing, prompt management, and custom classifiers — all built in.

Routing Groups

Weighted distribution across providers with automatic fallback

OpenAI · gpt-4o · 40%
Anthropic · claude-sonnet-4 · 35%
Azure OpenAI · gpt-4o · 25%
Automatic failover: if one provider goes down, traffic shifts to the rest
+ Bedrock, Vertex AI, vLLM / custom endpoints
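
Routing groups are addressed by name where the model name usually goes (the "group-name-as-model-parameter" behavior from the pipeline above). A sketch, assuming a routing group has already been configured; the group name "prod-chat" is illustrative:

routing_group.py
from openai import OpenAI

client = OpenAI(
    base_url="https://guardrails.quilr.ai/openai_compatible/",
    api_key="<your-quilr-token>",
)

# Pass the routing group name as the model; the gateway picks a provider by
# weight and fails over automatically. "prod-chat" is an illustrative name.
response = client.chat.completions.create(
    model="prod-chat",
    messages=[{"role": "user", "content": "..."}],
)
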

Token Saving — JSON → TOON

Lossless compression cuts token count by 43%

JSON (before)
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant..."},
    {"role": "user", "content": "Summarize the Q3 report"}
  ],
  "model": "gpt-4o",
  "temperature": 0.7,
  "max_tokens": 2048
}
TOON (after)
m:[
  {r:"s",c:"You are a helpful assistant..."},
  {r:"u",c:"Summarize the Q3 report"}
],M:"gpt-4o",t:0.7,x:2048
43% fewer tokens

Prompt Store

Versioned, centralized prompts with variable injection. No code deploys.

quilrai-prompt-store-summarize-v3
You are a {{role}} at {{company}}.
Summarize the following document in {{format}} format.
Focus on: {{focus_areas}}
Max length: {{max_words}} words.
{{role}} · {{company}} · {{format}} · {{focus_areas}} · {{max_words}}
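
The injection itself is plain template substitution performed by the gateway at request time. A purely local illustration of the rendered result — not the Prompt Store API, and all variable values are made up:

prompt_render.py
# Local illustration only — the gateway performs this injection server-side.
template = (
    "You are a {{role}} at {{company}}.\n"
    "Summarize the following document in {{format}} format.\n"
    "Focus on: {{focus_areas}}\n"
    "Max length: {{max_words}} words."
)

variables = {
    "role": "financial analyst",
    "company": "Acme Corp",
    "format": "bullet-point",
    "focus_areas": "revenue, risks",
    "max_words": "200",
}

rendered = template
for name, value in variables.items():
    rendered = rendered.replace("{{" + name + "}}", value)
print(rendered)
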

Custom Intents

Train classifiers with examples. Block, monitor, or redact matches.

competitor_mention · block
Positive Examples
"How does QuilrAI compare to Prompt Security?"
"What advantages does Protect AI have over us?"
Negative Examples
"What are our product advantages?"
"How is our security posture?"

Start building securely

One line change. Full governance. Deploy in minutes.