Traffic Governance
Control how your infrastructure responds to AI traffic. Prioritise high-quality agents, throttle noise, and ensure your human visitors are never affected — automatically.
AI Traffic Is Bursty and Unpredictable
A single AI agent session can trigger dozens of tool calls in seconds — fetching pricing, checking availability, comparing tiers. Multiply this across concurrent agent sessions and your load profile looks like a DDoS, even when the intent is entirely legitimate.
Traditional rate limiting — "100 requests/minute per IP" — either blocks real agents or fails to protect against bursts. Averence's Traffic Governance layer handles this at the intent level, not just the request level.
Grade-Based Token Bucket Rate Limiting
Averence applies a token bucket algorithm per visitor grade, not per raw IP. Each traffic class gets a different bucket size and refill rate; a minimal sketch of the algorithm follows the table:
| Traffic Grade | Bucket Size | Refill Rate | Burst Allowance |
|---|---|---|---|
| A — Verified Human | Unlimited | — | — |
| B — Known AI Agent | 200 tokens | 60/min | 50 above bucket |
| C — Unverified Bot | 20 tokens | 5/min | None |
| F — Noise / Scraper | 5 tokens | 1/min | None |
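As a rough illustration, here is a minimal Python sketch of a per-grade token bucket. The parameters mirror the table above; the class structure is illustrative, not Averence's actual implementation, and modelling the burst allowance as extra one-time capacity is one plausible reading of "50 above bucket".

```python
import math
import time
from dataclasses import dataclass

# Parameters from the table above; grade A (verified human) bypasses the bucket entirely.
GRADE_PARAMS = {
    "B": {"size": 200, "refill_per_sec": 60 / 60, "burst": 50},
    "C": {"size": 20, "refill_per_sec": 5 / 60, "burst": 0},
    "F": {"size": 5, "refill_per_sec": 1 / 60, "burst": 0},
}

@dataclass
class TokenBucket:
    capacity: float        # bucket size plus burst allowance
    refill_per_sec: float  # tokens restored per second
    tokens: float          # tokens currently available
    last_refill: float     # monotonic timestamp of the last refill

    @classmethod
    def for_grade(cls, grade: str) -> "TokenBucket":
        p = GRADE_PARAMS[grade]
        cap = p["size"] + p["burst"]
        return cls(cap, p["refill_per_sec"], cap, time.monotonic())

    def try_consume(self, cost: float = 1.0) -> bool:
        """Refill for elapsed time, then spend `cost` tokens if available."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after(self, cost: float = 1.0) -> int:
        """Whole seconds until `cost` tokens are available, for the Retry-After header."""
        deficit = max(0.0, cost - self.tokens)
        return math.ceil(deficit / self.refill_per_sec)
```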
When a bucket is exhausted, Averence returns a 429 with a Retry-After header — never silently drops the request.
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 47
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0

{
  "error": "rate_limit_exceeded",
  "message": "Token bucket empty. Retry after 47 seconds.",
  "grade": "C"
}
```
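Agent and client developers can honour this contract by backing off for the advertised interval. A minimal sketch using Python's requests library (the URL is a placeholder):

```python
import time
import requests

def fetch_with_backoff(url: str, max_attempts: int = 3) -> requests.Response:
    """GET a URL, sleeping for the server-advertised Retry-After on each 429."""
    for _ in range(max_attempts):
        resp = requests.get(url)
        if resp.status_code != 429:
            return resp
        # Averence always sends Retry-After; default to 1s if it is ever absent.
        time.sleep(int(resp.headers.get("Retry-After", "1")))
    return resp

resp = fetch_with_backoff("https://example.com/api/pricing")  # placeholder URL
```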
Graceful Degradation, Not Hard Blocks
When your backend is under stress, Averence can progressively reduce response richness instead of returning errors, for example by serving cached responses marked with a `stale: true` flag. The right degradation level is applied automatically based on backend health signals; a sketch of the selection logic follows.
Collapsing Redundant Agent Queries
AI agents frequently issue semantically equivalent queries with slight wording variations. Averence detects these and serves the cached response — without charging a rate limit token. Dedup events are logged for analytics so you can see how often agents re-query the same information.
Semantic similarity is computed using an embedding model. Requests with cosine similarity > 0.92 within the same session window are treated as duplicates.
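A minimal sketch of this dedup check, assuming an `embed()` callable that returns a vector from whatever embedding model is in use (the callable and the in-memory session cache are illustrative):

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.92  # from the rule above

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class SessionDedupCache:
    """Per-session cache of (query embedding, cached response) pairs."""

    def __init__(self, embed):
        self.embed = embed  # callable: str -> np.ndarray (assumed embedding model)
        self.entries = []   # (embedding, response) pairs for this session window

    def lookup(self, query: str):
        """Return a cached response if the query duplicates an earlier one."""
        vec = self.embed(query)
        for cached_vec, response in self.entries:
            if cosine_similarity(vec, cached_vec) > SIMILARITY_THRESHOLD:
                return response  # duplicate: serve cached, charge no token
        return None

    def store(self, query: str, response):
        self.entries.append((self.embed(query), response))
```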