Feature

Traffic Governance

Control how your infrastructure responds to AI traffic. Prioritise high-quality agents, throttle noise, and ensure your human visitors are never affected — automatically.

AI Traffic is Bursty and Unpredictable

A single AI agent session can trigger dozens of tool calls in seconds — fetching pricing, checking availability, comparing tiers. Multiply this across concurrent agent sessions and your load profile looks like a DDoS, even when the intent is entirely legitimate.

Traditional rate limiting — "100 requests/minute per IP" — either blocks real agents or fails to protect against bursts. Averence's Traffic Governance layer handles this at the intent level, not just the request level.

Grade-Based Token Bucket Rate Limiting

Averence uses a token bucket algorithm applied per visitor grade — not per raw IP. Each traffic class gets a different bucket size and refill rate:

| Traffic Grade | Bucket Size | Refill Rate | Burst Allowance |
| --- | --- | --- | --- |
| A — Verified Human | Unlimited | — | — |
| B — Known AI Agent | 200 tokens | 60/min | 50 above bucket |
| C — Unverified Bot | 20 tokens | 5/min | None |
| F — Noise / Scraper | 5 tokens | 1/min | None |
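A minimal sketch of grade-aware token buckets. The grade parameters mirror the table above; the class and function names, and the health of a monotonic clock for refill accounting, are illustrative assumptions, not Averence's actual implementation.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Optional

# Illustrative grade parameters taken from the table above.
GRADE_PARAMS = {
    "B": {"bucket_size": 200, "refill_per_min": 60, "burst": 50},
    "C": {"bucket_size": 20,  "refill_per_min": 5,  "burst": 0},
    "F": {"bucket_size": 5,   "refill_per_min": 1,  "burst": 0},
}

@dataclass
class TokenBucket:
    capacity: float          # bucket size plus any burst allowance
    refill_per_sec: float    # tokens added back per second
    tokens: float = field(init=False)
    last_refill: float = field(init=False)

    def __post_init__(self):
        self.tokens = self.capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available, refilling lazily on each call."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def retry_after(self) -> int:
        """Whole seconds until one full token is available again."""
        deficit = max(0.0, 1.0 - self.tokens)
        return math.ceil(deficit / self.refill_per_sec)

def bucket_for_grade(grade: str) -> Optional[TokenBucket]:
    """Grade A (verified humans) gets no bucket at all, i.e. no throttling."""
    params = GRADE_PARAMS.get(grade)
    if params is None:
        return None
    return TokenBucket(
        capacity=params["bucket_size"] + params["burst"],
        refill_per_sec=params["refill_per_min"] / 60.0,
    )
```

Keying buckets by grade rather than IP means a well-behaved, identified agent keeps its generous allowance even behind a shared egress IP, while anonymous scrapers on the same IP are throttled independently.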

When a bucket is exhausted, Averence returns a 429 with a Retry-After header rather than silently dropping the request.

HTTP/1.1 429 Too Many Requests
Retry-After: 47
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0

{
  "error": "rate_limit_exceeded",
  "message": "Token bucket empty. Retry after 47 seconds.",
  "grade": "C"
}
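Because the Retry-After header is always present, a well-behaved client can back off precisely instead of guessing. A minimal retry loop, sketched with an injected request callable (the `do_request` interface is an assumption for illustration, not an Averence SDK):

```python
import time

def fetch_with_retry(do_request, max_attempts: int = 3, sleep=time.sleep):
    """Call do_request() -> (status, headers, body), retrying on 429.

    Honours the Retry-After header from the 429 response shown above;
    falls back to a 1-second wait if the header is somehow missing.
    """
    for attempt in range(max_attempts):
        status, headers, body = do_request()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts; surface the 429 to the caller
        sleep(int(headers.get("Retry-After", 1)))
    return status, headers, body
```

Injecting `sleep` keeps the loop testable and lets callers substitute an async-friendly or jittered delay.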

Graceful Degradation, Not Hard Blocks

When your backend is under stress, Averence can progressively reduce response richness instead of returning errors. The right level is applied automatically based on backend health signals:

Level 1 — Full Response: all fields — detailed specs, related items, rich metadata
Level 2 — Standard Response: core fields only — name, price, availability, URL
Level 3 — Minimal Response: identifier, price, in-stock flag only
Level 4 — Cached Fallback: last known good response, with stale: true flag
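A sketch of how a degradation policy like this could be wired up. The latency and error-rate thresholds, field lists, and function names below are hypothetical; the source only specifies that the level is chosen from backend health signals.

```python
# Hypothetical field sets per degradation level, following the list above.
FIELDS_BY_LEVEL = {
    1: {"name", "price", "availability", "url", "specs", "related", "metadata"},
    2: {"name", "price", "availability", "url"},
    3: {"id", "price", "in_stock"},
}

def degradation_level(p95_latency_ms: float, error_rate: float) -> int:
    """Map backend health signals to a level (thresholds are illustrative)."""
    if error_rate > 0.10:
        return 4            # backend failing: serve cached fallback
    if p95_latency_ms > 2000:
        return 3
    if p95_latency_ms > 800:
        return 2
    return 1

def shape_response(item: dict, level: int, cache: dict) -> dict:
    """Trim a full response down to the fields allowed at this level."""
    if level == 4:
        stale = dict(cache.get(item["id"], {}))
        stale["stale"] = True   # flag last-known-good data, as above
        return stale
    return {k: v for k, v in item.items() if k in FIELDS_BY_LEVEL[level]}
```

The key property is monotonicity: each level strictly shrinks the response, so agents always get a parseable subset rather than an error page.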

Collapsing Redundant Agent Queries

AI agents frequently issue semantically equivalent queries with slight wording variations. Averence detects these and serves the cached response — without charging a rate limit token. Dedup events are logged for analytics so you can see how often agents re-query the same information.

Semantic similarity is computed using an embedding model. Requests with cosine similarity > 0.92 within the same session window are treated as duplicates.
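The dedup check can be sketched as a per-session cache keyed by query embedding. The 0.92 threshold comes from the text above; the embedding function is assumed to be supplied externally (the source names an embedding model but not which one), and the class shape is illustrative.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class SessionDedupCache:
    """Per-session cache of (embedding, response) pairs.

    A lookup whose cosine similarity to a stored query exceeds the
    threshold is treated as a duplicate and served from cache, so no
    rate-limit token is charged for it.
    """
    def __init__(self, embed, threshold: float = 0.92):
        self.embed = embed          # embedding model, injected
        self.threshold = threshold
        self.entries = []           # list of (vector, cached_response)

    def lookup(self, query: str):
        query_vec = self.embed(query)
        for vec, response in self.entries:
            if cosine(query_vec, vec) > self.threshold:
                return response     # duplicate: serve cached response
        return None                 # novel query: caller handles it

    def store(self, query: str, response) -> None:
        self.entries.append((self.embed(query), response))
```

A linear scan is fine per session because session windows are short; a production version would likely use an approximate nearest-neighbour index instead.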