Feature

Traffic Governance

Control how your infrastructure responds to AI traffic. Prioritise high-quality agents, throttle noise, and ensure your human visitors are never affected — automatically.

AI Traffic is Bursty and Unpredictable

A single AI agent session can trigger dozens of tool calls in seconds — fetching pricing, checking availability, comparing tiers. Multiply this across concurrent agent sessions and your load profile looks like a DDoS, even when the intent is entirely legitimate.

Traditional rate limiting — "100 requests/minute per IP" — either blocks real agents or fails to protect against bursts. Averence's Traffic Governance layer handles this at the intent level, not just the request level.

Grade-Based Token Bucket Rate Limiting

Averence uses a token bucket algorithm applied per visitor grade — not per raw IP. Each traffic class gets a different bucket size and refill rate:

| Traffic Grade | Bucket Size | Refill Rate | Burst Allowance |
| --- | --- | --- | --- |
| A — Verified Human | Unlimited | — | — |
| B — Known AI Agent | 200 tokens | 60/min | 50 above bucket |
| C — Unverified Bot | 20 tokens | 5/min | None |
| F — Noise / Scraper | 5 tokens | 1/min | None |
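A minimal sketch of grade-aware token buckets. The grade parameters mirror the table above; the class and function names, and the health of a monotonic clock for refill accounting, are illustrative assumptions, not Averence's actual implementation.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Optional

# Illustrative grade parameters taken from the table above.
GRADE_PARAMS = {
    "B": {"bucket_size": 200, "refill_per_min": 60, "burst": 50},
    "C": {"bucket_size": 20,  "refill_per_min": 5,  "burst": 0},
    "F": {"bucket_size": 5,   "refill_per_min": 1,  "burst": 0},
}

@dataclass
class TokenBucket:
    capacity: float          # bucket size plus any burst allowance
    refill_per_sec: float    # tokens added back per second
    tokens: float = field(init=False)
    last_refill: float = field(init=False)

    def __post_init__(self):
        self.tokens = self.capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available, refilling lazily on each call."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def retry_after(self) -> int:
        """Whole seconds until one full token is available again."""
        deficit = max(0.0, 1.0 - self.tokens)
        return math.ceil(deficit / self.refill_per_sec)

def bucket_for_grade(grade: str) -> Optional[TokenBucket]:
    """Grade A (verified humans) gets no bucket at all, i.e. no throttling."""
    params = GRADE_PARAMS.get(grade)
    if params is None:
        return None
    return TokenBucket(
        capacity=params["bucket_size"] + params["burst"],
        refill_per_sec=params["refill_per_min"] / 60.0,
    )
```

Keying buckets by grade rather than IP means a well-behaved, identified agent keeps its generous allowance even behind a shared egress IP, while anonymous scrapers on the same IP are throttled independently.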

When a bucket is exhausted, Averence returns a 429 with a Retry-After header rather than silently dropping the request.

HTTP/1.1 429 Too Many Requests
Retry-After: 47
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0

{
  "error": "rate_limit_exceeded",
  "message": "Token bucket empty. Retry after 47 seconds.",
  "grade": "C"
}
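Because the Retry-After header is always present, a well-behaved client can back off precisely instead of guessing. A minimal retry loop, sketched with an injected request callable (the `do_request` interface is an assumption for illustration, not an Averence SDK):

```python
import time

def fetch_with_retry(do_request, max_attempts: int = 3, sleep=time.sleep):
    """Call do_request() -> (status, headers, body), retrying on 429.

    Honours the Retry-After header from the 429 response shown above;
    falls back to a 1-second wait if the header is somehow missing.
    """
    for attempt in range(max_attempts):
        status, headers, body = do_request()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts; surface the 429 to the caller
        sleep(int(headers.get("Retry-After", 1)))
    return status, headers, body
```

Injecting `sleep` keeps the loop testable and lets callers substitute an async-friendly or jittered delay.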

Graceful Degradation, Not Hard Blocks

When your backend is under stress, Averence can progressively reduce response richness instead of returning errors. The right level is applied automatically based on backend health signals:

Level 1 — Full Response: all fields — detailed specs, related items, rich metadata
Level 2 — Standard Response: core fields only — name, price, availability, URL
Level 3 — Minimal Response: identifier, price, in-stock flag only
Level 4 — Cached Fallback: last known good response, with stale: true flag
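A sketch of how a degradation policy like this could be wired up. The latency and error-rate thresholds, field lists, and function names below are hypothetical; the source only specifies that the level is chosen from backend health signals.

```python
# Hypothetical field sets per degradation level, following the list above.
FIELDS_BY_LEVEL = {
    1: {"name", "price", "availability", "url", "specs", "related", "metadata"},
    2: {"name", "price", "availability", "url"},
    3: {"id", "price", "in_stock"},
}

def degradation_level(p95_latency_ms: float, error_rate: float) -> int:
    """Map backend health signals to a level (thresholds are illustrative)."""
    if error_rate > 0.10:
        return 4            # backend failing: serve cached fallback
    if p95_latency_ms > 2000:
        return 3
    if p95_latency_ms > 800:
        return 2
    return 1

def shape_response(item: dict, level: int, cache: dict) -> dict:
    """Trim a full response down to the fields allowed at this level."""
    if level == 4:
        stale = dict(cache.get(item["id"], {}))
        stale["stale"] = True   # flag last-known-good data, as above
        return stale
    return {k: v for k, v in item.items() if k in FIELDS_BY_LEVEL[level]}
```

The key property is monotonicity: each level strictly shrinks the response, so agents always get a parseable subset rather than an error page.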

Collapsing Redundant Agent Queries

AI agents frequently issue semantically equivalent queries with slight wording variations. Averence detects these and serves the cached response — without charging a rate limit token. Dedup events are logged for analytics so you can see how often agents re-query the same information.

Semantic similarity is computed using an embedding model. Requests with cosine similarity > 0.92 within the same session window are treated as duplicates.
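The dedup check can be sketched as a per-session cache keyed by query embedding. The 0.92 threshold comes from the text above; the embedding function is assumed to be supplied externally (the source names an embedding model but not which one), and the class shape is illustrative.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class SessionDedupCache:
    """Per-session cache of (embedding, response) pairs.

    A lookup whose cosine similarity to a stored query exceeds the
    threshold is treated as a duplicate and served from cache, so no
    rate-limit token is charged for it.
    """
    def __init__(self, embed, threshold: float = 0.92):
        self.embed = embed          # embedding model, injected
        self.threshold = threshold
        self.entries = []           # list of (vector, cached_response)

    def lookup(self, query: str):
        query_vec = self.embed(query)
        for vec, response in self.entries:
            if cosine(query_vec, vec) > self.threshold:
                return response     # duplicate: serve cached response
        return None                 # novel query: caller handles it

    def store(self, query: str, response) -> None:
        self.entries.append((self.embed(query), response))
```

A linear scan is fine per session because session windows are short; a production version would likely use an approximate nearest-neighbour index instead.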