
Centinel vs robots.txt

robots.txt tells crawlers to stay away. Centinel makes sure they actually do. 32% of AI scrapes bypass robots.txt. Centinel catches them at the edge.

robots.txt is a plain text file that tells web crawlers which pages they should not access. Compliant crawlers such as Googlebot follow it; many AI crawlers do not. Industry data shows that approximately 32% of AI scraping activity ignores robots.txt directives entirely. Some crawlers also spoof their user agent strings, claiming to be standard browsers while running automated collection from data center IPs or residential proxies. robots.txt has no enforcement mechanism: it is a request, not a technical barrier.

Centinel adds the enforcement layer that robots.txt lacks. It identifies crawlers regardless of their stated identity using TLS handshake analysis, HTTP/2 frame parameters, and browser JavaScript signals. It blocks identified crawlers at the CDN edge in under 2 milliseconds, before requests reach the origin server. Setup takes under 5 minutes and works with any web server or CDN provider.
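To make the "request, not a barrier" point concrete, here is a minimal sketch of the check a compliant crawler runs on its own side, using Python's standard `urllib.robotparser`. The robots.txt content, the `may_fetch` helper, and the example.com URL are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what a well-behaved crawler would conclude before fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse offline, no network access
    return parser.can_fetch(user_agent, url)


# A crawler that identifies itself honestly is told to stay away.
print(may_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/article"))       # False
# The same client presenting a browser-like name falls under the "*" group.
print(may_fetch(ROBOTS_TXT, "Mozilla/5.0", "https://example.com/article"))  # True
```

The second call is allowed only because the client presented a different name, and nothing in the protocol obliges a crawler to run this check at all. That is the gap an enforcement layer on the server side is meant to close.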

Feature | Centinel | robots.txt
Enforced blocking (not voluntary) | Yes | No
Detects fake user agents | Yes | No
Per-crawler monetization | Yes | No
Crawler analytics dashboard | Yes | No
Per-crawler access control | Yes | Partial
Real-time request blocking | Yes | No
Cost | Free tier | Free
Setup time | 5 min | 1 min
Crawler bypass rate | <1% | 32%
Maintenance | Automatic | Manual

Why robots.txt isn't enough

robots.txt is a request, not a wall: it has no enforcement mechanism. Centinel is the enforcement layer robots.txt lacks. It identifies crawlers regardless of their claimed identity and blocks them in under 2 ms.
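One way to picture "regardless of claimed identity" is a consistency check between what a client says it is and what its connection looks like. The sketch below is a toy illustration of that idea only, not Centinel's detection logic: the fingerprint labels, the `EXPECTED_FINGERPRINTS` table, and the `Request` type are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table: opaque TLS fingerprint labels that genuine clients of
# each browser family are assumed to present. The values are placeholders.
EXPECTED_FINGERPRINTS = {
    "chrome": {"fp-chrome-a", "fp-chrome-b"},
    "firefox": {"fp-firefox-a"},
}


@dataclass(frozen=True)
class Request:
    user_agent: str       # the identity the client claims
    tls_fingerprint: str  # label assumed to be derived from the TLS handshake


def claimed_family(user_agent: str) -> Optional[str]:
    """Return the browser family named in the User-Agent header, if any."""
    lowered = user_agent.lower()
    for family in EXPECTED_FINGERPRINTS:
        if family in lowered:
            return family
    return None


def looks_spoofed(request: Request) -> bool:
    """Flag requests whose handshake does not match the claimed browser."""
    family = claimed_family(request.user_agent)
    if family is None:
        return False  # no browser claim to verify in this sketch
    return request.tls_fingerprint not in EXPECTED_FINGERPRINTS[family]


CHROME_UA = "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0.0.0 Safari/537.36"
print(looks_spoofed(Request(CHROME_UA, "fp-chrome-a")))     # False
print(looks_spoofed(Request(CHROME_UA, "fp-http-library")))  # True
```

A single mismatch signal like this would be noisy in practice, which is presumably why the description above lists several independent signals (TLS, HTTP/2, JavaScript) rather than one.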

Try Centinel free

Frequently asked questions

Can I use Centinel and robots.txt together?
Yes, and most Centinel customers do. robots.txt handles the compliant crawlers cheaply. Centinel enforces the decision for the 30% of AI bot scrapes that ignore robots.txt (Tollbit Q4 2025) and for the crawlers that spoof their user agent entirely. They are complementary layers: a request in robots.txt and a wall behind it.
How much AI crawler traffic bypasses robots.txt today?
Tollbit's Q4 2025 data puts the overall bypass rate at 30% across AI bot scrapes. For specific crawlers it is higher: OpenAI's ChatGPT-User agent bypassed robots.txt on 42% of the sites that explicitly blocked it. Cloudflare found that only 7.8% of top domains disallow GPTBot in their robots.txt at all. The practical ceiling on what robots.txt can prevent is well below 100%.
How long does Centinel take to install compared to updating robots.txt?
robots.txt takes thirty seconds. Centinel takes five minutes. The five minutes buy you an enforcement layer that catches the 30%-plus of crawlers robots.txt does not affect. For teams already running a CDN or middleware, the integration is a single configuration block; for teams running bare Next.js, it is an npm install and a middleware export.
Does Centinel respect robots.txt semantics?
Yes, where they make sense. If your robots.txt allows Googlebot, Centinel honors that as an allowlist entry so search indexing is unaffected. If your robots.txt disallows GPTBot, Centinel enforces it at the edge rather than trusting GPTBot to self-restrict. You write the policy once; Centinel is the layer that actually applies it.
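The "write the policy once, enforce it on the server" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not Centinel's implementation: the robots.txt content, the `KNOWN_CRAWLER_TOKENS` list, and the `status_for` helper are hypothetical, and the example only covers crawlers that announce themselves honestly in the User-Agent header.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Illustrative policy: search indexing allowed, GPTBot disallowed.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /
"""

# Hypothetical list of crawler product tokens this sketch recognizes.
KNOWN_CRAWLER_TOKENS = ("Googlebot", "GPTBot")


def load_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object without any network access."""
    policy = RobotFileParser()
    policy.parse(robots_txt.splitlines())
    return policy


def crawler_token(user_agent_header: str) -> Optional[str]:
    """Find a known crawler token inside a full User-Agent header."""
    lowered = user_agent_header.lower()
    for token in KNOWN_CRAWLER_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def status_for(policy: RobotFileParser, user_agent_header: str, path: str) -> int:
    """Return the HTTP status the server would send, applying the policy itself."""
    token = crawler_token(user_agent_header)
    if token is None:
        return 200  # not a declared crawler; out of scope for this sketch
    return 200 if policy.can_fetch(token, path) else 403


policy = load_policy(ROBOTS_TXT)
googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
gptbot_ua = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
print(status_for(policy, googlebot_ua, "/pricing"))  # 200
print(status_for(policy, gptbot_ua, "/pricing"))     # 403
```

The design point is that the server, not the crawler, evaluates the rule. A crawler that hides its name would pass straight through this sketch, so identity detection has to sit in front of it.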

Pick the next step that fits where you are

Demo, self-serve check, pricing, or a quiet email. Whichever maps to your stage.