Centinel vs robots.txt
robots.txt tells crawlers to stay away. Centinel makes sure they actually do. Roughly 30% of AI scrapes bypass robots.txt. Centinel catches them at the edge.
robots.txt is a plain text file that tells web crawlers which pages they should not access. Compliant crawlers like Googlebot follow it. Many AI crawlers do not. Industry data shows that roughly 30% of AI scraping activity ignores robots.txt directives entirely. Some crawlers spoof their user agent strings, claiming to be standard browsers while running automated collection from data center IPs or residential proxies. robots.txt has no enforcement mechanism: it is a request, not a technical barrier.
Centinel adds the enforcement layer that robots.txt lacks. It identifies crawlers regardless of their stated identity using TLS handshake analysis, HTTP/2 frame parameters, and browser JavaScript signals. Centinel blocks identified crawlers at the CDN edge in under 2 milliseconds, before requests reach the origin server.
Setup takes under 5 minutes and works with any web server or CDN provider.
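To make the spoofing problem concrete, here is a minimal Python sketch of one simple mismatch check: a request whose User-Agent claims to be a browser but whose source IP sits in a data center range. This is not Centinel's detection logic, which the text above describes as based on TLS, HTTP/2, and JavaScript signals. The IP ranges, token list, and function names are hypothetical placeholders chosen for illustration.

```python
import ipaddress

# Hypothetical data center ranges. These are RFC 5737 documentation blocks
# used as placeholders, not real hosting-provider ranges.
DATACENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Substrings that typically appear in browser User-Agent headers.
BROWSER_TOKENS = ("Chrome/", "Firefox/", "Safari/")


def claims_browser(user_agent: str) -> bool:
    """True when the User-Agent header presents itself as a browser."""
    return any(token in user_agent for token in BROWSER_TOKENS)


def from_datacenter(ip: str) -> bool:
    """True when the source address falls inside a listed data center range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in DATACENTER_RANGES)


def looks_spoofed(user_agent: str, ip: str) -> bool:
    """Flag requests that claim to be a browser but originate from a data center."""
    return claims_browser(user_agent) and from_datacenter(ip)


browser_ua = "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0 Safari/537.36"
print(looks_spoofed(browser_ua, "203.0.113.7"))    # browser claim, data center IP
print(looks_spoofed(browser_ua, "192.0.2.10"))     # browser claim, unlisted IP
print(looks_spoofed("GPTBot/1.0", "203.0.113.7"))  # honest bot identity
```

A check like this misses crawlers that route through residential proxies, which is one reason to fingerprint the connection itself rather than rely on headers and IP ranges alone.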
Why robots.txt isn't enough
robots.txt is a request, not a wall: it has no enforcement mechanism. Centinel is the enforcement layer robots.txt lacks. It identifies crawlers regardless of claimed identity and blocks them in under 2 ms.
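The "request, not a wall" point can be shown with Python's standard-library `urllib.robotparser`. In this sketch, the robots.txt content, the example URL, and the two crawler functions are illustrative assumptions. The compliant crawler chooses to consult the file. The non-compliant one simply never does, and nothing in the protocol stops it.

```python
from urllib.robotparser import RobotFileParser

# Example policy (hypothetical): block GPTBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def compliant_crawler_will_fetch(user_agent: str, url: str) -> bool:
    """A well-behaved crawler checks robots.txt and obeys the answer."""
    return parser.can_fetch(user_agent, url)


def noncompliant_crawler_will_fetch(user_agent: str, url: str) -> bool:
    """A crawler that ignores robots.txt never performs the check at all."""
    return True


url = "https://example.com/article"
print(compliant_crawler_will_fetch("GPTBot", url))     # obeys the Disallow rule
print(compliant_crawler_will_fetch("Googlebot", url))  # falls under the * rule
print(noncompliant_crawler_will_fetch("GPTBot", url))  # the file is never read
```

The only difference between the two crawlers is whether the client decides to call `can_fetch`. The server has no say, which is the gap an enforcement layer is meant to close.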
Try Centinel free
Frequently asked questions
- Can I use Centinel and robots.txt together?
- Yes, and most Centinel customers do. robots.txt handles the compliant crawlers cheaply. Centinel enforces the decision for the 30% of AI bot scrapes that ignore robots.txt (Tollbit Q4 2025) and for the crawlers that spoof their user agent entirely. They are complementary layers: a request in robots.txt and a wall behind it.
- How much AI crawler traffic bypasses robots.txt today?
- Tollbit's Q4 2025 data puts the overall bypass rate at 30% across AI bot scrapes. For specific crawlers it is higher: OpenAI's ChatGPT-User agent bypassed robots.txt on 42% of the sites that explicitly blocked it. Cloudflare found that only 7.8% of top domains disallow GPTBot in their robots.txt at all. The practical ceiling on what robots.txt can prevent is well below 100%.
- How long does Centinel take to install compared to updating robots.txt?
- robots.txt takes thirty seconds. Centinel takes five minutes. The five minutes buy you an enforcement layer that catches the 30%-plus of AI scraping that robots.txt does not affect. For teams already running a CDN or middleware, the integration is a single configuration block; for teams running bare Next.js, it is an npm install and a middleware export.
- Does Centinel respect robots.txt semantics?
- Yes, where they make sense. If your robots.txt allows Googlebot, Centinel honors that as an allowlist entry so search indexing is unaffected. If your robots.txt disallows GPTBot, Centinel enforces it at the edge rather than trusting GPTBot to self-restrict. You write the policy once; Centinel is the layer that actually applies it.
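The "write the policy once" idea in the last answer can be sketched as follows: the same robots.txt rules are parsed on the server side and turned into an allow-or-block decision. This is an illustrative assumption about how such a layer could work, not Centinel's actual implementation or API. The `enforce` function and its `detected_crawler` argument are hypothetical, and the sketch assumes the crawler's identity has already been established by fingerprinting rather than read from the User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# The site's existing policy: allow Googlebot, disallow GPTBot.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /
"""

policy = RobotFileParser()
policy.parse(ROBOTS_TXT.splitlines())


def enforce(detected_crawler: str, path: str) -> int:
    """Return the HTTP status an enforcement layer might send.

    `detected_crawler` is a hypothetical input: the identity established
    by fingerprinting, not the identity the client claims for itself.
    """
    allowed = policy.can_fetch(detected_crawler, path)
    return 200 if allowed else 403


print(enforce("Googlebot", "/pricing"))  # allowed by the policy
print(enforce("GPTBot", "/pricing"))     # disallowed, so blocked server-side
```

The point of the design is that the decision moves from the crawler to the server: the rule is the same one written in robots.txt, but it is applied whether or not the crawler chooses to honor it.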
Pick the next step that fits where you are
Demo, self-serve check, pricing, or a quiet email. Whichever maps to your stage.