Understand the threats. Make better decisions.
Practical guides on AI crawlers, web scraping, and content protection, written by the team that tracks 1,600+ of them.
Fundamentals
What is an AI crawler?
How AI crawlers differ from traditional search engine bots, what data they collect, and why they matter for your business.
6 min read
What is web scraping?
The mechanics of web scraping, why companies do it, the legal landscape, and how AI has changed the scraping game.
7 min read
TLS fingerprinting explained
How TLS fingerprinting identifies bots by the shape of their handshake. JA3, JA4, and why it catches scrapers that user-agent checks miss.
8 min read
Patched Chromium browsers explained
How scrapers patch Chromium's source code to hide automation fingerprints, and why navigator.webdriver checks no longer catch patched browsers.
8 min read
What is AI agent traffic?
AI agent traffic is a new traffic class — training crawlers, retrieval crawlers, agentic workflows, and spoofed scrapers. How it differs from classic bots and what publishers can do about it.
8 min read
Practical guides
How to block AI crawlers
A practical walkthrough of every blocking method, from robots.txt to edge-level detection, with the tradeoffs of each.
8 min read
robots.txt for AI bots: Complete guide
How to configure robots.txt for AI crawlers. Every directive, every major bot, and why robots.txt alone isn't enough.
10 min read
How to detect browser automation beyond user agents
Detection techniques that work when user agents lie: TLS fingerprints, HTTP/2 parameters, CDP artifacts, and behavioral signals.
8 min read
Why an interstitial challenge page is inevitable
Why passive bot detection fails against modern scrapers, and why an interstitial challenge page is the only reliable way to protect content from AI crawlers.
8 min read
How to verify AI agents
A publisher operator’s guide to telling legitimate AI agents from spoofed ones. IP ranges, reverse-DNS, TLS fingerprints, request signing, and the policies that sit on top.
8 min read
Why monetize AI agents, not just block them
Blanket blocking leaves money on the table. The three paths every publisher needs (block, verify-and-allow, charge) and the five monetization mechanisms live as of Q2 2026.
9 min read
Frequently asked questions
- What exactly is an AI crawler?
- An AI crawler is an automated client that collects web content to train, ground, or operate an AI system. It differs from a search engine crawler in purpose: a search bot indexes your pages to send traffic back, whereas an AI crawler extracts your text to answer questions without referring anyone to the source. Cloudflare measured a 500,000-to-1 crawl-to-referral ratio for Anthropic in 2025.
- Does robots.txt stop AI crawlers?
- Sometimes, and less often each quarter. Tollbit reported that 30% of AI bot scrapes in Q4 2025 ignored explicit robots.txt rules, and OpenAI's ChatGPT-User agent bypassed the rules on 42% of sites that blocked it. For compliant crawlers it still works. For the rest it does nothing. A plain-text request is not an enforcement mechanism.
- Will blocking AI crawlers hurt my SEO?
- No, if you block correctly. Googlebot, Bingbot, and other search indexers use separate user agents from AI training crawlers. Blocking GPTBot or Bytespider does not affect your presence in traditional search results. AI Overviews and similar AI-search surfaces have their own user agents and can be allowed independently. The only risk comes from blocking a verified search bot by accident, which is why allowlists matter.
- What is TLS fingerprinting and why does it matter?
- TLS fingerprinting identifies the software making an HTTPS connection by inspecting the cipher suites and extensions it offers in its handshake. A Python script claiming to be Chrome still produces a Python TLS fingerprint, because its TLS library shipped with Python, not with Chrome. Cloudflare tracks over 15 million unique JA4 fingerprints daily. A user agent is a string the scraper chose; a TLS fingerprint is a property of the code that is running.
- How much of my site traffic is already bots?
- Imperva's 2025 Bad Bot Report measured automated traffic at 51% of total web traffic in 2024, with 37% classified as bad bots. Cloudflare reported that 39% of the top 1 million sites are accessed by AI bots specifically, while only 2.98% actively block them. Your number depends on industry and content type, but on a publisher site with archived content the share is usually higher than your analytics admits.
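The robots.txt and SEO answers above both rest on one property: robots.txt rules are scoped per user agent, so blocking GPTBot or Bytespider leaves Googlebot and Bingbot untouched. A minimal sketch of that check, using Python's standard-library urllib.robotparser. The sample rule set, the allowed_agents helper, and the list of agents are assumptions made for illustration, not a recommended configuration:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a recommended policy):
# block two AI training crawlers, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, path="/"):
    """Return {agent: True/False} for whether each agent may fetch `path`.

    Hypothetical helper: it reports what a *compliant* crawler would
    conclude from the file. It enforces nothing.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "Bytespider", "Googlebot", "Bingbot"]
    for agent, allowed in allowed_agents(ROBOTS_TXT, agents).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This only models crawlers that read and honor the file. A crawler that ignores robots.txt, or one that spoofs a search bot's user agent, is unaffected, which is why the answers above pair robots.txt with verification and allowlists.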
Pick the next step that fits where you are
Demo, self-serve check, pricing, or a quiet email. Whichever maps to your stage.