Anti-bot detection
Anti-bot detection refers to the systems websites deploy to distinguish human visitors from automated scrapers and bots. In 2026, the major commercial anti-bot vendors are Cloudflare (Bot Management, with Turnstile as its challenge widget), DataDome, PerimeterX (now HUMAN Security), Akamai Bot Manager, and Kasada. Together they protect a significant proportion of high-value web targets: e-commerce sites, social networks, travel platforms, and financial data sources.
Understanding how detection works explains why some scrapers succeed on a given target while others fail, and why managed scraping API vendors show such different success rates on protected versus unprotected targets.
Detection signals (in order of sophistication)
1. IP reputation
The simplest check: is this IP address known to be associated with scrapers, bots, or datacenter hosting? Services like IPQualityScore, MaxMind, and vendor-internal databases flag IPs that have previously exhibited scraping patterns. Datacenter IPs from AWS, GCP, and Azure are treated as suspect by default, since ordinary visitors rarely browse from cloud servers. Residential proxy IPs draw less suspicion because they belong to consumer ISPs and look like regular home internet connections.
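A minimal sketch of the lookup side of an IP reputation check, using only Python's standard ipaddress module. The ranges and the blocklist are hypothetical placeholders (IETF documentation prefixes from RFC 5737), not real cloud allocations; a production system would instead load published lists such as AWS's ip-ranges.json together with a commercial reputation feed.

```python
import ipaddress

# Hypothetical "datacenter" ranges for illustration only. These are RFC 5737
# documentation prefixes, not real cloud allocations.
DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical blocklist of addresses previously seen scraping.
KNOWN_BAD_IPS = {ipaddress.ip_address("203.0.113.7")}


def ip_reputation(ip: str) -> str:
    """Classify an IP as 'blocked', 'datacenter', or 'unknown' (no negative signal)."""
    addr = ipaddress.ip_address(ip)
    if addr in KNOWN_BAD_IPS:
        return "blocked"
    # Membership test across address families simply returns False.
    if any(addr in net for net in DATACENTER_RANGES):
        return "datacenter"
    return "unknown"


print(ip_reputation("192.0.2.10"))   # datacenter
print(ip_reputation("203.0.113.7"))  # blocked
print(ip_reputation("203.0.113.8"))  # unknown
```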
2. HTTP header analysis
A real Chrome browser sends a specific set of headers in a specific order. A request sent with Python's requests library sends different headers by default (including a python-requests/<version> User-Agent) unless you explicitly set them to match:
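```python
import requests

# Header values approximating Chrome 120 on Windows for a top-level navigation.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',  # decoding 'br' responses requires the brotli package
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}

response = requests.get('https://example.com/', headers=headers, timeout=30)

# Anti-bot systems also check header *order*, not just presence. requests keeps
# its session defaults (User-Agent, Accept-Encoding, Accept, Connection) first,
# so the order on the wire still won't match Chrome's even when the values do.
```

Anti-bot systems check that headers match known browser fingerprints and that they appear in the expected order, since different HTTP libraries emit headers in different orders.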
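The order check itself is simple to express. The sketch below assumes a hypothetical reference profile (REFERENCE_ORDER is illustrative, not a verified capture of any browser) and tests whether the headers a client sent appear in the same relative order as the profile, ignoring headers the profile does not list.

```python
# Hypothetical reference profile: the relative order in which some browser build
# is assumed to send these headers. Illustrative only, not a real capture.
REFERENCE_ORDER = [
    "host", "connection", "upgrade-insecure-requests", "user-agent", "accept",
    "sec-fetch-site", "sec-fetch-mode", "sec-fetch-user", "sec-fetch-dest",
    "accept-encoding", "accept-language",
]


def order_matches(observed: list[str], reference: list[str]) -> bool:
    """Return True if headers known to the profile appear in the profile's order."""
    position = {name: i for i, name in enumerate(reference)}
    # Header names are case-insensitive, so compare lowercased names.
    seen = [position[h.lower()] for h in observed if h.lower() in position]
    return seen == sorted(seen)


# An order resembling a library's defaults-first behaviour vs. a browser-like order.
library_order = ["Host", "User-Agent", "Accept-Encoding", "Accept", "Connection", "Accept-Language"]
browser_like = ["Host", "Connection", "User-Agent", "Accept", "Accept-Encoding", "Accept-Language"]

print(order_matches(library_order, REFERENCE_ORDER))  # False
print(order_matches(browser_like, REFERENCE_ORDER))   # True
```

Because the check only compares relative positions, a client that sends the right values in the wrong sequence still fails it.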
3. TLS fingerprinting
The TLS handshake's ClientHello message includes a cipher suite list and an extension list whose contents and order vary by client. Python's requests library (through urllib3 and OpenSSL) produces a distinctive JA3/JA4 TLS fingerprint that matches no browser, regardless of which HTTP headers are set. Cloudflare and DataDome maintain databases of known-bot TLS fingerprints. This is why tools like curl-impersonate and tls-client exist: they reproduce the TLS fingerprints of specific browsers.
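To make the fingerprint concrete, here is a minimal sketch of how a JA3 hash is computed once the ClientHello has been parsed: the TLS version, cipher suites, extensions, elliptic curves, and point formats are written as decimal values (dash-joined within a field, comma-separated between fields), GREASE values from RFC 8701 are dropped, and the resulting string is MD5-hashed. The ClientHello values are made up for illustration, ClientHello parsing is not shown, and JA4 uses a different, newer format not covered here.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA. Clients insert them at
# random, so JA3 drops them to keep the fingerprint stable across connections.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3(version: int, ciphers: list[int], extensions: list[int],
        curves: list[int], point_formats: list[int]) -> tuple[str, str]:
    """Build the JA3 string and its MD5 hash from already-parsed ClientHello fields."""
    def field(values: list[int]) -> str:
        return "-".join(str(v) for v in values if v not in GREASE)

    ja3_string = ",".join(
        [str(version), field(ciphers), field(extensions), field(curves), field(point_formats)]
    )
    digest = hashlib.md5(ja3_string.encode("ascii"), usedforsecurity=False).hexdigest()
    return ja3_string, digest


# Hypothetical ClientHello (771 = 0x0303, the TLS 1.2 legacy version field).
s, h = ja3(771, [0x0A0A, 4865, 4866, 4867, 49195], [0x1A1A, 0, 23, 65281, 10, 11],
           [0x2A2A, 29, 23, 24], [0])
print(s)  # 771,4865-4866-4867-49195,0-23-65281-10-11,29-23-24,0
print(h)
```

Two clients that send identical HTTP headers but different cipher or extension lists produce different hashes, which is why header spoofing alone does not disguise a Python client.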
4. Browser fingerprinting (canvas, WebGL, fonts)
When a headless browser (Playwright, Puppeteer, Selenium) renders a page, JavaScript on that page can read canvas rendering output, WebGL parameters, installed fonts, screen dimensions, and automation flags such as navigator.webdriver. These values differ subtly between headless and regular browsers, even when both run the same version of Chrome. Plugins such as puppeteer-extra-plugin-stealth (also usable with Playwright through playwright-extra) patch many of these differences; Playwright itself has no built-in stealth mode.
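On the server side, the values a page script collects can be checked against what the claimed browser should report. The sketch below is hypothetical: the ClientFingerprint fields and the rules are illustrative heuristics (navigator.webdriver being true under automation, a HeadlessChrome user agent, a SwiftShader software WebGL renderer typical of GPU-less machines, and an empty plugin list), not any vendor's actual logic.

```python
from dataclasses import dataclass


@dataclass
class ClientFingerprint:
    """Hypothetical attributes reported back by an in-page collection script."""
    user_agent: str      # navigator.userAgent
    webdriver: bool      # navigator.webdriver
    webgl_renderer: str  # WebGL UNMASKED_RENDERER_WEBGL string, if exposed
    plugin_count: int    # navigator.plugins.length


def headless_signals(fp: ClientFingerprint) -> list[str]:
    """Return the illustrative headless/automation indicators this fingerprint trips."""
    signals = []
    if fp.webdriver:
        signals.append("navigator.webdriver is true")
    if "HeadlessChrome" in fp.user_agent:
        signals.append("headless user agent")
    if "SwiftShader" in fp.webgl_renderer:
        signals.append("software WebGL renderer")
    if "Chrome/" in fp.user_agent and fp.plugin_count == 0:
        signals.append("Chrome user agent with no plugins")
    return signals


# Example values for an unpatched headless session (illustrative strings).
bot = ClientFingerprint(
    user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
               "HeadlessChrome/120.0.0.0 Safari/537.36",
    webdriver=True,
    webgl_renderer="Google SwiftShader",
    plugin_count=0,
)
print(headless_signals(bot))
```

Because stealth plugins patch exactly these kinds of values, any single rule like these is easy to evade; the sketch only shows the shape of the check.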
5. Behavioral analysis
The most sophisticated layer asks whether the user's mouse movement, scroll speed, click timing, and navigation pattern look human. Behavioral analysis runs continuously throughout a session: vendors such as PerimeterX (HUMAN Security) analyze hundreds of data points per second in real time through JavaScript embedded in the page.
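A minimal sketch of two behavioral features, assuming mouse samples arrive as hypothetical (timestamp_ms, x, y) tuples: path straightness (straight-line distance divided by distance travelled) and the coefficient of variation of the time between samples. Naively scripted movement tends to be perfectly straight and evenly timed; the thresholds in looks_scripted are made-up illustrations, not values any vendor is known to use.

```python
import math
import statistics


def mouse_features(events: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Return (straightness, interval_cv) for time-ordered (t_ms, x, y) samples."""
    if len(events) < 2:
        raise ValueError("need at least two samples")
    points = [(x, y) for _, x, y in events]
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    direct = math.dist(points[0], points[-1])
    straightness = direct / travelled if travelled else 1.0
    intervals = [b[0] - a[0] for a, b in zip(events, events[1:])]
    mean = statistics.mean(intervals)
    cv = statistics.pstdev(intervals) / mean if mean else 0.0
    return straightness, cv


def looks_scripted(events, min_straightness=0.99, max_interval_cv=0.05) -> bool:
    """Hypothetical rule: near-perfectly straight path with near-constant timing."""
    straightness, cv = mouse_features(events)
    return straightness >= min_straightness and cv <= max_interval_cv


scripted = [(i * 10, i * 5, i * 5) for i in range(20)]  # straight line, 10 ms steps
humanish = [(0, 0, 0), (13, 4, 9), (31, 15, 12), (40, 22, 25), (67, 40, 28), (80, 44, 41)]
print(looks_scripted(scripted))  # True
print(looks_scripted(humanish))  # False
```

Real behavioral models combine many such features over a whole session; two hand-picked features with fixed thresholds are only meant to show what "looks human" can mean numerically.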
Why this matters for tool selection
The success rate difference between managed scraping APIs on the same protected target reflects how well each vendor’s stack addresses these detection signals:
| Detection layer | ScraperAPI coverage | Zyte coverage | Bright Data coverage |
|---|---|---|---|
| IP reputation | Residential proxies (premium) | Residential + tuned pool | 72M residential network |
| Header fingerprint | Auto-set per request | Full browser emulation | Full browser emulation |
| TLS fingerprint | Basic | Advanced (Scrapy + custom) | Advanced |
| Browser fingerprint | Partial (JS rendering) | Full headless stealth | Full headless stealth |
| Behavioral analysis | Limited | Partial | Partial |
Broader coverage of the full detection stack, rather than IP rotation alone, is the most likely reason Zyte's success rate on Akamai-protected targets (94.3%) exceeds ScraperAPI's (71%).
The compliance angle
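Anti-bot detection is not only a technical problem but also a legal one. The Terms of Service of most commercial sites explicitly prohibit automated access, and bypassing anti-bot measures to scrape such a site may breach them. In hiQ Labs v. LinkedIn, the Ninth Circuit held in 2022 that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act, but the district court later that year found that hiQ had breached LinkedIn's User Agreement, so ToS restrictions can still be enforced as contract terms in US courts. Read the web scraping legality guide before scraping protected commercial targets at scale.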
Related concepts
- Residential proxy — higher-trust IPs for bypassing IP reputation checks
- CAPTCHA solver — automated CAPTCHA completion
- Browser fingerprinting — how headless browsers are detected