Anti-bot detection

Anti-bot detection refers to the systems deployed by websites to distinguish human visitors from automated scrapers and bots. In 2026, the major commercial anti-bot vendors — Cloudflare Turnstile, DataDome, PerimeterX (now HUMAN Security), Akamai Bot Manager, and Kasada — collectively protect a significant proportion of high-value web targets: e-commerce sites, social networks, travel platforms, and financial data sources.

Understanding how detection works is necessary for understanding why some scrapers succeed on a target and others don’t — and why the managed API vendors have wildly different success rates on protected vs unprotected targets.

Detection signals (in order of sophistication)

1. IP reputation

The simplest check: is this IP address known to be associated with scrapers, bots, or datacenter hosting? Services like IPQualityScore, MaxMind, and vendor-internal databases flag IPs that have previously exhibited scraping patterns. Datacenter IPs from AWS, GCP, and Azure are automatically suspect. Residential proxy IPs are less so — they look like regular home internet connections.
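As a rough illustration of the check described above, the sketch below classifies an IP against a small in-memory list. The CIDR blocks are RFC 5737 documentation ranges used as placeholders, and `DATACENTER_RANGES`, `KNOWN_SCRAPER_IPS`, and `ip_risk` are hypothetical names; a production system would presumably load provider-published ranges and commercial reputation feeds instead.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges) standing in for
# datacenter ranges. These are NOT real cloud-provider ranges.
DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical set of individual IPs previously seen scraping.
KNOWN_SCRAPER_IPS = {ipaddress.ip_address("203.0.113.7")}


def ip_risk(ip_text: str) -> str:
    """Return a coarse risk label ("block", "suspect", "allow") for an IP."""
    ip = ipaddress.ip_address(ip_text)
    if ip in KNOWN_SCRAPER_IPS:
        return "block"      # exact match against prior scraping activity
    if any(ip in net for net in DATACENTER_RANGES):
        return "suspect"    # hosted in a datacenter range
    return "allow"          # no negative signal from this layer
```

A request from a residential proxy falls through to "allow" here, which is why the later layers exist.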

2. HTTP header analysis

A real Chrome browser sends a specific set of headers in a specific order. A request sent with the Python requests library sends different headers unless you explicitly set them to match:

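Python headers that look like real Chrome

# Header values must be consistent with each other: a Chrome User-Agent
# paired with Firefox-style Accept or Accept-Language values is itself a signal.
headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  # Accept value Chrome sends for a top-level navigation
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
  # Chrome's default for an en-US locale
  'Accept-Language': 'en-US,en;q=0.9',
  'Accept-Encoding': 'gzip, deflate, br',
  'Connection': 'keep-alive',
  'Upgrade-Insecure-Requests': '1',
  'Sec-Fetch-Dest': 'document',
  'Sec-Fetch-Mode': 'navigate',
  'Sec-Fetch-Site': 'none',
  'Sec-Fetch-User': '?1',
}
# Chrome also sends sec-ch-ua client hint headers, omitted here.
# Anti-bot systems also check header *order*, not just presence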

Anti-bot systems check that headers match known browser fingerprints — and that they appear in the right order, since different libraries present headers differently.
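To make the ordering check concrete, here is a minimal sketch of how a detector might compare the order of received headers against a reference. The `EXPECTED_ORDER` list is an illustrative assumption, not a verified capture of Chrome's wire order, and `order_matches` is a hypothetical helper.

```python
# Assumed reference order for a browser navigation request (illustrative only).
EXPECTED_ORDER = [
    "user-agent", "accept", "sec-fetch-site", "sec-fetch-mode",
    "sec-fetch-user", "sec-fetch-dest", "accept-encoding", "accept-language",
]


def order_matches(received: list[str], expected: list[str] = EXPECTED_ORDER) -> bool:
    """True if headers shared with `expected` arrive in the same relative order."""
    expected_set = set(expected)
    # Header names are case-insensitive, so normalise before comparing.
    seen = [h.lower() for h in received if h.lower() in expected_set]
    seen_set = set(seen)
    # The reference order, restricted to the headers that were actually sent.
    reference = [h for h in expected if h in seen_set]
    return seen == reference
```

Only relative order is compared, so a client that omits some headers is not penalised by this function alone; missing headers would be a separate check.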

3. TLS fingerprinting

The TLS/SSL handshake includes a cipher suite order and extension list that varies by client. Python’s requests library has a distinctive JA3/JA4 TLS fingerprint. Cloudflare and DataDome maintain databases of known-bot TLS fingerprints. This is why tools like curl-impersonate and tls-client exist — they impersonate specific browser TLS fingerprints.
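The JA3 fingerprint mentioned above is an MD5 hash over five ClientHello fields, each written as decimal values joined by hyphens, with the fields joined by commas. The sketch below computes it from already-parsed values; the numeric inputs in the usage note are illustrative, not a capture from any particular client, and `ja3_hash` is a hypothetical helper name.

```python
import hashlib

# GREASE values (RFC 8701) are random placeholders that browsers insert;
# JA3 ignores them so the fingerprint stays stable.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """Compute a JA3 hash from parsed ClientHello fields (all integers)."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    # Field order: TLS version, cipher suites, extensions,
    # elliptic curves, elliptic curve point formats.
    ja3_string = ",".join([
        str(version), join(ciphers), join(extensions),
        join(curves), join(point_formats),
    ])
    return hashlib.md5(ja3_string.encode("ascii")).hexdigest()
```

Calling `ja3_hash(771, [4865, 4866], [0, 23], [29, 23], [0])` and then swapping the two cipher values yields different hashes, which is the property detectors rely on: the same cipher set offered in a different order identifies a different client library.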

4. Browser fingerprinting (canvas, WebGL, fonts)

When a headless browser (Playwright, Puppeteer, Selenium) renders a page, JavaScript can read canvas rendering outputs, WebGL parameters, installed fonts, and screen dimensions. These values differ subtly between headless and real browsers, even for the same version of Chrome. Stealth plugins such as puppeteer-extra-plugin-stealth, and community ports of it for Playwright, patch many of these differences. Playwright itself has no built-in stealth mode.
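On the server side, the values collected by that JavaScript can be checked for well-known headless indicators. The following is a simplified sketch: the dictionary keys (`webdriver`, `user_agent`, `webgl_renderer`) and the function name are assumptions for illustration, and commercial systems are likely to weigh far more attributes than these three.

```python
def headless_signals(fp: dict) -> list[str]:
    """Return names of headless indicators found in a collected fingerprint."""
    signals = []
    # navigator.webdriver is true when the browser is under automation control.
    if fp.get("webdriver"):
        signals.append("webdriver_flag")
    # Some headless Chrome builds advertise themselves in the User-Agent.
    if "HeadlessChrome" in fp.get("user_agent", ""):
        signals.append("headless_user_agent")
    # SwiftShader is a software renderer, common on servers without a GPU.
    if "SwiftShader" in fp.get("webgl_renderer", ""):
        signals.append("software_webgl")
    return signals
```

Stealth plugins work by overriding exactly these kinds of properties before page scripts can read them, so an empty result here does not prove the visitor is human.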

5. Behavioral analysis

The most sophisticated detection: does this user’s mouse movement, scroll speed, click timing, and navigation pattern look human? Behavioral analysis operates continuously during a session. Solutions like PerimeterX (HUMAN Security) use JavaScript embedded in the page to collect hundreds of data points per second and analyze them in real time.
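One simple behavioral feature is the regularity of event timing: scripted input tends to fire at near-constant intervals, while human input is irregular. The sketch below flags an event stream whose inter-event gaps have a low coefficient of variation. The function name and the 0.1 threshold are arbitrary assumptions for illustration; real products presumably combine many such features in a trained model.

```python
import statistics


def timing_looks_scripted(timestamps_ms: list[float], min_cv: float = 0.1) -> bool:
    """Flag event streams whose inter-event gaps are suspiciously uniform."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if len(gaps) < 2:
        return False  # too little data to judge
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return True   # simultaneous or out-of-order events are not plausible input
    # Coefficient of variation: spread of the gaps relative to their mean.
    cv = statistics.pstdev(gaps) / mean_gap
    return cv < min_cv
```

Clicks at exactly 0, 100, 200, 300, and 400 ms are flagged, while an irregular sequence such as 0, 80, 310, 370, 900 ms is not.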

Why this matters for tool selection

The success rate difference between managed scraping APIs on the same protected target reflects how well each vendor’s stack addresses these detection signals:

Detection layer	ScraperAPI coverage	Zyte coverage	Bright Data coverage
IP reputation	Residential proxies (premium)	Residential + tuned pool	72M residential network
Header fingerprint	Auto-set per request	Full browser emulation	Full browser emulation
TLS fingerprint	Basic	Advanced (Scrapy + custom)	Advanced
Browser fingerprint	Partial (JS rendering)	Full headless stealth	Full headless stealth
Behavioral analysis	Limited	Partial	Partial

This is why Zyte’s success rate on Akamai-protected targets (94.3%) exceeds ScraperAPI’s (71%) — better coverage of the full detection stack, not just IP rotation.

The compliance angle

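Anti-bot detection is not just a technical problem; it’s also a legal one. Terms of Service on most commercial sites explicitly prohibit automated access, and bypassing anti-bot detection to scrape a site may violate them. In hiQ v. LinkedIn, the Ninth Circuit held in 2022 that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act, but the district court later found that hiQ had breached LinkedIn’s User Agreement, so a TOS violation can still create breach-of-contract exposure in US courts. Read the web scraping legality guide before scraping protected commercial targets at scale.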

Related concepts

Residential proxy: higher-trust IPs for bypassing IP reputation checks
CAPTCHA solver: automated CAPTCHA completion
Browser fingerprinting: how headless browsers are detected
Cloudflare Turnstile