Internet infrastructure provider Cloudflare has accused AI startup Perplexity of actively circumventing website blocks and obscuring its identity to scrape content from sites that have explicitly opted out of AI scraping. Cloudflare published research on Monday detailing how Perplexity allegedly ignored established rules and disguised its crawling and scraping activities.
According to Cloudflare’s researchers, Perplexity attempted to obscure its identity by changing its bots’ “user agent,” a string that identifies a website visitor’s browser, device and software version, and by altering their autonomous system numbers (ASNs), which identify large networks on the internet. Cloudflare said it observed this activity across “tens of thousands of domains and millions of requests per day,” and that it was able to “fingerprint this crawler using a combination of machine learning and network signals.”
The issue came to Cloudflare’s attention after its customers complained that Perplexity was still crawling and scraping their sites, even though they had added rules to their robots.txt files and specifically blocked known Perplexity bots. The robots.txt file is a web standard used to tell search engines and AI companies which pages they may or may not crawl. Cloudflare conducted tests and confirmed that Perplexity was indeed circumventing these blocks. “We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” Cloudflare stated.
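To illustrate why a robots.txt rule aimed at a named crawler does nothing against a client that presents a different user agent, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt contents, the crawler token PerplexityBot, the Chrome-on-macOS string and the example.com URL are all illustrative assumptions, not data taken from Cloudflare's research.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that opts one named crawler out
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Assumed token for a crawler that identifies itself honestly.
DECLARED_UA = "PerplexityBot"

# Assumed generic browser string of the kind an ordinary Chrome user
# on macOS would send; it names no crawler at all.
GENERIC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit this user agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print("declared crawler allowed:", is_allowed(DECLARED_UA, page))
    print("generic browser allowed:", is_allowed(GENERIC_UA, page))
```

Under these assumptions, the check returns False for the declared crawler and True for the generic browser string, because robots.txt rules are matched against whatever name the client chooses to present. That is why the technique Cloudflare describes would slip past such rules, and why detection has to rely on other signals.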
In response, Cloudflare has de-listed Perplexity’s bots from its verified list and implemented new blocking techniques. A Perplexity spokesperson, Jesse Dwyer, dismissed Cloudflare’s blog post as a “sales pitch,” telling TechCrunch that screenshots in the post “show that no content was accessed” and claiming in a follow-up email that the bot named by Cloudflare “isn’t even ours.”
This incident is not the first time Perplexity has faced accusations regarding unauthorized scraping or content usage. Last year, news outlets, including Wired, alleged that Perplexity was plagiarizing their content. Weeks later, during an interview at the Disrupt 2024 conference, Perplexity CEO Aravind Srinivas reportedly struggled to provide his company’s definition of plagiarism when asked directly.
Cloudflare has increasingly taken a public stance against AI crawlers, citing concerns over the impact on the internet’s business model, particularly for publishers. Just last month, Cloudflare launched a marketplace enabling website owners and publishers to charge AI scrapers for accessing their sites. Cloudflare’s chief executive, Matthew Prince, has previously warned that AI is disrupting the internet’s underlying economic structure. Last year, the company also introduced a free tool designed to prevent bots from scraping websites for AI training purposes.




