Bots Have Taken Over the Internet

The Web No Longer Belongs to Us

There is a number that should stop you cold: 51%.

That is the share of all web traffic in 2024 that was generated not by human beings, but by automated bots — according to the 2025 Imperva Bad Bot Report. Before that figure can do its intended work, a methodological note is not just warranted — it is required. The Imperva figure measures HTTP request volume across the Imperva global network, which blocked 13 trillion bad bot requests across thousands of domains in 2024. It does not measure pageviews, bytes transferred, or unique sessions. It covers Imperva's customer footprint, not a statistically representative sample of the entire global internet. And "automated traffic" in this context includes everything from search engine crawlers to malicious scrapers — not just the bad actors. Statista's independent analysis corroborates the directional signal, noting that fraudulent traffic increased 12% year over year, though Statista's methodology differs from Imperva's and the two figures are not directly comparable. What the reports share is a consistent direction: automated traffic is large, growing, and accelerating.

A brief taxonomy is necessary before going further, because the numbers in this piece refer to distinct categories and conflating them produces confusion. Throughout this article, three terms carry specific meanings:

  • Good bots: search engine crawlers, accessibility tools, uptime monitors, legitimate partner integrations.
  • AI bots: scrapers and agents associated with AI training pipelines, inference retrieval, and agentic AI systems.
  • Malicious bots: those engaged in fraud, credential stuffing, DDoS amplification, account takeover, and similar attacks.

These categories overlap in practice — an AI scraper can also be malicious — but the distinction matters when evaluating statistics. The 51% figure encompasses all three. The 37% figure for malicious bot traffic, which increased from 32% in 2023 and marks the sixth consecutive year of growth in bad bot activity, is a subset of that 51% — not a separate axis of measurement. Together, good bots and malicious bots account for the majority of automated traffic; AI bots, while the fastest-growing category, currently represent a smaller but rapidly expanding share.
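To make the taxonomy concrete, here is a minimal sketch of how declared user-agent strings might be bucketed into these three categories. The patterns and agent names are illustrative assumptions, not a production ruleset, and — as discussed later — declared identity is easily spoofed, so real classifiers layer IP verification and behavioral signals on top of checks like these.

```python
import re

# Illustrative patterns only -- a real bot classifier combines user-agent
# checks with IP verification, behavioral signals, and rate analysis.
GOOD_BOTS = re.compile(r"Googlebot|Bingbot|UptimeRobot", re.I)
AI_BOTS = re.compile(r"GPTBot|ClaudeBot|Bytespider|CCBot", re.I)

def classify(user_agent: str) -> str:
    """Bucket a request into the article's three categories (plus 'human')."""
    if GOOD_BOTS.search(user_agent):
        return "good bot"
    if AI_BOTS.search(user_agent):
        return "ai bot"
    # Anything self-declared as automated that matches neither allowlist is
    # treated as suspect; empty agents are a common malicious-bot signature.
    if "bot" in user_agent.lower() or not user_agent:
        return "possible malicious bot"
    return "human (declared)"

print(classify("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # good bot
print(classify("Mozilla/5.0 (compatible; ClaudeBot/1.0)"))  # ai bot
```

Note that the final branch can only ever say "human (declared)": a spoofed agent string lands in that bucket too, which is exactly why user-agent data maps declared behavior rather than actual actors.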

This is not a rounding error. It is not a temporary anomaly. It is a structural shift — and most organizations are only beginning to understand what it means for their infrastructure, their security, and their bottom line.

The engine driving this transformation is, predictably, artificial intelligence.



The Most Concrete Example First

Before the data, the case study — because abstract percentages are easy to dismiss, and what happened to one of the most visited websites on earth is not.

The Wikimedia Foundation offered the most vivid illustration available of what bot traffic actually does to real infrastructure. AI bots strained Wikimedia's systems so severely that bandwidth for downloading multimedia content has surged 50% since January 2024. The breaking point arrived in December 2024, when former U.S. President Jimmy Carter died. His Wikipedia page drew millions of human visitors. Simultaneously, bots began streaming a 1.5-hour video of a 1980 debate from Wikimedia Commons. The surge doubled Wikimedia's normal network traffic, temporarily maxing out several of its internet connections.

Here is the detail that demands attention: the crisis was not caused by the spike. It was caused by the baseline. Bot scraping had already consumed the available headroom. The Carter event simply exposed the damage that had already been done.

Wikimedia's internal analysis revealed a damning asymmetry: bots made up just 35% of total pageviews, yet accounted for 65% of the most expensive requests to core infrastructure. This is a structural feature of how AI training scrapers specifically tend to behave — maximizing coverage across content archives rather than revisiting popular pages. Unlike human visitors, who gravitate toward popular, frequently cached content, AI training bots crawl broadly, accessing obscure pages, bulk-downloading media, and hitting origin servers in ways that caching layers were not designed to absorb. It bears noting that not all bots behave this way: some crawlers are targeted, cache-friendly, and operationally benign. The Wikimedia case reflects the behavior of AI training scrapers in particular.
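Those two percentages imply a stark per-pageview cost gap. A back-of-envelope calculation, under the simplifying assumption that "most expensive requests" can be apportioned per pageview, makes the asymmetry explicit:

```python
# The Wikimedia asymmetry reported above: bots were 35% of pageviews
# but generated 65% of the most expensive infrastructure requests.
bot_share_pageviews = 0.35
bot_share_expensive = 0.65

# Expensive requests per pageview, indexed against the overall average.
bot_cost_index = bot_share_expensive / bot_share_pageviews              # ~1.86
human_cost_index = (1 - bot_share_expensive) / (1 - bot_share_pageviews)  # ~0.54

# A bot pageview is roughly 3.4x as likely as a human pageview
# to hit expensive, cache-missing infrastructure.
ratio = bot_cost_index / human_cost_index
print(round(ratio, 1))  # 3.4
```

The simplifying assumption understates the human-side caching advantage if anything, since human traffic concentrates on already-cached pages.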

The result, for organizations running cloud-based infrastructure, is what security professionals have taken to calling a "denial-of-wallet" scenario: costs rise, servers strain, and none of it generates a single dollar of business value.


AI Didn't Just Change the Web. It Weaponized It.

Large Language Models and generative AI tools have done something that no previous technology managed so efficiently: they democratized the creation and deployment of bots. You no longer need to be a sophisticated threat actor to launch a bot campaign. You need a subscription and an afternoon.

The consequences are measurable. Akamai's 2025 State of the Internet report documented a 300% year-over-year surge in AI-driven bot traffic across its global platform — a platform that supports over one-third of global web traffic, giving the sample considerable weight. The publishing sector absorbed the sharpest blow, with 63% of all AI bot activity targeting media properties. Commerce fared no better: over 25 billion bot requests were logged in a mere two-month span.

The Imperva report attributes a significant share of AI-enabled attacks to specific declared user-agent strings: ByteSpider Bot alone was responsible for 54% of all AI-enabled attacks detected across Imperva's network, with AppleBot (26%), ClaudeBot (13%), and ChatGPT User Bot (6%) rounding out the leading contributors. A critical caveat accompanies these figures: bots can and routinely do misrepresent their identity by spoofing user-agent strings. These attributions are based on declared agent names and Imperva's traffic classification heuristics, not verified identity. The actual distribution of bot origin may differ materially from what agent strings report. These numbers should be read as a map of declared behavior, not a census of actual actors.

The pace of escalation is perhaps most starkly illustrated by data from TollBit, a company that tracks web-scraping activity across its customer base. In the first quarter of 2025, one in every 200 website visits came from an AI scraping bot. By the fourth quarter of 2025, that ratio had narrowed to one in every 31. That is not growth. That is an invasion.
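Expressed as shares of total visits, the two TollBit ratios work out as follows:

```python
# The TollBit figures above, converted to shares of total visits.
q1_share = 1 / 200   # Q1 2025: one visit in every 200
q4_share = 1 / 31    # Q4 2025: one visit in every 31

print(f"{q1_share:.1%} -> {q4_share:.1%}")    # 0.5% -> 3.2%
print(round(q4_share / q1_share, 1))          # ~6.5x in three quarters
```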


The Attack Surface Is Expanding in Every Direction

Beyond raw infrastructure costs, the bot crisis is reshaping the threat landscape in ways that traditional security frameworks were not built to handle.

The Radware 2026 Global Threat Analysis Report — based on data from Radware's cloud and managed security services throughout 2025 — catalogued the full scope of the escalation: network-layer DDoS attacks jumped 168% year over year, web DDoS attacks climbed 101.4%, and bad bot activity rose 91.8%. Most high-impact web DDoS attacks now last under 60 seconds — a deliberate design choice, since attacks that brief render manual mitigation effectively useless.

Meanwhile, 44% of advanced malicious bot traffic is now targeting APIs directly, exploiting the business logic embedded in the workflows that power modern applications. Financial services, healthcare, and e-commerce are bearing the brunt of this. These sectors rely on APIs for payment processing, sensitive data transactions, and customer-facing services — which makes them, in the cold arithmetic of attackers, the most valuable targets. 1

The sophistication of evasion tactics has also crossed a threshold that should concern anyone responsible for network security. According to TollBit's analysis of its customer base, more than 13% of AI bot requests — meaning 13% of requests TollBit classified as coming from AI scraping bots, not 13% of total web traffic — were bypassing robots.txt directives entirely by the fourth quarter of 2025. That share represented a 400% increase from the second quarter to the fourth quarter of 2025, a six-month window. Given that TollBit estimated one in 31 website visits was from an AI scraping bot by Q4, 13% of that volume constitutes a material and growing fraction of total traffic. 2
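It is worth remembering what "bypassing robots.txt" means mechanically: the file is purely advisory, and compliance is a check the crawler volunteers to perform. The sketch below shows that check using Python's standard `urllib.robotparser`; the rules and agent names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that disallows one AI crawler from an archive path.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /archive/

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant crawler asks before fetching; nothing forces it to ask,
# which is why the 13% bypass figure is possible at all.
print(rp.can_fetch("ExampleAIBot", "https://example.org/archive/page1"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.org/archive/page1"))  # True
```

A bot that simply never calls `can_fetch` (or spoofs a different agent name, as the second call illustrates) faces no technical barrier from robots.txt alone.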

Bots are also disguising themselves as standard browsers, rotating through residential IP addresses, and mimicking human interaction patterns with sufficient fidelity that the behavior of some AI agents has become, according to TollBit, nearly indistinguishable from legitimate human traffic.


The Analytics Are Lying to You

There is one dimension of this problem that receives insufficient attention: the corruption of data.

When bot traffic floods a platform, it does not simply consume resources. It contaminates the measurement systems that organizations use to make decisions. One documented example: when a ChatGPT session conducts a Google search to fulfill a user prompt, parts of that prompt string can be recorded by Google as a search query — causing ChatGPT-generated phrases to appear in Google Search Console keyword data and distorting keyword analysis. To be precise about the mechanism: this occurs when the AI platform issues a live Google search as part of its retrieval process, and Google logs the query string; it is not a case of literal prompt text being injected into analytics, but rather AI-intermediated search behavior showing up as organic query data. 3

Google's removal of the &num=100 search parameter — a direct response to scrapers bulk-harvesting top search results — caused a measurable drop in impression metrics, revealing just how much of what looked like organic traffic was never human at all.

Security teams face a particularly insidious version of this problem: high-volume bot traffic normalizes anomalous request patterns, masking the early indicators of genuine attacks within the background noise of routine automation.


The Industry Is Responding — Imperfectly

The market has not been passive. A commercial ecosystem has grown up around the problem, with companies like TollBit, Cloudflare, and Bright Data selling bot management, detection, and paid-access tools. Legal action is intensifying: The New York Times and Chicago Tribune have sued Perplexity over AI-powered scraping. Google has modified its platform to limit bulk data extraction. Wikimedia has launched a formal initiative — WE5: Responsible Use of Infrastructure — to establish sustainable access boundaries.

These are reasonable responses. They are also, individually, insufficient.

It is worth being direct about the sources underpinning this piece. Imperva, Akamai, Radware, TollBit, and Cloudflare all have commercial interests in the bot management market. Their methodologies differ significantly: some measure HTTP requests, others pageviews or bytes; some cover their own network footprint, others survey subsets of their customer base. None of these datasets are directly comparable, and none represents a neutral, global census of internet traffic. What they represent is a set of independent measurements, drawn from large and consequential network footprints, pointing consistently in the same direction. That convergence is meaningful. But every figure in this piece should be weighed against who produced it, what their network covers, and what unit of measurement they used.

Cloudflare's CEO has predicted that bot traffic will exceed human traffic by 2027. Given that bots already constitute 51% of all web traffic today — on Imperva's measure — that prediction may prove conservative. What is certain is that organizations treating this as a niche security problem, rather than a foundational business and infrastructure concern, are making a very expensive mistake.

The web was built for humans. It is no longer primarily used by them. Every organization with a digital presence needs to reckon, seriously and urgently, with what that means.



Conclusion

The data is unambiguous. The trajectory is clear. Automated bot traffic has crossed the majority threshold, malicious activity is accelerating, infrastructure costs are rising, and the attack surface is expanding in ways that conventional security postures were not designed to address.

The shift from reactive blocking to deliberate, long-term bot governance is no longer optional — it is, as one recent analysis puts it, a cybersecurity requirement. Organizations that lack long-term visibility into bot behavior — beyond the standard 30-day log retention window — are making policy decisions based on incomplete evidence.

For those wondering where to start, the practical steps are not glamorous but they are concrete:

  1. Audit your server logs. Distinguish between good bots, AI bots, and malicious bots. The taxonomy matters — blocking indiscriminately can harm search visibility.
  2. Establish baseline metrics before making blocking decisions. You cannot measure the impact of a policy change without a pre-change baseline.
  3. Implement rate limiting on origin servers. Especially for endpoints that serve large or bulk content.
  4. Extend log retention beyond 30 days. Trend analysis requires trend data. Most bot management tools default to retention windows that are sufficient for incident response but insufficient for strategic planning.
  5. Treat bot governance as an ongoing discipline, not a one-time configuration. The bot ecosystem evolves. Your policies must evolve with it.
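Step 3 in the list above is the most directly codifiable. A classic way to rate-limit an origin endpoint is a token bucket; the sketch below is a minimal single-process version, with capacity and refill rate chosen purely for illustration — real values would be tuned per endpoint and enforced at the edge.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter for an origin endpoint.

    capacity: maximum burst size; refill_rate: sustained requests/second.
    Both values are illustrative, not recommendations.
    """
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Example: tolerate a burst of 5 requests, then 1 request/second sustained.
bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(7)]
print(results)  # first 5 allowed, the remaining 2 throttled
```

In production this logic typically lives in a CDN, reverse proxy, or API gateway rather than application code, and is keyed per client (IP, token, or verified bot identity) so that throttling a bulk scraper does not throttle everyone else.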

The internet has changed. The question is whether the people responsible for running it will change fast enough to keep up.

Footnotes

  1. AI-Driven Bots Surpass Human Traffic - Bad Bot Report 2025

  2. AI Bots Are Now a Significant Source of Web Traffic

  3. How to Manage the Infrastructure Impact of More AI Bot Traffic