Cloudflare outage on November 18, 2025
Introduction
On 18 November 2025 at 11:20 UTC, Cloudflare experienced a major global outage that disrupted access to countless websites and services dependent on its network. Users across the world saw error pages indicating failures within Cloudflare’s core network. Although initial suspicion pointed toward a possible large-scale cyber attack, Cloudflare later confirmed that the disruption originated from an internal configuration issue, not from malicious activity. This post explains what caused the outage, how Cloudflare responded, and what steps the company plans to take to prevent similar incidents in the future.
What Triggered the Outage
The root cause of the outage was a permissions change applied to one of Cloudflare’s ClickHouse database systems. The change caused the queries that build a feature file for Cloudflare’s Bot Management system to return duplicate rows, so the generated file unexpectedly doubled in size.
This oversized file was then automatically propagated to every machine in Cloudflare’s network. The core proxy software (FL and FL2) that routes traffic relies on this feature file and preallocates memory for a fixed number of features, so the oversized file exceeded that limit and caused the proxy to crash. This led to widespread HTTP 5xx errors and traffic failures across Cloudflare’s global network.
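To make that failure mode concrete, here is a minimal Rust sketch of how a metadata query that suddenly sees each column under two database names can double the generated feature list if the generator never deduplicates. The types, table names, and database names are hypothetical, not Cloudflare’s actual pipeline.

```rust
// Illustrative sketch only: hypothetical types, table names, and database
// names, not Cloudflare's actual pipeline. Models a feature-file generator
// that emits one feature per metadata row it sees, keyed only on
// (table, column) and ignoring which database the row came from.

#[derive(Debug, Clone)]
struct ColumnRow {
    database: String,
    table: String,
    column: String,
}

/// Builds the feature list exactly as returned by the metadata query,
/// without deduplicating on (table, column).
fn generate_features(rows: &[ColumnRow]) -> Vec<String> {
    rows.iter()
        .map(|r| format!("{}.{}", r.table, r.column))
        .collect()
}

fn main() {
    // Before the permissions change: each column is visible under one database.
    let before = vec![
        ColumnRow { database: "default".into(), table: "http_requests".into(), column: "bot_feature_a".into() },
        ColumnRow { database: "default".into(), table: "http_requests".into(), column: "bot_feature_b".into() },
    ];

    // After the change: the same columns also become visible under a second,
    // underlying database, so the query returns every row twice.
    let mut after = before.clone();
    for row in &before {
        after.push(ColumnRow { database: "underlying_store".into(), ..row.clone() });
    }

    println!("features before change: {}", generate_features(&before).len()); // 2
    println!("features after change:  {}", generate_features(&after).len());  // 4, the file doubles
}
```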
Why Cloudflare Initially Suspected a DDoS Attack
The outage unfolded in an unusual way. The feature file was regenerated every five minutes. Because the database permission change was only partially rolled out, some ClickHouse nodes produced correct files while others produced bad ones. As a result, Cloudflare’s systems would repeatedly fail and then recover, creating irregular spikes of errors that resembled the signature of a large-scale DDoS attack. At the same time, and purely by coincidence, Cloudflare’s independently hosted status page also became unreachable, further misleading engineers during the early investigation.
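As a rough illustration (a deliberately simplified model, not Cloudflare’s actual scheduling), the flapping can be reproduced by regenerating the file each cycle on whichever node happens to handle the query, where only some nodes have picked up the permissions change:

```rust
// Deliberately simplified model of the flapping: the feature file is rebuilt
// every cycle by whichever node handles the query, and only some nodes carry
// the permissions change that produces a doubled (bad) file.
fn main() {
    // true = node already has the new permissions and emits a doubled file
    let nodes = [false, true, false, true, true, false];

    for (cycle, updated) in nodes.iter().cycle().take(12).enumerate() {
        let feature_count = if *updated { 400 } else { 200 };
        let outcome = if feature_count > 200 { "proxies panic, 5xx errors" } else { "healthy" };
        println!("cycle {:2}: file with {:3} features -> {}", cycle, feature_count, outcome);
    }
}
```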
How the Issue Was Identified
By 13:05 UTC, the team bypassed Workers KV and Access through an older proxy path, reducing some of the immediate impact. By 14:24, Cloudflare identified Bot Management’s configuration file as the true source of failure and halted the generation and propagation of new feature files. An earlier known-good version of the file was deployed globally. By 14:30, core traffic began flowing normally again.
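The shape of that remediation (halt the pipeline that produces new files and pin a known-good version) can be sketched as follows; the structure and names are hypothetical, not Cloudflare’s actual tooling.

```rust
// Hypothetical sketch of "stop propagation and pin a known-good config".
// Not Cloudflare's tooling; only the shape of the remediation described above.

struct ConfigStore {
    propagation_enabled: bool,   // global switch for pushing newly generated files
    pinned_version: Option<u64>, // when set, only this version is served to the fleet
}

impl ConfigStore {
    /// Decides which config version the fleet should load.
    fn version_to_serve(&self, latest_generated: u64, last_known_good: u64) -> u64 {
        match self.pinned_version {
            Some(v) => v, // operator pinned an earlier, known-good file
            None if self.propagation_enabled => latest_generated,
            None => last_known_good,
        }
    }
}

fn main() {
    // During the incident: stop new files from going out and pin the good one.
    let store = ConfigStore { propagation_enabled: false, pinned_version: Some(41) };
    println!("serving config version {}", store.version_to_serve(42, 41)); // 41
}
```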
Impacted Products and Services
The outage affected multiple Cloudflare services:
• Core CDN and security services – Users received HTTP 5xx status codes due to the proxy failure.
• Turnstile – Failed to load, preventing authentication on websites and the Cloudflare dashboard.
• Workers KV – Returned high levels of 5xx errors as the core proxy was failing.
• Dashboard – Most users could not log in because Turnstile was down, although the dashboard itself remained online.
• Email Security – Temporary loss of access to an IP reputation source reduced spam detection accuracy.
• Access – Most users experienced authentication failures, although existing active sessions remained operational.
Additionally, Cloudflare’s CDN experienced increased latency due to CPU load from debugging and observability systems attempting to analyze the errors in real time.
How Cloudflare’s Request Flow Was Affected
Every request to Cloudflare passes through several layers: HTTP/TLS, the core proxy (FL), and Pingora, where various security modules run. One of these modules, Bot Management, relies on a feature configuration file that is updated every few minutes to keep protection against bot-based threats responsive.
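A greatly simplified model of that flow is sketched below: a chain of per-request modules, one of which scores bots using a shared, periodically refreshed feature config. The trait and struct names are hypothetical and stand in for internals the post-mortem does not describe.

```rust
// Greatly simplified model of a proxy running a chain of per-request modules,
// one of which (bot management) depends on a periodically refreshed config.
// The trait and struct names are hypothetical, not Cloudflare's internals.
use std::sync::{Arc, RwLock};

struct Request {
    path: String,
}

trait Module {
    fn handle(&self, req: &Request) -> Result<(), String>;
}

struct BotManagement {
    // Feature config swapped in every few minutes by a background refresher.
    features: Arc<RwLock<Vec<String>>>,
}

impl Module for BotManagement {
    fn handle(&self, _req: &Request) -> Result<(), String> {
        let features = self.features.read().map_err(|_| "lock poisoned".to_string())?;
        if features.is_empty() {
            return Err("no bot features loaded".to_string());
        }
        // ...score the request against `features` here...
        Ok(())
    }
}

/// Runs every module; a failure in any one of them fails the whole request,
/// which surfaces to the client as a 5xx.
fn run_pipeline(modules: &[Box<dyn Module>], req: &Request) -> Result<(), String> {
    for module in modules {
        module.handle(req)?;
    }
    Ok(())
}

fn main() {
    let features = Arc::new(RwLock::new(vec!["feature_a".to_string()]));
    let modules: Vec<Box<dyn Module>> = vec![Box::new(BotManagement { features })];
    let req = Request { path: "/".to_string() };
    println!("{} -> {:?}", req.path, run_pipeline(&modules, &req));
}
```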
The permissions update in the ClickHouse database caused duplicate rows to appear in the feature file, pushing the number of features past the Bot Management module’s 200-feature limit. When the module encountered the oversized file, it triggered a panic that caused the proxy to fail and return 5xx errors.
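A hedged sketch of that failure mode follows: a loader that enforces a fixed feature limit returns an error for the oversized file, and a caller that treats the error as impossible turns it into a process-wide panic. This is loosely modelled on the behaviour described above, not Cloudflare’s source code.

```rust
// Illustrative sketch of the described failure mode; not Cloudflare's source.
// A loader enforces a fixed feature limit, and a caller that assumes the
// limit can never be hit turns the resulting error into a panic.

const FEATURE_LIMIT: usize = 200; // the preallocated limit described above

fn load_features(lines: &[&str]) -> Result<Vec<String>, String> {
    if lines.len() > FEATURE_LIMIT {
        return Err(format!("{} features exceeds limit of {}", lines.len(), FEATURE_LIMIT));
    }
    Ok(lines.iter().map(|s| s.to_string()).collect())
}

fn main() {
    // The duplicated rows roughly doubled the feature count past the limit.
    let oversized: Vec<&str> = (0..400).map(|_| "feature").collect();

    // Treating the error path as unreachable is what turns a bad config file
    // into a crash of the whole process.
    let _features = load_features(&oversized).unwrap(); // panics here
}
```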
Differences in Impact: FL vs FL2
Cloudflare is currently migrating customers to its newer proxy system, FL2.
• Customers on FL2 – Saw 5xx errors for most bot-protected traffic.
• Customers on FL – Bot scores failed and defaulted to zero, causing false positives for bot-blocking rules but no 5xx errors (see the sketch below).
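To see why a zero score causes false positives, consider this illustrative rule (the threshold is an assumption, not any specific customer’s configuration): a rule that blocks requests scoring below a cutoff will block every visitor once all scores default to zero.

```rust
// Illustrative only: shows why a bot score that silently defaults to zero
// trips "block low-score traffic" rules for every visitor, human or bot.

/// A typical style of customer rule: block anything scoring below the
/// threshold, where lower scores mean "more likely automated".
fn is_blocked(bot_score: u8, block_below: u8) -> bool {
    bot_score < block_below
}

fn main() {
    let threshold = 30; // assumed threshold, purely for illustration

    // Normal operation: a human visitor with a healthy score passes.
    println!("score 85 blocked? {}", is_blocked(85, threshold)); // false

    // During the incident on FL, scores defaulted to zero, so the same
    // visitor is blocked: a false positive rather than a 5xx error.
    println!("score  0 blocked? {}", is_blocked(0, threshold)); // true
}
```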
Major Timeline of Events (all times UTC)
11:05 – A ClickHouse database access control change was deployed.
11:28 – First customer errors observed as faulty feature files propagated.
11:32–13:05 – Teams investigated Workers KV errors believing the issue originated there.
13:05 – Workers KV and Access were routed through a previous proxy version, reducing impact.
13:37 – Teams focused on reverting the Bot Management feature file.
14:24 – Bad feature file generation stopped; tests on old file succeeded.
14:30 – Main impact resolved; most services recovered.
17:06 – All downstream services restarted and fully restored.
What Cloudflare Is Doing to Prevent Future Outages
Cloudflare committed to several improvements to strengthen resilience:
• Hardening ingestion of internally generated configuration files with the same validation applied to user-generated input (sketched after this list)
• Introducing more global kill switches for critical system behavior
• Preventing debugging systems from overloading system resources
• Reviewing failure modes across all proxy modules
• Building more resilient feature file distribution and validation processes
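As a rough idea of what the first item could look like in practice, the sketch below validates an internally generated feature file with the same suspicion normally reserved for user input before any machine loads it. The limits and function names are assumptions for illustration, not Cloudflare’s implementation.

```rust
// Hedged sketch of "treat internally generated config like untrusted input":
// validate size, entry count, and duplicates before any machine loads the file.
// The limits and names are assumptions for illustration, not Cloudflare's code.
use std::collections::HashSet;

const MAX_BYTES: usize = 1 << 20;  // assumed size budget for the feature file
const MAX_FEATURES: usize = 200;   // feature limit described in the post-mortem

fn validate_feature_file(raw: &str) -> Result<Vec<String>, String> {
    if raw.len() > MAX_BYTES {
        return Err(format!("file is {} bytes, over the {} byte budget", raw.len(), MAX_BYTES));
    }

    let mut seen = HashSet::new();
    let mut features = Vec::new();
    for line in raw.lines().filter(|l| !l.trim().is_empty()) {
        if !seen.insert(line.to_string()) {
            return Err(format!("duplicate feature entry: {line}"));
        }
        features.push(line.to_string());
    }

    if features.len() > MAX_FEATURES {
        return Err(format!("{} features exceeds limit of {}", features.len(), MAX_FEATURES));
    }
    Ok(features)
}

fn main() {
    // A doubled file with repeated entries is rejected before propagation,
    // so the previous known-good version keeps serving traffic.
    let doubled = "feature_a\nfeature_b\nfeature_a\nfeature_b\n";
    println!("{:?}", validate_feature_file(doubled)); // Err("duplicate feature entry: feature_a")
}
```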
Conclusion
The outage on November 18 was Cloudflare’s most severe network disruption since 2019, affecting a major portion of the global Internet. Although caused by an internal configuration oversight rather than an attack, the outage prevented traffic from flowing normally for several hours. Cloudflare has apologized to customers and the broader Internet community, acknowledging the seriousness of the failure. The company is now taking steps to reinforce its systems to avoid any recurrence.
For businesses relying heavily on Cloudflare for uptime, this outage serves as a reminder of the importance of redundancy, monitoring, and fallback mechanisms even when depending on the most trusted infrastructure providers.
