Cloudflare CEO: Outage Stemmed From Software Flaw, Not Cyberattack
‘This was a mistake we caused ourselves,’ Cloudflare CEO Matthew Prince tells CRN. ‘It wasn't an issue caused by someone else.’
Cloudflare CEO Matthew Prince said the company's widespread service outage Tuesday morning was caused by a bug in the company's firewall software and not a cyberattack.
"This was a mistake we caused ourselves," Prince told CRN. "It wasn't an issue caused by someone else."
Cloudflare's firewall software is designed to scale up and use as much CPU capacity as necessary to ensure that customers are always protected from an attack, Prince said. But a bug in the software apparently caused it to consume CPU resources across the entire network, Prince said, which in turn starved other processes.
Prince said Cloudflare initially thought it had fallen victim to an unprecedented cyberattack, which was also the subject of speculation on Twitter under the hashtag #CloudflareDown. The company would have responded in less than 30 minutes had it not initially believed it was dealing with a large attack, according to Prince.
Cloudflare ruled out a cyberattack because it saw no anomalous traffic flows and nothing external entering the network that could have caused the problem, Prince said. Instead, Prince said Cloudflare is pretty confident that it introduced the software bug itself, and will put procedures in place to prevent a recurrence.
The company hasn't formally determined the root cause of the software bug, but Prince said his hunch is that a change to the firewall software, such as a new rule pushed out to mitigate attacks, triggered it.
The duration of the outage varied by geography, Prince said, lasting as long as 30 minutes in regions such as North America, where it occurred during regular business hours. In regions such as Asia, where the outage occurred overnight, service returned to normal more quickly because there was less internet traffic, Prince said.
The first signs of trouble came early Tuesday when Cloudflare's London-based Network Operations Center (NOC) observed a spike in CPU usage by the company's firewall software, according to Prince. Cloudflare plans to put safeguards in place to protect against something like this happening again, Prince said.