Cloudflare had trouble again.
The cloud computing company suffered a global outage early Wednesday, bringing down thousands of major websites that rely on Cloudflare’s networks for performance and security.
As it happens, one of the downed websites was DownDetector, the go-to site millions of folks use to see if websites are having performance issues. Internet outages are best when they’re a little bit ironic.
In a phone call with Gizmodo, Cloudflare CEO Matthew Prince explained that a bug in the company’s firewall software caused a massive spike in CPU usage. The spike resulted in global system failures that began early in the morning Pacific Time and affected all of the company’s services for about 30 minutes.
Europe and the American east coast were most heavily impacted because the outage came in the middle of the workday in those areas.
“We’ve never seen an outage like this before,” Prince said. The company investigated whether an attack was responsible for the outage but found instead that it was a software bug.
Cloudflare’s London network operations center was on duty when the incident began. The first problem they noticed was the significant spike in CPU usage due to the firewall service.
They immediately thought it could be an attack, Prince said, because the firewall is designed to scale up instantly to mitigate one. But when Cloudflare looked for evidence of an attack, it found no such evidence or traffic. Prince said his team is “confident this wasn’t an attack.”
“One of Cloudflare’s core policies is radical transparency,” Prince continued. “Today’s outage was 100 per cent in our control and 100 per cent our responsibility.
“We’re reaching out to all our customers to honour our responsibilities to them. It’s important for people to know it’s a mistake on our part. While it would be convenient for this to be a nation state or another attacker, this one was our fault.”
This latest problem at Cloudflare comes just one week after a different round of global outages impacted the company’s network. That incident took down a host of popular websites and apps including the chat service Discord. Cloudflare pinned the blame on network issues traced back to an internet infrastructure problem at Verizon.
You can keep track of Cloudflare’s status here.
Update: At 2:30am AEST, we updated this article to include a phone call with Cloudflare CEO Matthew Prince in which he explained that the cause of the outage was a software bug in Cloudflare’s Firewall service.
The company’s investigators are continuing their analysis. They are “confident” the problem was not the result of an attack, Prince said.