If you had trouble streaming video or ordering a pizza online over the weekend, there was a very good reason why. During Sydney's wild weather on Saturday and Sunday, a power issue caused an outage at cloud provider Amazon Web Services, bringing several big websites and streaming services — along with their associated apps — screeching to a halt.
Our external vendor AWS confirmed Stan connection issues due to power outage within AWS. AWS & Stan are working to restore services ASAP.
— Stan. (@StanAustralia) June 5, 2016
— FOXTEL Help (@FOXTEL_Help) June 5, 2016
the @Dominos_AU website is down which means their app is also down so i'm going to starve to death RIP in peace me
— ShrekyWitTheGoodHair (@dylanjakemorris) June 5, 2016
iTnews first reported the problem, which appeared around 3:30PM AEST on Saturday, with a power issue affecting the company's Elastic Compute Cloud (EC2) product. One of the three AP-SOUTHEAST-2 availability zones within Sydney reported "increased API error rates", and services were progressively switched to the other zones successfully.
AWS then restored power to the offline zone around two hours later, and by 9:30PM that evening the company reported the majority of customer services had been restored. A small number of instances remained offline after the outage due to physical server hardware being "adversely affected" by the loss of power, but the major end user-facing services — including Foxtel, Domain, Domino's and Nine Entertainment's streaming service Stan — were up and running throughout Sunday.
On June 4th at 10:25 PM PDT a significant number of EC2 instances and EBS volumes within a single Availability Zone in the AP-SOUTHEAST-2 Region experienced a loss of power. Beginning at this same time, EC2 API calls in the AP-SOUTHEAST-2 Region experienced increased error rates and latencies as well as delays in propagation of instance state data in the affected Availability Zone. Instances and volumes in the other Availability Zones in the AP-SOUTHEAST-2 Region were unaffected. At 11:46 PM PDT, power was restored to the facility and instances and volumes started to recover. At 1:00 AM PDT, 80% of the affected instances and volumes had been recovered by our automated systems. At 2:45 AM PDT the increased error rates and latencies for the EC2 APIs and the delayed propagation of instance state data were fully resolved. A couple of unexpected issues prevented our automated systems from recovering the remaining instances and volumes. The team was able to fix these issues, and by 8:00 AM PDT, all but a small number of instances and volumes were recovered. Since 8:00 AM PDT our teams have been working to recover these remaining instances and volumes. Most of these instances are hosted on hardware which was adversely affected by the loss of power. While we will continue to work to recover any affected instances or volumes, we recommend replacing any remaining affected instances or volumes if possible.
Unsurprisingly, it looks like a few Grumpy People On The Internet complained to Domino's about not being able to use online discount codes over the phone, and some asked Stan for compensation for the hours-long outage. That's a little bit much, guys. Not even Domino's has power over the weather. [Amazon]