Amazon Says One Engineer’s Simple Mistake Brought The Internet Down

March 3, 2017 at 9:00 am

Roughly 48 hours after its major service outage, Amazon is admitting what caused the problem. Apparently, some poor engineer at Amazon Web Services (AWS) did an oopsie and brought the internet to its knees. Oopsies are the worst!

In all seriousness, it’s a sobering story. Here’s how Amazon described it in a recent blog post:

“At 9:37AM PST, an authorised S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”

We’ve all been there. You push the wrong button and end up getting Sprite instead of Coke. But this poor guy or gal probably made an errant keystroke that crippled AWS for at least four hours. Since about a third of all internet traffic reportedly flows through AWS servers, deleting a whole bunch of those servers screwed up a few people’s days.

In theory, a series of failsafes should keep the fallout from such errors localised, but Amazon says that some of the key systems involved hadn’t been fully restarted in many years and “took longer than expected” to come back online.

The company now claims it’s “making several changes as a result of this operational event.” One of these changes will involve modifying a tool so that a large number of servers can’t be deleted at once. Which makes total sense, but still doesn’t solve the problem of unknown unknowns (like, say, a slower than expected restart) on an internet that relies so heavily on a single service.
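For the curious, here is a rough idea of what that kind of guard could look like. This is a minimal Python sketch, not Amazon's actual tool: the function name `plan_removal`, the `CapacityGuardError` exception, and the `max_batch` and `min_capacity` limits are all hypothetical, made up purely for illustration.

```python
class CapacityGuardError(Exception):
    """Raised when a removal request breaks one of the safety limits."""


def plan_removal(active, requested, max_batch=2, min_capacity=3):
    """Return the servers that may be removed, or raise if the request is unsafe.

    All names and limits here are illustrative assumptions, not AWS internals.
    """
    # Reject names that are not in the active fleet; a typo should fail loudly.
    unknown = [s for s in requested if s not in active]
    if unknown:
        raise CapacityGuardError(f"unknown servers: {unknown}")

    # Cap how many servers a single command may remove.
    if len(requested) > max_batch:
        raise CapacityGuardError(
            f"requested {len(requested)} removals, limit is {max_batch}"
        )

    # Refuse to drop the fleet below its minimum required capacity.
    remaining = len(active) - len(set(requested))
    if remaining < min_capacity:
        raise CapacityGuardError(
            f"only {remaining} servers would remain, minimum is {min_capacity}"
        )

    return list(requested)
```

With six active servers, asking to remove two would pass, while asking to remove four would raise an error before anything is touched. The point of the design is that a mistyped input gets rejected by the tool instead of being trusted.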

In the meantime, let this serve as a shoutout to that poor AWS engineer who made a tiny mistake that led to major consequences. We’re having a rough year, too.

We’ve reached out to Amazon to find out more details about the incident, specifically the fate of the poor engineer who caused the problem. We’ll update this post when we hear back.

[Amazon]
