Amazon Web Services knocked offline; Observers say cloud outage raises questions

July 2, 2012 By David
Grazed from FierceCommunications.  Author:  Chris Rizo.

A fast-moving, catastrophic storm late Friday night knocked part of an Amazon Web Services (AWS) data center temporarily offline, and with the crash, down came the websites of some marquee customers of Amazon.com’s (Nasdaq: AMZN) cloud-computing unit.

Downed was AWS’s vaunted Elastic Compute Cloud (EC2) service, which remotely hosts the public-facing websites of movie-streamer Netflix (Nasdaq: NFLX), cloud platform-as-a-service Heroku, photo-sharing service Instagram, and the social-networking site Pinterest, among other online services that similarly rest on Amazon’s digital infrastructure.

The content-delivery failures–blamed on a massive two-hour electrical storm–affected one availability zone in AWS’s US-East-1 Region, which resides at Amazon’s data center in northern Virginia…

IT industry observers, in the days since the AWS outage, have raised questions about AWS, particularly its market position.

 

That other cloud services hosted in the same area remained unscathed by the storm raises questions about whether Amazon is "suffering architectural glitches that go beyond acts of God," GigaOM’s Barb Darrow wrote this weekend.

"The fact that Amazon, like any other data center-dependent business is not bulletproof also raises questions about why its customers don’t pursue a multi-cloud strategy or, if they’re going to rely solely on Amazon, why they put so much of their workload in one geography–a practice Amazon itself advises against," Darrow wrote. "Of course, it isn’t good practice for any vendor to blame snafus on its customers."

To PolicyMic editor Alex Marin, the AWS crash raises the question: "Isn’t Facebook supposed to host or at least provide a more secure server for the company that it spent so much money in?"

In a column published Saturday afternoon, Marin wrote: "The episode highlights an unsettling trend in a tech sector that is becoming increasingly divided, with titans such as Facebook and Amazon–the tech equivalents of the 'too big to fail' banks–on one side; and a string of innovative but smaller tech start ups (Instagram, Pinterest) that are trying to grow in a highly competitive tech sector on the other."

EverythingPR’s Tavis Hampton had this take, from a public-relations angle: "The days of housing their own web infrastructure are over for many web companies, and that means the likelihood of larger chunks of the web going offline simultaneously could increase."

Seattle-based Amazon said that Friday night’s storm took out backup generators after disrupting the data center’s primary power service. Amid a brutal heat wave, the Mid-Atlantic derecho–packing violent winds that reached 80 mph in the region–left millions of people across the eastern United States without power.

According to the AWS Service Health Dashboard, the EC2 cloud began experiencing connectivity issues at 8:21 p.m. PT. By 8:40 p.m., Amazon reported "a large number of instances in a single availability zone" had lost power.

At roughly 11:10 p.m. ET, the sites at Amazon’s US-East-1 data center in Ashburn, Va., crashed. Netflix was offline until about 1:15 a.m. Pinterest remained down until 1:50 a.m. As of Saturday afternoon, Heroku reported problems with some databases.

By late Saturday morning, Amazon said some of its hosted sites were back online. "We are continuing our recovery efforts for the remaining EC2 instances," the company wrote on its status blog. At 4:43 p.m., Amazon said EC2 "is now operating normally. We will post back here with an update once we have details on the root cause analysis."

Companies affected by the AWS failure used Twitter to keep their customers updated.

Los Gatos, Calif.-based Netflix wrote Friday at 8:50 p.m. to its more than 23,000 @Netflixhelps followers: "We’re aware that some members are experiencing issues streaming movies and TV shows. We’re working to resolve the problem."

Instagram, owned by Facebook (Nasdaq: FB), on Saturday tweeted to its more than 6 million @Instagram followers: "Due to severe electrical storms, our host had a power outage, no data is lost – we’ve been working through the night to restore service."

This was not AWS’s first outage. Friday’s incident came just two weeks after Amazon had a six-hour outage related to its EC2 service and Relational Database Service (RDS). In April 2011, over 100 sites–including social-networking sites Foursquare and Hootsuite, and social-news site Reddit–were affected by a four-day software-related failure.