As a major storm battered the Virginia and Washington, DC area last night and knocked out power to more than two million customers, what many millions more around the world noticed was that they could no longer share pictures on Instagram, post interesting tidbits on Pinterest, or stream a Netflix movie on Friday night after dinner. These three sites, and many more, run on Amazon's Elastic Compute Cloud (EC2).
The service is supposed to have so much redundancy built in that a failure in one data center simply routes traffic to another data center. Unfortunately, just like last April, this outage was noticed worldwide. Even users in Europe, who have their own Amazon hubs nearby, were affected by the power outage at Amazon's Northern Virginia data center. This is significant because Amazon's stated service level agreement (SLA) for EC2 is 99.95% uptime, which means it promises no more than about four and a half hours of downtime a year. This outage alone has pushed uptime below 99.95%, and that's not taking into account the outage Amazon suffered in the same area just two weeks ago.
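The arithmetic behind that figure is easy to check. The sketch below is only an illustration: it assumes the SLA is measured over a plain 365-day year, the function names are made up for this example, and the six-hour outage is a hypothetical number, not a measured duration for this incident.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def allowed_downtime_hours(sla_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime an availability target permits over the period."""
    return period_hours * (1 - sla_percent / 100)


def achieved_uptime_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Uptime percentage actually delivered given total downtime in the period."""
    return 100 * (1 - downtime_hours / period_hours)


budget = allowed_downtime_hours(99.95)
print(f"Downtime allowed by a 99.95% SLA: {budget:.2f} hours per year")

# Hypothetical: a single six-hour outage with no other downtime all year.
hypothetical_outage_hours = 6.0
uptime = achieved_uptime_percent(hypothetical_outage_hours)
print(f"Uptime after a {hypothetical_outage_hours:.0f}-hour outage: {uptime:.3f}%")
print("SLA met" if uptime >= 99.95 else "SLA missed")
```

Under these assumptions the yearly budget works out to about 4.38 hours, so any single outage longer than that uses up the whole allowance on its own, before earlier incidents are counted.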
With everyone clamoring to get into the "always-on" public cloud, incidents like this have to make people take a step back and think about the ramifications. The allure of the cloud is that your data is available everywhere, to everyone, but as history has shown, things aren't always as smooth as companies would have us believe. Incidents like this can have a major financial impact, especially as more and more companies move their services into the public cloud.