Amazon: human error caused Netflix's Christmas Eve outage

On Christmas Eve, as millions of US residents were either at home or traveling to get home before Christmas Day, many of them found that their Netflix streaming devices could not play anything. The cause was a problem in the Northern Virginia data center cluster operated by Amazon Web Services, Netflix's cloud hosting partner. The issue was not fully resolved until well into Christmas Day.

Late on Monday, Amazon issued a statement explaining the outage. Simply put, someone at Amazon Web Services did something they shouldn't have done. According to Amazon, on Christmas Eve part of the state data for its East Coast Elastic Load Balancing (ELB) system was deleted when an unnamed developer on the team accidentally ran a maintenance process against it.

Amazon's statement said:

The data was deleted by a maintenance process that was inadvertently run against the production ELB state data. This process was run by one of a very small number of developers who have access to this production environment. Unfortunately, the developer did not realize the mistake at the time. After this data was deleted, the ELB control plane began experiencing high latency and error rates for API calls to manage ELB load balancers.

The statement details how the deleted data ended up affecting about 6.8 percent of the running ELB load balancers. That was apparently enough to cut off access to Netflix's service for many smartphones and other devices. Amazon's engineers had restored service by midday on Christmas Day.

Amazon says it will take steps to ensure that what happened on Christmas Eve does not happen again. These include requiring per-incident change management (CM) approval before a developer on the team can access the production ELB data. Amazon also apologized for the incident, saying:

We know how critical our services are to our customers’ businesses, and we know this disruption came at an inopportune time for some of our customers. We will do everything we can to learn from this event and use it to drive further improvement in the ELB service.

Source: Amazon Web Services | Image via Amazon Web Services
