Earlier this week numerous Microsoft customers experienced issues due to a widespread outage of the company’s Azure services. After apologizing and fixing most of the issues, Microsoft has revealed that the trouble started with a performance update.
A couple of days ago users in the US, Europe and parts of Asia started experiencing major disruptions of Azure services, even though the service’s Health Dashboard showed everything running smoothly. Even non-Azure users were affected, as MSN.com and Xbox Live also partly rely on the company’s cloud systems.
Company officials now explain that the issues, which affected so many systems, were caused by a botched performance update that was being applied to Azure Storage. The update had been tested prior to full deployment, but the issues did not show up until it was too late.
To make matters worse, a second error spread the update much faster than it should have, resulting in large swathes of the world being affected. The company explained:
“Unfortunately the issue was wide spread, since the update was made across most regions in a short period of time due to operational error, instead of following the standard protocol of applying production changes in incremental batches.”
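For readers unfamiliar with the idea of applying changes in incremental batches, here is a minimal Python sketch of what such a staged rollout can look like. It is an illustration only, not Microsoft's actual deployment tooling: the function names, the region names and the health check are all hypothetical, and the rollback behaviour is an assumption made for the example.

```python
from typing import Callable, List


def make_batches(regions: List[str], batch_size: int) -> List[List[str]]:
    """Split the region list into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [regions[i:i + batch_size] for i in range(0, len(regions), batch_size)]


def staged_rollout(
    regions: List[str],
    batch_size: int,
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    roll_back: Callable[[str], None],
) -> List[str]:
    """Update regions one batch at a time, checking health after each batch.

    Returns the regions left running the update. If any region in a batch
    fails its health check, every region updated so far is rolled back,
    the remaining batches are skipped, and an empty list is returned.
    """
    updated: List[str] = []
    for batch in make_batches(regions, batch_size):
        for region in batch:
            apply_update(region)
            updated.append(region)
        if not all(is_healthy(region) for region in batch):
            # Stop here: later batches never receive the faulty update.
            for region in reversed(updated):
                roll_back(region)
            return []
    return updated


if __name__ == "__main__":
    # Hypothetical regions; the second one is made to fail its health check.
    touched: List[str] = []
    result = staged_rollout(
        ["region-a", "region-b", "region-c", "region-d"],
        batch_size=1,
        apply_update=touched.append,
        is_healthy=lambda region: region != "region-b",
        roll_back=lambda region: None,
    )
    print("regions that received the update:", touched)
    print("regions left updated:", result)
```

In this sketch a fault that only appears in the second region stops the rollout before the remaining regions are touched, which is the containment that batching is meant to provide.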
While most issues have now been fixed by rolling back the update and restarting the storage front end, intermittent problems remain, and Microsoft is working with affected customers to resolve them. The company is also committing to a series of steps to avoid such problems in the future.