Windows Azure hit with worldwide partial outage

Microsoft has fixed a glitch in its Windows Azure cloud-based service that started late on Tuesday and was resolved earlier today. The partial outage affected only the management feature in the Compute section, but the issue hit all of the regions that Windows Azure serves around the world.

According to PCWorld, the problems began on Tuesday around 10:35 p.m. ET when the Windows Azure service dashboard noted the Compute problems. The issues this week did not actually affect how applications are run on Azure, which is part of the reason why this outage was not widely reported until today.

The service dashboard posted an update on Wednesday that stated, "Manual actions to perform Swap Deployment operations on Cloud Services may error, which will then restrict Service Management functions." Around 8:45 a.m. on Thursday, Microsoft said that full service had been restored for Windows Azure. It's likely that the company will issue some kind of post-mortem on the incident in the near future.

This is the third such large outage for Windows Azure within a year. In December 2012, one server cluster in the south central portion of the US went down for a few days. In February, an outage hit all of its server clusters due to an expired HTTPS certificate.

Source: PCWorld | Image via Microsoft


11 Comments


ccoltmanm said,
Worldwide partial? Hyperbole oxymoron?

Not at all -- it impacted the entire world, but only part of Azure was impacted. For example, if the Neowin forums were inaccessible to anyone in the world, but the front page news was still available, it'd be a partial worldwide outage of Neowin.

Fezmid said,

Not at all -- it impacted the entire world, but only part of Azure was impacted. For example, if the Neowin forums were inaccessible to anyone in the world, but the front page news was still available, it'd be a partial worldwide outage of Neowin.

What I had in mind.

Semantics here, but it was a worldwide partial outage. Using your words, if it were a partial worldwide outage, then parts of the world would see nothing, not even the front page news, and yes, that would be an outage. I call this a glitch too, since only some people using the service had an issue. Semantics, not intending to nit-pick!

Actually, the SWAP procedure (when you deploy to staging and want to update the production application) was flawed.
This means any production application already working did not have any issues. If you wanted to update, indeed, you had a problem.

We just skipped the update for a day and then it was fine.
Of course the problem was there and shouldn't have happened, but I don't see how you could lose money in this scenario.

Spicoli said,
Maybe if they have to issue credits due to the SLA.

Not even, because their SLA is mostly about the availability of customer applications, deployments and infrastructure, none of which was directly affected (i.e. no customers had downtime).

"The issues this week did not actually affect how applications are run on Azure"

Well, you can't really call this an outage, more of a glitch.

Wow, it sucks that it happened, but having the proper failover tools in place to get something as big as Azure back in 3 days is impressive. Especially worldwide.

I can't even imagine the amount of $$$ they lost from this.