Windows Azure suffers yet another extended outage [Update: Back online]

In late December, Microsoft's Windows Azure cloud service suffered a partial outage that affected one server cluster in the south central portion of the US for a few days. On Friday, the service got hit with a far more serious outage that is still being dealt with by Microsoft.

The Windows Azure status page reveals that the problems began around 3:44 pm Eastern time and were due to an expired HTTPS certificate. Unlike the outage in late December, this issue hit all of the Windows Azure server clusters worldwide. It has affected other Microsoft services as well, including Office 365 and the cloud-based save games on Xbox Live.

Microsoft's status page said that HTTP traffic to the servers was not affected by the issue. It added:

We executed repair steps to update the SSL certificate and expect HTTPS traffic to notice gradual recovery in many sub-regions. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.

Ironically, this new outage comes just a few days after independent research from Nasuni said that Windows Azure was the best cloud service available in 2013, ahead of rivals such as Amazon Web Services, Google and others.

UPDATE: The certificate has been renewed, and service has been restored. Here's the official announcement from the Azure page:

On Friday, February 22 at 12:44 PM PST, Storage experienced a worldwide outage impacting HTTPS traffic due to an expired SSL certificate. This did not impact HTTP traffic. We have executed repair steps to update SSL certificate on the impacted clusters and have recovered to over 99% availability across all sub-regions. We will continue monitoring the health of the Storage service and SSL traffic for the next 24 hrs. Customers may experience intermittent failures during this period. We apologize for any inconvenience this causes our customers.

Source: Windows Azure status page | Image via Microsoft

13 Comments

1Pixel said,
How embarrassing. And all this from an expired HTTPS certificate?? Amateurs.
While I agree, considering they probably manage thousands of certs I'm sure having one go expired is more possible than if you're running one website.

It also seems as though it caused tons of clusters to get out of sync which was the time problem. I can't even imagine wading through all the crap they had to just to find out it was a cert.

Their after action report should be interesting.

1Pixel said,
How embarrassing. And all this from an expired HTTPS certificate?? Amateurs.

Embarrassing - I agree.
'Just from an expired HTTPS certificate' - In business you do NOT take chances, especially with company secrets and whatnot. I'm glad to see their systems fully check that the SSL certificate is valid and correct, and fail if it isn't; that is how security should be.

MrHumpty said,
While I agree, considering they probably manage thousands of certs I'm sure having one go expired is more possible than if you're running one website.

It also seems as though it caused tons of clusters to get out of sync which was the time problem. I can't even imagine wading through all the crap they had to just to find out it was a cert.

Their after action report should be interesting.


No. They should have this under control. I expect them to have a management system in place that allows them to keep this from happening.


This is seriously embarrassing for them.

You'd think something as critical as a certificate (whose expiry seems to have caused the issues) would have some sort of alert system in place to make sure that everything is renewed well before the expiry date.
Even paying someone just to make sure no certs are less than a month from expiry would probably end up cheaper than the downtime (at least image-wise)!
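
For what it's worth, the kind of alert described above can be sketched in a few lines of Python. This is only an illustration, not how Microsoft monitors its certificates: the 30-day threshold comes from the "month" mentioned above, and the function names (`days_until_expiry`, `needs_renewal`, `fetch_not_after`) and the example date are made up for the sketch. Only the standard-library `ssl` and `socket` calls are real.

```python
import socket
import ssl
from datetime import datetime, timezone

WARN_DAYS = 30  # assumed threshold: warn when less than a month remains


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Mar 15 12:00:00 2030 GMT". `now` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> bool:
    """True when the certificate expires in fewer than `warn_days` days."""
    return days_until_expiry(not_after, now) < warn_days


def fetch_not_after(host: str, port: int = 443) -> str:
    """Fetch the notAfter field of the certificate served by host:port.

    Note: the default context verifies the certificate, so an already
    expired one raises ssl.SSLCertVerificationError instead of returning.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Hypothetical usage (requires network access):
#   not_after = fetch_not_after("example.com")
#   if needs_renewal(not_after, datetime.now(timezone.utc)):
#       print("renew soon:", not_after)
```

Run on a schedule against every endpoint, a check like this would flag a certificate weeks before it lapses. The comparison is kept separate from the network call so it can be tested with fixed dates.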

How this would have been reported if it happened to Apple:

Today Apple reported that the reason all iTunes clients were unable to access their content was a successful test of its security infrastructure. When asked, Tim Cook responded, "We take security seriously and these types of tests are needed to verify our processes work. Yes, sometimes our customers are inconvenienced by these types of tests, but I want to assure you it's in their best interest that we manage our services in this fashion."

The blogs would eat this up!

It's cute that the service was suffering all yesterday afternoon and night. They restored service early this morning and nothing was on Neowin's site. Now, you post a story with an "update" tagline.

How about "Hey, Azure crashed for nearly 12 hours yesterday and we just found out about it so here's the scoop"
