Microsoft explains why Office 365 experienced a significant disruption this week

Microsoft's Office 365 service experienced a rather lengthy outage earlier this week. On Monday and Tuesday, users in North America were faced with Lync and Exchange outages that lasted for several hours. 

Microsoft said that the Lync and Exchange outages were unrelated, but a separate failure in the Service Health Dashboard meant that affected users were not notified of the outages. It was a double hit for Microsoft: not only were core features offline, but the mechanism for alerting users to the outage was failing as well.

Lync Online's outage was caused by a brief loss of connectivity. When connectivity was restored, the backlog produced a significant spike in traffic that overloaded the remaining servers, disrupting the service for some customers.

The Exchange issue resulted from a directory failure that caused a directory partition to stop responding to authentication requests. Microsoft said that this was a unique failure, which was the reason for the extended downtime on that platform.

As you would expect, Microsoft said that the issues have been fixed and that it has learned from the experience how to avoid such scenarios in the future. Office 365 has been stable for the most part, and the platform has not previously had downtime of this length.

Source: Microsoft


21 Comments


duddit2 said,
It's funny seeing comments that assume local servers never experience outages. Funny.

1) UPS.

Let's say that I have a local server with 0.1% downtime per year (about 8.8 hours).

2) Redundant server.
However, if I add a redundant server (also 0.1% downtime), then my final downtime (both servers down at the same time, assuming they fail independently) is:

0.1% × 0.1% = 0.0001%, or about 32 seconds per year.
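
As a rough check on the redundancy arithmetic, here is a small Python sketch. It assumes the two servers fail independently, which shared power, network, or software faults can easily violate, and the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def combined_downtime_fraction(*fractions):
    """Fraction of time that every redundant server is down at once.

    Assumes failures are independent, so the fractions multiply.
    """
    result = 1.0
    for fraction in fractions:
        result *= fraction
    return result


def downtime_minutes_per_year(fraction):
    """Convert a downtime fraction into expected minutes per year."""
    return fraction * MINUTES_PER_YEAR


single = 0.001  # 0.1% downtime for one server
pair = combined_downtime_fraction(single, single)

print(f"one server:  {downtime_minutes_per_year(single) / 60:.2f} hours/year")
print(f"two servers: {downtime_minutes_per_year(pair) * 60:.1f} seconds/year")
```

Under that independence assumption, one server comes out at roughly 8.76 hours per year and the redundant pair at roughly half a minute per year. Correlated failures would make the real figure worse.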

Microsoft promises an SLA of 99.9%. However, in reality they promise 99%, or about 3.6 days of downtime per year; below that, MS pays half of the cost. And if uptime falls below 95%, MS refunds all the money. So the real SLA is 95%, or more than two weeks of downtime per year, pretty meh.

Also, MS does not fully count "degradation of service" as part of the SLA.

Office 365 is cool, but not yet for some businesses.

Edited by Brony, Jun 28 2014, 1:05pm :
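
To put the uptime percentages from the post above into hours and days, a short Python sketch follows. The 99.9%, 99% and 95% thresholds are simply the ones the commenter cites, not figures checked against Microsoft's actual SLA terms, and the helper name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def allowed_downtime_hours(uptime_percent):
    """Hours per year a service can be down at a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


# Thresholds as quoted in the comment (assumed, not verified).
for uptime in (99.9, 99.0, 95.0):
    hours = allowed_downtime_hours(uptime)
    print(f"{uptime}% uptime -> {hours:.1f} hours (~{hours / 24:.1f} days) per year")
```

The gap between the tiers is the point of the comment: each step down in the percentage multiplies the tolerated downtime many times over.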

Brony said,

1) UPS.

2) Redundant server.

[...]

Office 365 is cool, but not yet for some businesses.

Oh, I'm not saying you can't mitigate against things like this locally; of course you can. A lot of my customers who have migrated, though, wouldn't have spent the money required to set up a fully redundant system, and as such 365 offers far better value and stability.

There is also the benefit of the location of the 365 servers relative to the large pipes. With local Exchange, it's all coming down your WAN regardless of whether it's going to be rejected or not, and unless you have a dedicated WAN for email traffic you've lost an element of control over your bandwidth. I know you could use an offsite SMTP relay that performs scanning and only allows legitimate traffic down to your local servers, but you still have to deal with bursts of large emails (attachments) slowing things down. For smaller companies, it's a problem that 365 solves.

Pros and cons each way really.

Brony said,

[...]

Microsoft promises an SLA of 99.9%. [...]

I never knew that MS only guarantees 3 '9's of uptime. Most companies that provide this kind of service would guarantee 4 '9's of uptime.

#Michael said,

I never knew that MS only guarantees 3 '9's of uptime. Most companies that provide this kind of service would guarantee 4 '9's of uptime.

Guarantee is a funny word when it comes to SLAs, though. You're still down if you're down, but you get reimbursed for it.

That's always going to be a problem when you put data on the cloud.

No one in their right mind would trust their business to a cloud based service, not only due to outages, but also to security.

dvb2000 said,
That's always going to be a problem when you put data on the cloud.

No one in their right mind would trust their business to a cloud based service, not only due to outages, but also to security.

Apparently, governments and Fortune 500 companies do.

And outages also happen in-house, and it is still the same concept as storing your data locally on a network drive. The difference between the two: your data is housed by another company, but it could mean lower IT costs.

Our corporation's Exchange is hosted on 365. It was down for almost the whole work day from 6am to 4pm... Not good. We have dubbed it Office 364.

Talk to a tech; often they will refund you something. That's not something you'd get if you hosted it yourself. You'd just have to take the loss.

Javik said,
And this is why I prefer hard copies installed to my hard drive over cloud based crapware.

Um dude... Office IS installed to your hard drive. You just showed that you have NO Idea what Office 365 is.

Altima said,

Um dude... Office IS installed to your hard drive. You just showed that you have NO Idea what Office 365 is.

Lync and Exchange, the Office 365 services affected here, are not localized to your machine. Lync is an IM client that requires active connections to a server to actually function. The same is true for Exchange. Sure you can access your mail from yesterday, but a failure such as this will impede your ability to actually respond to or receive actionable incoming email.

Running Exchange and/or Lync on-premises would have allowed the company to miss this downtime and outage... Obviously, it wouldn't affect the locally installed copy of Word, but if you made heavy use of OneDrive, especially for collaboration, a similarly long outage could also impede your ability to be productive.

If you're a business, it is very hard to be at the mercy of a vendor when it comes to business-critical infrastructure. At my work, Lync is integral because most workers work primarily from home and teams are dispersed around the globe. An extended Lync outage would be very disruptive.

Javik said,
And this is why I prefer hard copies installed to my hard drive over cloud based crapware.

Every thread, you spread hate speech without any valid reason and prove that you have no idea what you are talking about. Keep embarrassing yourself.

Oh, and... while this service was down, I signed important contracts using Word and sent them to the other party without any issue. I have Office 365.

Amen. There is nothing like the security of code and files on your hardware. While not 100% perfect, at least you are not relying on someone/something else.

Lord Method Man said,
Yeah, I'm glad I have a "hard copy" of Lync on my machine so I can magically IM other people without an internet connection.

I demand hard copy for Skype too!

Microsoft should rename the Office 365 services to Office One to make them more consistent with other Microsoft products; that kind of name also implies a guarantee that it will be undisrupted for about One second.

Yes, since the Office 365 name took a hit on its uptime record. Now, with a new name (if they choose one), OfficeOne, they can again claim 100% uptime :p since the new name would only start collecting data after this outage.

Clever :evil: