Time can support credibility, Google vs. Microsoft outage reports

One of my friends has made the switch from Google to Microsoft.  Actually, I have many friends who have left Google to go to Microsoft.  One friend who made the switch, and who knows how both Microsoft and Google work, pointed out how the outage reports from the two companies differ.

Microsoft had an Outlook outage and published this post about the event.

On Monday and Tuesday of this week, some of our Office 365 customers hosted in our North America datacenters experienced unrelated service issues with our Lync Online and Exchange Online services. First, I want to apologize on behalf of the Office 365 team for the impact and inconvenience this has caused. Email and real-time communications are critical to your business, and my team and I fully recognize our accountability and responsibility as your partner and service provider.

Google reported on one of its outages with this.

Earlier today, most Google users who use logged-in services like Gmail, Google+, Calendar and Documents found they were unable to access those services for approximately 25 minutes. For about 10 percent of users, the problem persisted for as much as 30 minutes longer. Whether the effect was brief or lasted the better part of an hour, please accept our apologies—we strive to make all of Google’s services available and fast for you, all the time, and we missed the mark today.

One way to look at the contrast is that Google is specific about time: approximately 25 minutes for most users, and as much as 30 minutes longer for about 10 percent of them.

Microsoft says it has a full understanding of the issues, but doesn’t provide any specifics on time.

We have a full understanding of the issues, and the root causes of both the Exchange Online and Lync Online services have already been fixed.

Google had another outage where specifics are reported down to the minute.

Issue Summary

From 6:26 PM to 7:58 PM PT, requests to most Google APIs resulted in 500 error response messages. Google applications that rely on these APIs also returned errors or had reduced functionality. At its peak, the issue affected 100% of traffic to this API infrastructure. Users could continue to access certain APIs that run on separate infrastructures. The root cause of this outage was an invalid configuration change that exposed a bug in a widely used internal library.

Timeline (all times Pacific Time)

  • 6:19 PM: Configuration push begins
  • 6:26 PM: Outage begins
  • 6:26 PM: Pagers alerted teams
  • 6:54 PM: Failed configuration change rollback
  • 7:15 PM: Successful configuration change rollback
  • 7:19 PM: Server restarts begin
  • 7:58 PM: 100% of traffic back online
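
To make the value of those timestamps concrete, here is a minimal sketch in Python that turns the timeline above into elapsed minutes. The timeline entries are copied from the Google report; the helper names (parse_time, minutes_between, elapsed_since_outage) are hypothetical and only for illustration, and the sketch assumes all events fall on the same day.

```python
from datetime import datetime

# Timeline entries copied from the Google report above (all times Pacific Time).
TIMELINE = [
    ("6:19 PM", "Configuration push begins"),
    ("6:26 PM", "Outage begins"),
    ("6:26 PM", "Pagers alerted teams"),
    ("6:54 PM", "Failed configuration change rollback"),
    ("7:15 PM", "Successful configuration change rollback"),
    ("7:19 PM", "Server restarts begin"),
    ("7:58 PM", "100% of traffic back online"),
]


def parse_time(text):
    """Parse a 12-hour clock string such as '6:26 PM' (same-day assumption)."""
    return datetime.strptime(text, "%I:%M %p")


def minutes_between(start, end):
    """Whole minutes from start to end; negative if end is earlier than start."""
    return int((parse_time(end) - parse_time(start)).total_seconds() // 60)


def elapsed_since_outage(timeline, outage_start="6:26 PM"):
    """Pair each event with its offset in minutes from the start of the outage."""
    return [(minutes_between(outage_start, t), event) for t, event in timeline]


if __name__ == "__main__":
    for minutes, event in elapsed_since_outage(TIMELINE):
        print(f"{minutes:+4d} min  {event}")
    print("Total outage:", minutes_between("6:26 PM", "7:58 PM"), "minutes")
```

With timestamps in the report, any reader can work out that the outage lasted 92 minutes and that the successful rollback came 49 minutes in. A word like “brief” gives the reader nothing to compute.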

Outages are painful for all companies.  

A suggestion for when you report your own outage: if you include the times of events, your communication can be viewed as more credible.  Terms like “some” or “brief” don’t work when you are the one affected by the outage, because to you “brief” would mean a minute of downtime.