The Yahoo Mail story continues as mail continues to be recovered

I wrote earlier that the Yahoo Mail problem looked like it could resemble the Danger T-Mobile outage.  A week later, the similarities look even stronger.

Why am I continuing to follow this when almost all the rest of the media has dropped the story?  Because I think the root cause is an operations issue, which is interesting to those who run mission-critical services.  Think about it: a storage system serving 1% of the users went out.  Yahoo immediately restored from backup, bringing down mail for many users.  If only 1% were truly affected, it seems there could have been a better way to restore mail service.

Ironically, the impact was much more than 1% of users losing mail.  Millions of Yahoo's users had no mail for days.

Here is a replay of the events from their log.  On this thread you can see the history.

A description of the problem on Dec 9.

So, what happened?

On Monday, December 9th at 10:27 p.m. PT, our network operating center alerted the Mail engineering team to a specific hardware outage in one of our storage systems serving 1% of our users. The Mail team immediately started working with the storage engineers to restore access and move to our back-up systems, estimating that full recovery would be complete by 1:30 p.m. PT on Tuesday.

Yahoo Mail said it was up and running, with updates from Marissa Mayer and the operations team reporting that service was 100% restored.

Update 12/14/13 10:40 am PST

Here are this morning’s updates:

  • Account Access: 99.9% of affected users may access their accounts
  • Outage Message Queue: 100% cleared
  • IMAP access: 100% restored

We're making progress on restoring full access to messages for affected customers and will update again with more information. 

Update 12/13/13 5:00 pm PST

We have posted an update on the Yahoo blog here: http://yahoo.tumblr.com/post/69929616860/an-update-on-yahoo-mail

Users were still complaining.  Two days later, another update explained the problem users had getting to their mail.  Even though the queues were cleared for mail from Dec 9 onward, the older mail had not been fully restored.

Update 12/16/13 9:00 pm PST

We’ve restored access for users and continue to make progress on recovering email messages, folders and inboxes for those users who are still missing messages in their inbox.

As the engineering team continues the restoration process, we wanted to give a couple answers to the top questions we’re seeing:
 

Q:  “I’m missing emails in my inbox from certain dates, but can see everything else.”
A:  There are three periods of time at question when it comes to message restoration. Message restoration for each period can follow a different timeline.

  • Emails from Dec. 9 - now: 100% of emails during this time period have been delivered
  • Emails from Nov. 25 - Dec 9, 2013: 75% of emails from this period have been restored
  • Emails prior to Nov. 25: 90% of emails from this period have been restored

After Dec 9 you have 100% of your mail.  Before that you have between 75% and 90%.  Somehow, users don’t consider that to be restored mail.

And now Yahoo Mail users and Yahoo customer support are in support hell.

Update 12/18/13 12:30 pm PST

Here’s the latest update from us answering some of your questions:


Q:  I’m on hold for a while when I call Customer Care.  What’s happening?
We’ve heard that some users are experiencing longer wait times than usual. We appreciate your patience while we work through a large volume of calls.  We are adding agents quickly to support this large volume of calls. Alternatively, you can click the link to the right here that says “Contact Customer Care.”  We’ll ask you to provide us with a few more details and then will follow up with you.  


Thank you for your patience.

Q:  I still can't access my account, what can I do?
We believe we've restored access for all users related to the outage. If you're having trouble accessing your account, please reach out to customer service so that we can provide you with 1:1 support.

Previous updates

Update 12/17/13 2:45 pm PST

We continue to work on recovering email messages, folders and inboxes for users who are still not seeing some messages in their inbox. In the last 24 hours, we've seen an accelerated rate of message recovery for affected users. Additionally, we are reaching out directly to the impacted users with an update specifically related to their accounts. 
 
We believe that we have restored access for all affected users, but if you are still having trouble accessing your account for any reason, please contact Customer Care at 1-800-318-0612.