How Good is the Cloud? Hiring of Netflix VP for CIO job will change Yahoo's DC Strategy

Yahoo has announced that it has hired a new CIO, Mike Kail. Mike was VP of IT Operations at Netflix, the company that helped build the momentum behind the idea that the cloud is better than owning data centers. ZDNet has a post on Netflix being the biggest cloud app.

The biggest cloud app of all: Netflix

Summary: The largest pure-cloud play service of all is based on Netflix's open-source stack running on Amazon Web Services.

 
 

Netflix, the popular video-streaming service that takes up a third of all internet traffic during peak traffic hours isn't just the single largest internet traffic service. Netflix, without doubt, is also the largest pure cloud service.

It would seem that Mike Kail is someone who would push the use of the cloud at Yahoo. Here is the press release on the new CIO.

Yahoo (YHOO) announced that Mike Kail has joined the company as CIO and SVP, Infrastructure. In this role, Mike will lead Yahoo’s IT and data center operations, reporting to CEO Marissa Mayer.

 

“The strength of our technical infrastructure is critical as we aim to deliver the best possible user and advertiser experiences. It also ensures that Yahoos have the tools and technology necessary to execute. After an intensive search for the right leader, I am excited to announce that today Mike Kail is joining Yahoo as our new CIO and SVP, Infrastructure,” said Yahoo CEO Marissa Mayer. “Mike has the perfect combination of experience and vision to lead our IT and infrastructure to even greater global reach and scale.”

 

"I’m extremely excited to be joining Yahoo and to contribute to their focus on building great sites and services," said Mike Kail, Yahoo SVP, Infrastructure and CIO. "I'm looking forward to leading Yahoo’s world-class infrastructure teams, which will continue to provide the web scale architecture that enables the many outstanding products of the company."

Mike has posted 10 SlideShare presentations over the past 11 months, with none before that. It would seem Mike has been getting his name out there as part of leaving Netflix. A sudden increase in presentations by an executive is often a sign that they are looking for a new job.

I was looking at the media’s coverage of Mike Kail’s move and didn’t find anything insightful.  Looking through Mike’s presentations provided more information on what he might do.

For example, here is Mike’s presentation on The future of IT infrastructure, a CIO perspective from May 2014.

The Future of IT Infrastructure: Presentation Transcript

  • The Future of IT Infrastructure The CIO Perspective mike d. kail VP of IT Ops :: Netflix @mdkail
  • IT :: Evolution IT Must Enable the Business Embrace Change
  • IT :: Revolution ● Cloud Adoption by CIOs
  • Financial Shift :: CapEx → OpEx ● Spending Efficiency ○ Over/Under Provisioning ● Cash Flow “smoothing” ● Business Agility ● Legacy → Innovative ● No More Write-Downs
  • IT Trends :: Moonshot Thinking ● IaaS / PaaS / SaaS ● Mobile Everything ● API Ubiquity ● Rebirth of SQL / ETL 2.0 ● Data Security ● Cloud Identity ○ AuthN + AuthZ + MFA ● Rich Applications ○ UI/UX ● Big Data Analytics
  • IT :: Roadmap ● Talent -- A+ Players ● Planning -- 10x goal ● Custom Applications Dev ● Data/Metrics Driven Decisions ● Consumerization Effect ● Security, Security, Security

Slide 4 above tips you off that Mike likes to embrace the cloud.


In March 2014, Mike preached more on the benefits of the cloud.


And the Yahoo Mail story continues: Mail continues to be recovered

I wrote about how the Yahoo Mail problem could turn out like the Danger T-Mobile outage. A week later, it looks even more like it has amazing similarities.

Why am I continuing to follow this when almost all the rest of the media has dropped the story? Because I think the root cause is operations issues, which are interesting to those who run mission-critical services. Think about it: a storage system went out that affected 1% of the users. Yahoo immediately restored from backup, bringing down mail for many users. If only 1% were truly affected, it seems there could have been a better way to restore Mail service.

Ironically, the impact was much greater than 1% of users losing mail. Millions of Yahoo’s users had no mail for days.

Here is a replay of the events from their log.  On this thread you can see the history.

A description of the problem on Dec 9.

So, what happened?

On Monday, December 9th at 10:27 p.m. PT, our network operating center alerted the Mail engineering team to a specific hardware outage in one of our storage systems serving 1% of our users. The Mail team immediately started working with the storage engineers to restore access and move to our back-up systems, estimating that full recovery would be complete by 1:30 p.m. PT on Tuesday.

Yahoo Mail said it was up and running, with updates from Marissa Mayer and the operations team reporting 100% restoration.

Update 12/14/13 10:40 am PST

Here are this morning’s updates:

  • Account Access: 99.9% of affected users may access their accounts
  • Outage Message Queue: 100% cleared
  • IMAP access: 100% restored

We're making progress on restoring full access to messages for affected customers and will update again with more information. 

Update 12/13/13 5:00 pm PST

We have posted an update on the Yahoo blog here: http://yahoo.tumblr.com/post/69929616860/an-update-on-yahoo-mail

Users were still complaining. Two days later there was another update that explains the problem with getting to mail. So, even though the queues were cleared for mail from Dec 9 onward, the older mail was not fully restored.

Update 12/16/13 9:00 pm PST

We’ve restored access for users and continue to make progress on recovering email messages, folders and inboxes for those users who are still missing messages in their inbox.

As the engineering team continues the restoration process, we wanted to give a couple answers to the top questions we’re seeing:
 

Q:  “I’m missing emails in my inbox from certain dates, but can see everything else.”
A:  There are three periods of time at question when it comes to message restoration. Message restoration for each period can follow a different timeline.

  • Emails from Dec. 9 - now: 100% of emails during this time period have been delivered
  • Emails from Nov. 25 - Dec 9, 2013: 75% of emails from this period have been restored
  • Emails prior to Nov. 25: 90% of emails from this period have been restored

After Dec 9 you have 100% of your mail. Before that you have between 75% and 90%. Somehow users don’t consider that “mail restored.”

And now Yahoo Mail users and Yahoo customer support are in support hell.

Update 12/18/13 12:30 pm PST

Here’s the latest update from us answering some of your questions:


Q:  I’m on hold for a while when I call Customer Care.  What’s happening?
We’ve heard that some users are experiencing longer wait times than usual. We appreciate your patience while we work through a large volume of calls.  We are adding agents quickly to support this large volume of calls. Alternatively, you can click the link to the right here that says “Contact Customer Care.”  We’ll ask you to provide us with a few more details and then will follow up with you.  


Thank you for your patience.

Q:  I still can't access my account, what can I do?
We believe we've restored access for all users related to the outage. If you're having trouble accessing your account, please reach out to customer service so that we can provide you with 1:1 support.

Previous updates

Update 12/17/13 2:45 pm PST

We continue to work on recovering email messages, folders and inboxes for users who are still not seeing some messages in their inbox. In the last 24 hours, we've seen an accelerated rate of message recovery for affected users. Additionally, we are reaching out directly to the impacted users with an update specifically related to their accounts. 
 
We believe that we have restored access for all affected users, but if you are still having trouble accessing your account for any reason, please contact Customer Care at 1-800-318-0612.

 

Danger, Yahoo Mail is having the T-Mobile Sidekick Experience that sunk the service

If you hang around the hot things in technology, it is easy to believe that email is dead. I don’t know about you, but e-mail is part of how I communicate. Many young people have dropped their e-mail accounts as their friends use social media. Yahoo is finding out how important mail is, with days of outages and no end in sight.

This event has the possibility of being as big a disaster as Microsoft’s Danger T-Mobile Sidekick outage and data loss, which caused users to drop the service.

October 2009 data loss

In early October 2009, a server malfunction or technician error at Danger's data centers resulted in the loss of all Sidekick user data. As Sidekicks store users' data on Danger's servers—versus using local storage—users lost contact directories, calendars, photos, and all other media not locally backed up. Local backup could be accomplished through an app ($9.99 USD) which synchronized contacts, calendar, and tasks, but not notes, between the web and a local Windows PC. In an October 10 letter to subscribers, Microsoft expressed its doubt that any data would be recovered.[6]

The customer's data that was lost was being hosted in Microsoft's data centers at the time.[7] Some media reports have suggested that Microsoft hired Hitachi to perform an upgrade to its storage area network (SAN), when something went wrong, resulting in data destruction.[8] Microsoft did not have an active backup of the data and it had to be restored from a month-old copy of the server data, totalling 800GB in size, from offsite backup tapes. The entire restoration of data took over 2 months for customer data and full functionality to be restored.[9]

The Danger/Sidekick episode is one in a series of cloud computing mishaps that have raised questions about the reliability of such offerings.[10]

When you look for the causes of a major outage, you will eventually trace them to operations. The initial Yahoo Mail outage was caused by a hardware failure. Marissa Mayer has posted the latest as of 5 pm today.

The initial failure was in a storage system.

On Monday, December 9th at 10:27 p.m. PT, our network operating center alerted the Mail engineering team to a specific hardware outage in one of our storage systems serving 1% of our users. The Mail team immediately started working with the storage engineers to restore access and move to our back-up systems, estimating that full recovery would be complete by 1:30 p.m. PT on Tuesday.

So, Yahoo fixed the problem, but restoring service is not simple, because users were affected in a wide range of ways.

However, the problem was a particularly rare one, and the resolution for the affected accounts was nuanced since different users were impacted in different ways. Some of the affected users were unable to access their accounts, instead seeing an outdated “scheduled maintenance” page which was a confusing and incorrect message (this has since been corrected and updated). Further, messages sent to those accounts during this time were not delivered, but held in a queue.

Now the service is running, unless you use IMAP. What is IMAP? It is the protocol many mobile and desktop mail clients use to access mail, but it is not as simple as POP.

While IMAP remedies many of the shortcomings of POP, this inherently introduces additional complexity. Much of this complexity (e.g. multiple clients accessing the same mailbox at the same time) is compensated for by server-side workarounds such as Maildir or database backends.

The IMAP specification has been criticised for being insufficiently strict and allowing behaviours that effectively negate its usefulness. For instance, the specification states that each message stored on the server has a "unique id" to allow the clients to identify the messages they have already seen between sessions. However, the specification also allows these UIDs to be invalidated with no restrictions, practically defeating their purpose.[13]

Unless the mail storage and searching algorithms on the server are carefully implemented, a client can potentially consume large amounts of server resources when searching massive mailboxes.
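To make the UID point above concrete, here is a minimal Python sketch of how a mail client might cope with a server that invalidates its message UIDs. In IMAP, UIDs are paired with a UIDVALIDITY value for the mailbox, and a client is expected to throw away its cached UIDs when that value changes. The names `MailboxCache` and `reconcile` are hypothetical and for illustration only; this does not talk to a real server and says nothing about how Yahoo Mail is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxCache:
    """What a client remembers about one mailbox between sessions (hypothetical)."""
    uidvalidity: int
    seen_uids: set = field(default_factory=set)


def reconcile(cache, server_uidvalidity, server_uids):
    """Return the UIDs the client still has to download, updating the cache."""
    if cache.uidvalidity != server_uidvalidity:
        # The server invalidated its UIDs, so the cached ones mean nothing now.
        # The client has to start over and fetch the whole mailbox again.
        cache.uidvalidity = server_uidvalidity
        cache.seen_uids = set()
    new_uids = sorted(set(server_uids) - cache.seen_uids)
    cache.seen_uids.update(new_uids)
    return new_uids
```

When UIDVALIDITY is unchanged, only unseen messages come back. When it changes, every message is treated as new, which is the expensive full resync that makes unrestricted invalidation painful for large mailboxes.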

Users don’t care about these details of IMAP. Marissa closes her status update with the following. Will that make the users who can’t get their mail through IMAP feel better?

Above all else, we’re going to be working hard on improvements to prevent issues like this in the future. While our overall uptime is well above 99.9%, even accounting for this incident, we really let you down this week.

We can, and we will, do better in the future.
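A quick back-of-the-envelope check on that 99.9% figure, as a small Python sketch. Yahoo does not say how it computes uptime, so everything here is an assumption: a one-year window, uptime averaged across all users, and a hypothetical 96-hour (four-day) outage. The function names are mine. Under those assumptions, 99.9% uptime allows roughly 8.76 hours of downtime per year.

```python
HOURS_PER_YEAR = 365 * 24  # assumed measurement window: one non-leap year


def downtime_budget_hours(uptime_pct, window_hours=HOURS_PER_YEAR):
    """Hours of downtime allowed in the window at a given uptime percentage."""
    return window_hours * (1 - uptime_pct / 100)


def user_weighted_uptime_pct(outage_hours, affected_fraction,
                             window_hours=HOURS_PER_YEAR):
    """Uptime averaged over all users when only a fraction of them are down."""
    lost = outage_hours * affected_fraction
    return 100 * (1 - lost / window_hours)


# Hypothetical scenario: 1% of users without mail for four days.
one_percent_case = user_weighted_uptime_pct(outage_hours=96, affected_fraction=0.01)

# Same outage as experienced by an affected user (100% of "their" service down).
affected_user_case = user_weighted_uptime_pct(outage_hours=96, affected_fraction=1.0)
```

Averaged over all users, the hypothetical outage still leaves the overall number above 99.9%. For an affected user the same four days fall well below it, which is why a statistic like this offers little comfort to the people who lost their mail.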

It’s still not clear what is going to happen for those users who access their email through IMAP.

CIO's view of the Data Center, a 2006 perspective from Lars Rabbe at Yahoo

I wrote a blog post on Lars Rabbe back in Nov. 2011.

Data Center Thought Leadership, accumulated by companies or people?

DatacenterKnowledge just posted on the Yahoo Factor in data centers referring to Kevin Timmons, Lars Rabbe, Scott Noteboom, and Tom Furlong.

But, after spending the past 3 days chatting with the current Data Center Thought Leadership who were at 7x24 Exchange, I think we would have all had a good laugh.  Scott Noteboom is not part of this crowd as once you walk into Apple, you disappear from the data center crowd.  Kevin Timmons escaped this situation and is now CTO of Cyrus One and was busy meeting and greeting at 7x24.  Tom Furlong was circulating after his presentation on the Open Compute Project and Facebook's data centers.  Lars Rabbe is busy flying around the world between Estonia, Palo Alto (Skype bldg), and Redmond (Microsoft HQ).

For your public consumption, I found this 2006 ZDNet CIO video where Lars discusses how data centers need to be built differently.

The transcript, which has some character-set mapping issues (apostrophes), has lots of mentions of data centers.

LARS RABBE:

How people react to the products and what makes a better product in terms of what is it that appeals to people inside the product and how the product interacts with you. On the side of data center innovation we are really working on expanding, let’s say, the processing footprint worldwide. We’re at the point now where the data center industry has really been left behind by the growth of the internet companies and we, along with other companies, are now building our own data centers. And we’re taking the opportunity while we’re building these data centers to really think about “is the conventional data center really what fits our needs”, and it turns out in a lot of areas that it really doesn’t, that we can do things much better if we design our own data center from the ground up. We’d recently broke ground on a data center in the Pacific Northwest and we’re going to be applying a bunch of new technologies there, some of which we’re actually inventing ourselves in terms of how do you put together the data center, how do you take best advantage of the power because that is one of the biggest issues when you are running a data center. The cost of power, so saving power is a big deal.

DAN FARBER:

Of course.

Yeah!!! A CIO who talks about power-efficient data centers.

LARS RABBE:

And, in general we’ve got to save power. So the ability to make a much more power efficient data center is what will make a big difference.

DAN FARBER:

Now it seems that every company that’s reaching large scale is building new data centers and building them more efficiently in the areas where the cost of electricity is much cheaper. But it seems to me that that’s an opportunity for shared innovation as opposed to each company doing it on its own and inventing its own kinds of innovations to drive those data centers. Do you see that as a possibility?

Here is one of the best comments.

LARS RABBE:

I think there are some competitive advantages in some of this and there are certainly some of these things that we will patent because we consider them to be significantly different. But I also agree that if we come up with ideas that as such will make the industry more power efficient we absolutely will share those and we are using the same contractors also. I’m sure those contractors in turn will leverage those ideas for future construction and future concepts of data centers.

The Green Data Center idea was discussed back in 2006 by Lars.  How is that for Thought Leadership?