One year later, words to remember our Dear Friend Olivier Sanche

In conversations, Olivier Sanche's wife mentioned a simple request she had made to Apple: to plant a tree in memory of Olivier. Her request was not answered, so I offered to contact eBay's VP of Technical Operations, Mazen Rawashdeh. Mazen supported the request and made the executive decision to plant a tree at the data center Olivier designed and built in Salt Lake City.

In addition to planting the tree, eBay created a plaque.


"The tree is a slow, enduring force straining to win the sky. - Antoine De Saint-Exupery, The Wisdom of the Sands"

"This tree is dedicated to the memory of Olivier Sanche who passed away on November 26, 2010. Olivier was a colleague, peer, mentor, and friend to many in the eBay family. At his memorial he was remembered as someone who lit up a room whenever he entered in through his big personality and solid foundation. The tree symbolizes that seed of friendship, wisdom, and inspiration that he planted with so many. May it, like his memory, continue to grow."

The above plaque was set in Aug 2011, and KC Mares even posted in Sept 2011 about the plaque, the tree, and the conference room named after Olivier. I have been waiting for eBay to post something official, but I must have missed the notices, and I can't find anything on the web. When I contacted Olivier's wife to ask what she thought of the plaque, she said no one from eBay had contacted her regarding the memorial efforts. So I decided to move on my own and ask a few friends to contribute their words to remember Olivier by.


First, Mike Manos's (AOL) words.

Olivier was a man of quiet action. Benjamin Franklin said “If you would not be forgotten as soon as you are dead, either write things worth reading or do things worth writing.” There is not a month that goes by where I am not reminded of my conversations with Olivier. His influence is still felt throughout our industry and his passion has left an indelible mark that has changed our industry for the better. True leadership comes from passion and heart. Olivier had plenty of both. It’s a rare thing especially when you compare it to many of the ‘pretenders and self-promoters’ who have come after. His influence was soft but strong, not in-your-face but analytical and probing. He led you down a path so you understood his position and hopefully adopted it as your own. Olivier remains a strong voice amongst us and his legacy will be felt for years to come. On a personal note I continue to miss and mourn a good friend taken far too early.

Nic Bustamente (Microsoft).

Christian [Belady] and I were just talking about him recently, it doesn't seem like a year's gone by. We sat in stunned silence thinking about Olivier . . . . it's still shocking.

Joe Kava (Google)

 

It's hard to believe that more than a year has gone by since Olivier's passing. I think the fact that many industry notables still speak so fondly of him is a tribute to what a great person he was. Genuinely one of the "good guys" who really cared and poured his heart into what he was passionate about... his family, his work, the environment, etc.
I continue to miss him.

 

 

Anonymous

 

 

Olivier was a friend and mentor in so many ways. We would argue to the point of people entering the room to see if we were ok, then we would head off to a night of dinner and drinks together. I really miss his smile and his way of looking at things. I would like to make sure that his memory is maintained in a manner appropriate to the life and passion he brought to our industry.

 

Vim Kumar (ex-AT&T reporting to Olivier)

Olivier was a natural born leader and never afraid to roll his sleeves up and "get dirty" with the folks on the floor.  Talk about motivation!  To see your director actually doing something about a problem instead of talking about it really gets you going to get the job done.

Charles Kalko (Skype, ex-eBay)

Olivier is best remembered for his passion for the environment and his belief that we all can make a difference.

 

Data Centers as the Heart and Brain of a company, an outage is like a stroke

Data Centers are becoming more and more important for a company's survival. The Data Center is defined in Wikipedia as follows.

Data center

From Wikipedia, the free encyclopedia

A data center (or data centre or datacentre or datacenter) is a facility used to house computer systems and associated components, such as telecommunications and storage systems. It generally includes redundant or backup power supplies, redundant data communications connections, environmental controls (e.g., air conditioning, fire suppression) and security devices.

In the past I have used the analogy of an information factory to explain how critical it is to operate data centers, but this definition is still too geeky for most. As companies have learned during extended outages, they are paralyzed, as if their brain and heart had stopped working. Maybe a better analogy is to think of an outage as a stroke, where blood flow to the brain is cut off.

Definition

A stroke is the sudden death of brain cells in a localized area due to inadequate blood flow.

Description

A stroke occurs when blood flow is interrupted to part of the brain. Without blood to supply oxygen and nutrients and to remove waste products, brain cells quickly begin to die. Depending on the region of the brain affected, a stroke may cause paralysis, speech impairment, loss of memory and reasoning ability, coma, or death.

I recently talked to the CEO of a start-up whose company suffered 2 1/2 days of downtime during AWS's major outage on the East Coast. AWS comp'd her $700 for the downtime, but that's like paying someone only for the cost of keeping the body running during those 2 1/2 days. The outage was a paralysis for her business. Lack of response is like being in a coma. After a day or two you start to think of death.

Sounds scary. It should: a major data center outage is like a stroke.

Going back to Wikipedia's definition of a data center:

IT operations are a crucial aspect of most organizational operations. One of the main concerns is business continuity; companies rely on their information systems to run their operations. If a system becomes unavailable, company operations may be impaired or stopped completely. It is necessary to provide a reliable infrastructure for IT operations, in order to minimize any chance of disruption. Information security is also a concern, and for this reason a data center has to offer a secure environment which minimizes the chances of a security breach. A data center must therefore keep high standards for assuring the integrity and functionality of its hosted computer environment. This is accomplished through redundancy of both fiber optic cables and power, which includes emergency backup power generation.

Does this sound more like a data center is the heart and brain of a company?

When you discuss outages, thinking of them as a stroke to the business can change the mindset. You can be hyper-paranoid and create SLAs that define high 9's of availability. Or you can recognize that outages are part of life. How you cope with them and recover is key. Early detection and fast response limit the damage.
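To put numbers on "high 9's" of availability, here is a rough Python sketch of how much downtime per year each level allows, and what a single 2 1/2 day outage does to a year's availability. The function names are my own, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def allowed_downtime_seconds(nines: int) -> float:
    """Downtime per year permitted by an SLA with the given number of 9's.

    nines=3 means 99.9% availability, nines=5 means 99.999%.
    """
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


def availability_after_outage(outage_days: float, period_days: float = 365.0) -> float:
    """Fraction of the period the service was up, given one outage of outage_days."""
    return 1.0 - outage_days / period_days
```

Under these assumptions, three 9's allows about 8.76 hours of downtime a year and five 9's allows a bit over 5 minutes. A single 2 1/2 day outage leaves the year at roughly 99.32% availability, so it does not even meet three 9's.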


Thinking about Data Center business in China, consider the lessons of Soccer

I have discussed the challenges of conducting data center business in China with a bunch of friends. The Economist has a great article on why China fails at football (soccer).

Why China fails at football

Little red card

The telling reasons why, at least in football, China is unlikely to rule the world in the near future

 

 

The Buddha tells the people he can fulfil only one of their wishes. Someone asks: “Could you lower the price of property in China so that people can afford it?” Seeing the Buddha frown in silence, the person makes another wish: “Could you make the Chinese football team qualify for a World Cup?” After a long sigh, the Buddha says: “Let’s talk about property prices.”

What is the challenge of doing business in China?  Understanding how money influences the system.

Qingdao’s owner Du Yunqi was irate—at his team’s utter incompetence. As he would later admit to investigators, he had just lost a bet that there would be a total of four goals scored in the game. His humiliated assistant coach said on national television, “Afterward the boss was angry and scolded me, saying I bungled things and couldn’t even fix a match.”

If you think data centers are purely private business, you are missing the connections to the government.

All this hints at something rather unique and powerful about the place of football in Chinese society. It is, like all organised sport in China, ultimately the domain of the government;

What outsiders call corruption is simply the way the Chinese system works.

A recent crackdown on football corruption offers little solace; it simply mirrors the pyrrhic campaigns against official corruption elsewhere in China. A mid-level functionary in China’s state security apparatus puts it candidly: “You know all those problems with society that you like to blame on China’s political system? Well it really is like that with football.”

Data Centers are a priority for the Chinese government.  Soccer is as well.

So whatever ails Chinese football, it is not a lack of passion from the country’s leaders. If anything, the opposite may be the problem. China’s Party-controlled, top-down approach to sport has yielded some magnificent results in individual sports, helping China win more Olympic gold medals in Beijing in 2008 than any other country. But this “Soviet model” has proven catastrophically unsuitable for assembling a team of 11 football players, much less a nation of them.

It would be interesting to survey the additional budget required to keep up these "favors."

Investors would contrive to fix games as favours to the local officials who nominally controlled the clubs (these types of matches are called “favour”, “relationship” or “tacit” matches, and are not viewed negatively by many within the game). Gambling syndicates, including the triads, began exerting influence over investors, referees, coaches and players. A spoils system evolved, and everyone took their cuts.

WSJ blogs about the soccer corruption trials.

Bribery:



China’s long-awaited trial into alleged bribery by the former head referee of the country’s soccer association has begun. The body pledged to fight corruption. More here and here. (China Daily, Xinhua, Xinhua, AFP)

A FIFA anti-bribery panel may be further expanded, its head said. Ted Howard is reported to be the new general secretary of the Caribbean, North American and Central American soccer confederation after Chuck Blazer left. More on a meeting by the FIFA Executive Committee in Tokyo is available here, here, here, here, here, here and here. (Bloomberg, Inside World Football, Guardian, Daily Telegraph, AP, BBC, MidEast Soccer, Reuters, NZ News)

Zynga's IPO not so hot, we'll see how Zynga's data center build out goes in 2012

With all this bad news on the stock, it will be interesting to see whether there is an effect on Zynga's aggressive data center capacity expansion.

I was going to write this post on Saturday, but after waiting one more work day, Zynga is down another 5%.

Zynga stock falls again, down nearly 10 percent from IPO price

Zynga went public yesterday, and closed down 5% from its IPO price. WSJ reports on the Zynga offering.


Zynga IPO Fizzles as Stock Falls 5%


Zynga Inc. bombed on its first day of trading Friday, closing down 5% in a signal that the appetite for new issues of fast-growing technology companies may be waning.

The San Francisco social-game maker's shares finished trading at $9.50, a day after the company priced its initial public offering at $10 a share. Zynga opened at about $11 a share on the Nasdaq Stock Market, but fell below its IPO price within the first 10 minutes of trading.



Loggly suffers extended outage after AWS reboot shuts down their service

Loggly is a cloud service that provides System Monitoring and Alerting as one of its services.

Systems Monitoring & Alerting

Alerting on log events has never been so easy.  Alert Birds will help you eliminate problems before they start by allowing you to monitor for specific events and errors.  Create a better user experience and improve customer satisfaction through proactive monitoring and troubleshooting. Alert Birds are available to squawk & chirp when things go awry.

But Loggly suffered an extended outage caused by AWS rebooting 100% of its servers, and the reboot accounted for only half of the downtime. The other half was due to not knowing the service was down.

Loggly's Outage for December 19th

Posted 19 Dec, 2011 by Kord Campbell

Sometimes there's just no other way to say  "we're down" than just admitting you screwed up and are down.  We're coming back up now, and in theory by the time this is read, we'll be serving the app again normally.  There will be a good amount of time until we can rebuild the indexes for historic data of our paid customers. This is our largest outage to date, and I'm not at all proud of it.

...

Loggly uses a variety of monitoring mechanisms to ensure our services are healthy.  These include, but are not limited to, extensive monitoring with Nagios, external monitors like Zerigo, and using a slew of our own API calls for monitoring for errors in our logs.  When the mass reboot occurred we failed to alert because a) our monitoring server was rebooted and failed to complete the boot cycle, b) the external monitors were only set to test for pings and established connections to syslog and http (more about that in a moment), and c) the custom API calls using us were no longer running because we were down.

Combined, these failures effectively prevented us from noticing we were down. This in and of itself was the cause of at least half our down time, and to me, the most unacceptable part of this whole situation.

The other half of the outage was caused by Loggly not testing for a 100% reboot of all machines.

The Human Element

The other cause to our failures is what some of you on Twitter are calling "a failure to architect for the cloud".  I would refine that a bit to say "a failure to architect for a bunch of guys randomly rebooting 100% of your boxes".  A reboot of all boxes has never been tested at Loggly before.  It's a test we've failed completely as of today.  We've been told by Amazon they actually had to work hard at rebooting a few of our instances, and one scrappy little box actually survived their reboot wrath.

One of the lessons Loggly learned, which some of my software buddies and I are applying in a software design, is to use more than one monitoring solution.

The second step is to ensure more robust external monitoring.  With multiple deployments, this issue becomes less of an issue, but clearly we need more reliable checks than what we rely on with Zerigo or other services.  Sorry, but simple HTTP checks, pings and established connections to a box do not guarantee it's up!
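To make that last point concrete, here is a minimal Python sketch (mine, not Loggly's code) of two ideas from the post: an external check that validates what the application actually returns instead of just connecting, and an alert rule that treats a silent monitor as a failure. The function names, the marker string, and the monitor labels are all hypothetical.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional


def evaluate_response(status: int, body: str, expected_marker: str) -> bool:
    """Healthy only if the app answered 200 AND returned the content we expect."""
    return status == 200 and expected_marker in body


def deep_check(url: str, expected_marker: str, timeout: float = 5.0) -> bool:
    """Fetch the URL and validate the response; any network error counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_response(resp.status, body, expected_marker)
    except (urllib.error.URLError, OSError):
        return False


def should_alert(check_results: Dict[str, Optional[bool]]) -> bool:
    """Alert if any monitor reports a failure or did not report at all.

    None means the monitor itself is silent, for example because the
    monitoring server never finished booting.
    """
    return any(result is not True for result in check_results.values())
```

With this rule, a result set like `{"internal": None, "external": True}` still raises an alert, because a monitor that stops reporting is treated the same as a failed check. A box that accepts connections but serves an error page fails `evaluate_response` as well.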