Is Building a Data Center on Your Bucket List? Some items should be skipped

I was reading this NBCNews article on the foolish things people do to check off their bucket lists.

Bucket lists gone bad: When senior thrills become life threatening


Laverne Everett's skydiving partner holds onto her after she fell out of her harness. (Photo: YouTube)

An 80-year-old woman on a tandem skydive slipped from her instructor’s harness then held on for life while rocketing toward Earth. An Alabama man busted his ankles trying to ride a bull. A Missouri man smashed his body – and his new motorcycle – minutes after buying the bike.

And it reminded me of a story I tell about an IT executive I know who was convinced he needed to build a data center to support his company's move out of colocation spaces scattered around the world.  He was thrilled to build.  When I told him he should instead go the route of three wholesale sites spread across the US and Europe, he said he had expensive advice from Gartner assuring him he was doing the right thing.  Three years later, the data center is not operating yet, and he has changed companies.

I found the public disclosure of the company finally breaking ground on a 10-15MW data center in Dec 2012.  If they had followed my advice, they would probably be on their fifth wholesale deployment by now, with 25 MW of capacity, and would have spent a fraction of the capital.  There were all kinds of people telling the executive that building a data center was something he should do.  Now that he is at a new company whose strategy is cloud, hopefully the executives there will keep him from continuing to focus on his bucket list item of building a data center.  A high availability service needs at least 3 and ideally 5 locations.  Why 5?  Because at some point you'll have major maintenance events, and going from 5 sites to 4 is much better than going from 3 to 2.
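To put rough numbers on the 3-vs-5 argument, here is a quick sketch. This is a minimal illustration; the even split of 25 MW across identical sites is my assumption, not the company's actual numbers.

```python
# Rough arithmetic behind "5 sites beats 3": with the same total
# capacity, taking one site down for maintenance hurts less when
# the capacity is spread across more sites. The 25 MW total and
# equal site sizes are illustrative assumptions.
def surviving_capacity(total_mw: float, num_sites: int, sites_down: int = 1) -> float:
    """Capacity left when `sites_down` equally sized sites are offline."""
    per_site = total_mw / num_sites
    return total_mw - per_site * sites_down

for sites in (3, 5):
    remaining = surviving_capacity(25.0, sites)
    print(f"{sites} sites, one down: {remaining:.1f} MW remain "
          f"({remaining / 25.0:.0%} of capacity)")

# 3 sites, one down: 16.7 MW remain (67% of capacity)
# 5 sites, one down: 20.0 MW remain (80% of capacity)
```

Losing a third of your capacity during a maintenance window is a very different conversation than losing a fifth of it.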

Revlon and NetApp CIOs discuss innovating in IT with a change of culture

At GigaOm Structure I moderated a panel discussion with Revlon CIO Dave Giambruno and NetApp CIO Cynthia Stoddard.  Here is a post on the presentation.

How lipstick maker Revlon turned around its business with IT

 

JUN. 19, 2013 - 4:01 PM PDT


SUMMARY:

Five years ago, cosmetics giant Revlon’s balance sheet wasn’t looking too good. But after its IT overhaul, the company is one of the most successful in its category, said the company’s CIO.

One point I felt was among the most important made:

In addition to changing its infrastructure, Giambruno said technology has helped Revlon shift its culture and develop an environment that’s more receptive to risk-taking.

Cynthia Stoddard, SVP and CIO of NetApp, agreed on the importance of encouraging a culture that supports change and experimentation.

I'll write another post on how I approach panels.  Luckily I had only one panel at GigaOm; some of the folks have 3 or more.

Do you have the bad habit of trying to be the smartest in school vs. the smartest in the real world?

Hitting the road is a time to meet new people and run into old friends.  I flew from SEA to SJC to go to GigaOm Structure and start the networking.  And, as usual, the networking started as soon as I got to the airport.  I ran into one of my old bosses, John Frederiksen, who left Microsoft a year ago and is now VP of product management at NetApp.  We chatted about cloud and data centers.  I had a particular interest in chatting about NetApp since I was moderating a panel with NetApp's CIO Cynthia Stoddard eight hours later.

Going to a hosted reception last night, I chatted with some good friends and met new people.

One characteristic I find most interesting is people who are in learning mode.  I enjoy the smart people who realize they need to try new things to learn.  Here is a popular post from a Facebook page.


 
Robert Kiyosaki, November 4, 2011:

In the real world, the smartest people are people who make mistakes and learn. In school, the smartest people don’t make mistakes.

Do you find you are surrounded by smart people who keep the bad habit from school of showing how good their grades are and how they make no mistakes?  Everybody makes mistakes.  To err is human.  I've been paying more attention to the mistakes I make.  Do you?  Do your friends?

The more you trust someone, the easier it is to admit your mistakes.  If you don't trust someone, why would you discuss your mistakes?  And if you don't trust someone, why are you spending time with them?  Life is too short to spend with people you don't trust.

Some of the best data center discussions I've ever had are when we discuss mistakes made.

Won't be blogging much this week, focused on listening, learning and networking

I am at GigaOm Structure, and I find it is really hard to listen, learn, network, and blog at the same time.  I can time-shift the blogging to later, so I am going to focus on listening to the presentations, networking like crazy, and learning as much as I can.

Here is a sample of what is covered at GigaOm Structure.

See inside Facebook’s network & explore Google’s data dreams at Structure

 

JUN. 17, 2013 - 6:00 AM PDT


SUMMARY:

Infrastructure nerds, it’s time to meet the accountants. At this year’s Structure conference this Wednesday and Thursday we’re focusing on the economics of cloud computing, not just for vendors, but for practitioners.

Want to understand how Facebook connects its servers? Hear from VMware’s CEO how the virtualization giant plans to build its next big business? Discover why Snapchat builds on Google App Engine as opposed to Amazon Web Services? Or maybe you want to understand if Microsoft can compete in the cloud.

Google publishes ideas discussing a "good enough" approach to achieve low latency

It can be really hard to get the media to publish complex concepts, which is why companies will submit their own articles.  Google's Luiz Barroso and Jeff Dean have an article on Google's data center challenge of providing low-latency performance at scale.


The Tail at Scale

Systems that respond to user actions quickly (within 100ms) feel more fluid and natural to users than those that take longer.[3] Improvements in Internet connectivity and the rise of warehouse-scale computing systems[2] have enabled Web services that provide fluid responsiveness while consulting multi-terabyte datasets spanning thousands of servers; for example, the Google search system updates query results interactively as the user types, predicting the most likely query based on the prefix typed so far, performing the search and showing the results within a few tens of milliseconds. Emerging augmented-reality devices (such as the Google Glass prototype[7]) will need associated Web services with even greater responsiveness in order to guarantee seamless interactivity.

The article may be long for most readers, so here are two key points.

In large information-retrieval (IR) systems, speed is more than a performance metric; it is a key quality metric, as returning good results quickly is better than returning the best results slowly. Two techniques apply to such systems, as well as to other systems that inherently deal with imprecise results:

Good enough. In large IR systems, once a sufficient fraction of all the leaf servers has responded, the user may be best served by being given slightly incomplete ("good-enough") results in exchange for better end-to-end latency. The chance that a particular leaf server has the best result for the query is less than one in 1,000 queries, odds further reduced by replicating the most important documents in the corpus into multiple leaf servers. Since waiting for exceedingly slow servers might stretch service latency to unacceptable levels, Google's IR systems are tuned to occasionally respond with good-enough results when an acceptable fraction of the overall corpus has been searched, while being careful to ensure good-enough results remain rare. In general, good-enough schemes are also used to skip nonessential subsystems to improve responsiveness; for example, results from ads or spelling-correction systems are easily skipped for Web searches if they do not respond in time.
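As a rough illustration of that good-enough scheme, here is a sketch of the idea, not Google's implementation; `query_leaf`, the sufficiency fraction, and the deadline are hypothetical placeholders.

```python
# Sketch of a good-enough fan-out: return once a sufficient fraction
# of leaf servers has responded instead of waiting for every straggler.
# query_leaf(), GOOD_ENOUGH_FRACTION, and DEADLINE_SECONDS are
# hypothetical placeholders, not Google's actual values.
import concurrent.futures

GOOD_ENOUGH_FRACTION = 0.98  # fraction of leaves considered sufficient
DEADLINE_SECONDS = 0.050     # overall latency budget for the fan-out

def query_leaf(leaf):
    """Placeholder: ask one leaf server for its partial results."""
    raise NotImplementedError

def fan_out(leaves):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(leaves))
    futures = [pool.submit(query_leaf, leaf) for leaf in leaves]
    needed = int(len(leaves) * GOOD_ENOUGH_FRACTION)
    results = []
    try:
        for done in concurrent.futures.as_completed(futures, timeout=DEADLINE_SECONDS):
            results.append(done.result())
            if len(results) >= needed:
                break  # good enough: stop waiting for the slowest leaves
    except concurrent.futures.TimeoutError:
        pass  # deadline hit: serve the slightly incomplete results we have
    pool.shutdown(wait=False, cancel_futures=True)  # don't block on stragglers
    return results  # possibly incomplete, but fast
```

The design trade is explicit: the caller accepts slightly incomplete results in exchange for a bounded, predictable response time.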

Google has used a technique that is like sticking your toe in the water to test the environment before jumping in.  They call it a canary request.

Canary requests. Another problem that can occur in systems with very high fan-out is that a particular request exercises an untested code path, causing crashes or extremely long delays on thousands of servers simultaneously. To prevent such correlated crash scenarios, some of Google's IR systems employ a technique called "canary requests"; rather than initially send a request to thousands of leaf servers, a root server sends it first to one or two leaf servers. The remaining servers are only queried if the root gets a successful response from the canary in a reasonable period of time. If the server crashes or hangs while the canary request is outstanding, the system flags the request as potentially dangerous and prevents further execution by not sending it to the remaining leaf servers. Canary requests provide a measure of robustness to back-ends in the face of difficult-to-predict programming errors, as well as malicious denial-of-service attacks.

The canary-request phase adds only a small amount of overall latency because the system must wait for only a single server to respond, producing much less variability than if it had to wait for all servers to respond for large fan-out requests; compare the first and last rows in Table 1. Despite the slight increase in latency caused by canary requests, such requests tend to be used for every request in all of Google's large fan-out search systems due to the additional safety they provide.
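And a minimal sketch of the canary-request pattern as described above. Again, this is hypothetical: `query_leaf`, the two-canary choice, and the timeout value are placeholder assumptions rather than Google's actual code.

```python
# Sketch of canary requests: send the request to one or two leaves
# first; fan out to the many remaining leaves only if the canary
# responds successfully. All names and values are hypothetical.
import concurrent.futures

CANARY_TIMEOUT_SECONDS = 0.100  # the "reasonable period" for the canary

def query_leaf(leaf, request):
    """Placeholder: execute the request on one leaf server."""
    raise NotImplementedError

def fan_out_with_canary(pool, request, leaves):
    canaries, rest = leaves[:2], leaves[2:]
    # Phase 1: send the request to the canary leaves only.
    canary_futures = [pool.submit(query_leaf, leaf, request) for leaf in canaries]
    try:
        for f in concurrent.futures.as_completed(canary_futures,
                                                 timeout=CANARY_TIMEOUT_SECONDS):
            f.result()  # re-raises if the canary leaf crashed
    except Exception as err:
        # Canary hung or crashed: flag the request as potentially
        # dangerous and do not send it to the remaining leaves.
        raise RuntimeError("request failed the canary check") from err
    # Phase 2: the canary succeeded, so fan out to everyone else.
    rest_futures = [pool.submit(query_leaf, leaf, request) for leaf in rest]
    return [f.result() for f in canary_futures + rest_futures]
```

The extra latency cost is one round trip to a single leaf, which is why it can be worth paying on every large fan-out request.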