Peek Inside Code Conference, a place not interesting to data center executives

Here is a post on what it is like to attend the Code Conference.  It’s interesting, but I have no interest in attending.

Secrets of the Code Conference


Entrepreneur’s dream: Animoto’s Brad Jefferson with Walt Mossberg

One of the perennial marketing conundrums facing start-up CEOs is picking the best industry conferences to attend to boost their companies’ brands and expand their networks. This week, some lucky entrepreneurs dropped in on what may be the gold standard for tech-industry confabs: The Code Conference.

This is a schmooze event for C-level executives.

It was refreshing to hear the big-time CEOs and company founders discuss their companies’ problems and learn that many of them relate to the same issues vexing smaller companies, Shank said—recruiting, dealing with product snafus, etc. Uber Co-Founder and CEO Travis Kalanick talked about not taking a salary for four years and living at home with his mother, which he said didn’t do wonders for his dating life.

See you at 7x24 Exchange Boca Raton

By this time tomorrow I will be at 7x24 Exchange Boca Raton. Many of my friends have made going to 7x24 Exchange a regular part of their conference plans, some going once a year. I go twice a year, and after 6 consecutive trips over the past 3 years, I still always learn something, make new friends, and enjoy discussing data centers with good friends and smart people.


The 7x24 Exchange hears the same from their attendees.


Human Perception behind Joyent and Adobe's outage, Typing is Not A Cause

Joyent and Adobe had the notable outages for the month of May.

Joyent posted its post-mortem.

Adobe posted an apology.

Both of these outages occurred during maintenance, when someone typed a command that did exactly what it was supposed to do and, in doing so, impacted a service. The human perception problem is that the person who typed the command could not see or perceive the system-wide impact of that command.

Adobe doesn’t provide any details on what they plan to do.  Joyent does.

We will be taking several steps to prevent this failure mode from happening again, and ensuring that other business disaster scenarios are able to recover more quickly.

First, we will be dramatically improving the tooling that humans (and systems) interact with such that input validation is much more strict and will not allow for all servers, and control plane servers to be rebooted simultaneously. We have already begun putting in place a number of immediate fixes to tools that operators use to mitigate this, and we will be rethinking what tools are necessary over the coming days and weeks so that "full power" tools are not the only means by which to accomplish routine tasks.

You can imagine the effort people will put into creating safeguards that eliminate the possibility of this type of outage. Unfortunately, this will most likely also put a burden on day-to-day operations.

Another way to solve the problem is to give people the ability to see the impact of their actions.  No one in their right mind would execute a command to reboot all the servers at Joyent.  And no one in their right mind would delete a directory with all user records.
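One way to picture "seeing the impact" is a dry-run step that reports how much of the fleet a command would touch before anything runs. The following is a minimal sketch under stated assumptions, not Joyent's or Adobe's actual tooling: the `Server` record, the `preview_reboot` helper, the role names, and the 25% threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    role: str  # hypothetical roles: "compute" or "control-plane"


def preview_reboot(fleet, targets, max_fraction=0.25):
    """Summarize what a reboot would touch, without rebooting anything."""
    target_names = set(targets)
    selected = [s for s in fleet if s.name in target_names]
    fraction = len(selected) / len(fleet) if fleet else 0.0
    control_plane_hit = sum(1 for s in selected if s.role == "control-plane")
    control_plane_total = sum(1 for s in fleet if s.role == "control-plane")
    return {
        "selected": len(selected),
        "fleet_size": len(fleet),
        "fraction": fraction,
        "control_plane_hit": control_plane_hit,
        # Ask for a second confirmation if the command touches a large share
        # of the fleet or every control plane server at once.
        "needs_confirmation": fraction > max_fraction
        or (control_plane_total > 0 and control_plane_hit == control_plane_total),
    }


# Made-up fleet: eight compute nodes and two control plane servers
fleet = [Server(f"cn{i}", "compute") for i in range(8)] + [
    Server("head0", "control-plane"),
    Server("head1", "control-plane"),
]

routine = preview_reboot(fleet, ["cn0"])
everything = preview_reboot(fleet, [s.name for s in fleet])
print(routine)
print(everything)
```

The point of the sketch is that the operator is shown the scope of the command (one server versus the whole fleet) rather than being blocked outright, so routine work stays lightweight while the rare fleet-wide command stands out.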

Watching Google's Data Center Machine Learning News spread

I was curious how Google’s Data Center Machine Learning news would spread.

At 1 a.m. PT on May 28, 2014, Google posted on its main company blog, with this kind of traffic over the past two days.


The following three posts also went live at 1 a.m. PT on May 28, 2014, alongside the Google post, and their authors were able to interview Joe Kava, VP of Data Centers.

http://gigaom.com/2014/05/28/google-is-harnessing-machine-learning-to-cut-data-center-energy/

Google’s head of data center operations, Joe Kava, says that the company is now rolling out the machine learning model for use on all of its data centers. Gao has spent about a year building it, testing it out and letting it learn and become more accurate. Kava says the model is using unsupervised learning, so Gao didn’t have to specificy the interactions between the data is — the model will learn those interactions over time.

http://www.datacenterknowledge.com/archives/2014/05/28/google-using-machine-learning-boost-data-center-efficiency/

http://www.wired.com/2014/05/google-data-center-ai/

The Wired article spun the machine learning as an "artificial brain," which gave it more traffic than the others.


But as I wrote, Google’s machine learning is not really AI in the way people would think.

BTW, in looking at the other articles, I realized my mistake.  In my post at 1a on May 28, I was a total nerd and got focused on the technology and didn’t mention Joe Kava’s name in my post even though I had interviewed him.  Damn.

Throughout the day the rest of the tech media were able to add their own posts.  I don’t know about you, but I am pretty impressed that Google was able to execute a media strategy that got the range of tech media to post on its Going Beyond PUE with Machine Learning.  PUE is not something widely discussed beyond the data center crowd.

Note that the ComputerWeekly writer was at the event where Joe Kava keynoted and got 10 minutes of Joe’s time.

My 10 minutes with Google's datacentre VP

ComputerWeekly.com (blog) - ‎May 28, 2014‎
Google's Joe Kava speaking at the Google EU Data Center Summit (Photo credit: Tom Raftery) ... Google's network division, which is the size of a medium enterprise, had a technology refresh and by spending between $25,000 and $50,000 per site, we could improve their high availability features and improve their PUEs from 2.2 to 1.5. The savings ... As more volumes of data are created and as mass adoption of the cloud takes place, naturally it will require IT to think about datacentres and its efficiency differently.
 

Google Blog: Better Data Centers Through Machine Learning

PCBDesign007 - ‎May 28, 2014‎
It's no secret that we're obsessed with saving energy. For over a decade we've been designing and building data centers that use half the energy of a typical data center, and we're always looking for ways to reduce our energy use even further. In our pursuit ...
 

Google is improving its data centers with the power of machine learning

GeekWire - ‎May 28, 2014‎
In its continuing quest to improve the efficiency of its data centers, Google has found a new solution: machine learning. Jim Gao, an engineer on the company's data center team, has been hard at work on building a model of how ...

Google crafts neural network to watch over its data centers

Register - ‎May 28, 2014‎
The project began as one of Google's vaunted "20 per cent projects" by engineer Jim Gao, who decided to apply machine learning to the problem of predicting how the power usage effectiveness of Google's data centers would change in response to tweaking ...
 

Google's Machine Learning: It's About More Than Spotting Cats

Wall Street Journal (blog) - ‎May 28, 2014‎
Google said in a blog post Wednesday that it is using so-called neural networks to reduce energy usage in its data centers. These computer brains are able to recognize patterns in the huge amounts of data they are fed and “learn” how things like air ...
 

Google data centers get smarter all on their own -- no humans required

VentureBeat - ‎May 28, 2014‎
While most of us were thinking that research would turn out speech recognition consumer products, it actually turns out that Google has applied its neural networks to the challenge of making its vast data centers run as efficiently as possible, preventing the ...
 

Google AI improves datacentre energy efficiency

ComputerWeekly.com - ‎May 28, 2014‎
“Realising that we could be doing more with the data coming out of datacentres, Jim studied machine learning and started building models to predict – and improve – datacentre performance.” The team's machine learning model behaves like other machine ...
 

Google taps machine learning technology to zap data center electricity costs

Network World (blog) - ‎May 28, 2014‎
Google is using machine learning technology to forecast - with an astounding 99.6% accuracy -- the energy usage in its data centers and automatically shift power to certain sites when needed. Using a machine learning system developed by its self ...
 

Google's machine-learning data centers make themselves more efficient

Ars Technica - ‎May 28, 2014‎
Google's data centers are famous for their efficient use of power, and now they're (literally) getting even smarter about how they consume electricity. Google today explained how it uses neural networks, a form of machine learning, to drive energy usage in its ...
 

Google is harnessing machine learning to cut data center energy

Bayoubuzz - ‎May 28, 2014‎
Leave it to Google to have an engineer so brainy he hacks out machine learning models in his 20 percent time. Google says that recently it's been using machine learning — developed by data center engineer Jim Gao (his Googler nickname is “Boy Wonder”) ...
 

Google turns to machine learning to build a better datacentre

ZDNet - ‎May 28, 2014‎
"The application of machine learning algorithms to existing monitoring data provides an opportunity to significantly improve DC operating efficiency," Google'sJim Gao, a mechanical engineer and data analyst, wrote in a paper online. "A typical large-scale ... These models can accurately predict datacentre PUE and be used to automatically flag problems if a centre deviates too far from the model's forecast, identify energy saving opportunities and test new configurations to improve the centre's efficiency. "This type of ...
 
 
 

Does Google's Data Center Machine Learning Model have a debug mode? It should

I threw up two posts (1st post and 2nd post) on Google’s use of machine learning in the data center and said I would write more. Well, here is another one.

Does Google’s Data Center Machine Learning Model have a debug mode? The current system is described as using data collected every 5 minutes over about 2 years.

 184,435 time samples at 5 minute resolution (approximately 2 years of operational data)
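As a quick sanity check on that figure, the arithmetic can be sketched as follows. The only inputs are the sample count and interval quoted above; the variable names are mine, and the calculation assumes the samples were continuous with no gaps.

```python
SAMPLES = 184_435        # sample count quoted above
SAMPLE_INTERVAL_MIN = 5  # minutes between samples

# Assumes continuous collection with no gaps between samples
total_minutes = SAMPLES * SAMPLE_INTERVAL_MIN
total_days = total_minutes / (60 * 24)
total_years = total_days / 365.25

print(f"{total_minutes} minutes = {total_days:.1f} days = {total_years:.2f} years")
```

That works out to roughly 640 days of continuous data, a bit under the "approximately 2 years" in the quote.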

One thing almost no one does is debug their mechanical systems as if they were debugging software.

Debugging is a methodical process of finding and reducing the number of bugs, or defects, in a computer program or a piece of electronic hardware, thus making it behave as expected. Debugging tends to be harder when various subsystems are tightly coupled, as changes in one may cause bugs to emerge in another.

What would debug mode look like in DCMLM (my own acronym for Data Center Machine Learning Model)? Suppose you are seeing data that suggests a subsystem is not performing as expected. Change the sampling rate to 1 second. Hopefully the controller will function correctly at the higher sample rate. The controller may work fine, but the transport bus may not. With 1-second fidelity, make changes to settings and collect data. Repeat the changes. Compare results. Create other stress cases.

What will you see? From the time you change a setting, how long does it take to reach the desired state? At 5-minute sampling you cannot see the transition or any possible delays. Was the transition smooth, or was it a step function? Was there an overshoot in value followed by corrections?
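To make the sampling point concrete, here is a minimal sketch that samples the same simulated setpoint change at 1 second and at 5 minutes. Everything in it is an assumption for illustration: the textbook underdamped second-order response stands in for a real mechanical subsystem, and the damping ratio, natural frequency, and 2% settling band are made-up values, not anything measured at Google.

```python
import math


def step_response(t, setpoint=1.0, zeta=0.3, omega_n=0.1):
    """Underdamped second-order response to a setpoint change at t=0.

    zeta (damping ratio) and omega_n (natural frequency, rad/s) are
    made-up values for illustration only.
    """
    omega_d = omega_n * math.sqrt(1 - zeta**2)
    phi = math.acos(zeta)
    envelope = math.exp(-zeta * omega_n * t) / math.sqrt(1 - zeta**2)
    return setpoint * (1 - envelope * math.sin(omega_d * t + phi))


def overshoot_pct(values, setpoint=1.0):
    """Largest excursion above the setpoint, as a percentage of it."""
    return max(0.0, (max(values) - setpoint) / setpoint * 100)


def settling_time(times, values, setpoint=1.0, band=0.02):
    """First sample time after which the value stays within the band."""
    settled_at = None
    for t, v in zip(times, values):
        if abs(v - setpoint) <= band * setpoint:
            if settled_at is None:
                settled_at = t
        else:
            settled_at = None  # left the band again, so not settled yet
    return settled_at


horizon = 900  # seconds observed after the change
fast_t = list(range(0, horizon + 1, 1))    # "debug mode": 1-second samples
slow_t = list(range(0, horizon + 1, 300))  # normal mode: 5-minute samples
fast = [step_response(t) for t in fast_t]
slow = [step_response(t) for t in slow_t]

print("1 s sampling:   overshoot %.1f%%, settled at %s s"
      % (overshoot_pct(fast), settling_time(fast_t, fast)))
print("300 s sampling: overshoot %.1f%%, settled at %s s"
      % (overshoot_pct(slow), settling_time(slow_t, slow)))
```

With these made-up parameters, the 1-second trace shows a large overshoot that dies out within a couple of minutes, while the 5-minute trace contains only the starting value and the already-settled value, so the overshoot and the real settling time never appear in the data.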

The controllers have code running in them, sensors go bad, and wiring connections can be intermittent. How do you find these problems? Being able to go into debug mode could be useful.

If Google were able to compare detailed operations of two different installations of the same mechanical system, it could find out whether a problem is unique to one site. Or it could simply compare the same system at different points in time.
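A comparison like that could be as simple as lining up two traces recorded after the same setpoint change and looking at where they diverge. This is a hedged sketch only: `compare_traces`, the tolerance, and the temperature readings are hypothetical, and it assumes both traces were sampled at the same rate and aligned to the moment of the change.

```python
import math


def compare_traces(site_a, site_b, tolerance):
    """Compare two equally sampled traces of the same signal.

    Returns the RMS difference, the largest pointwise gap, the sample
    index where it occurs, and whether that gap exceeds the tolerance.
    """
    if not site_a or len(site_a) != len(site_b):
        raise ValueError("traces must be non-empty and the same length")
    diffs = [a - b for a, b in zip(site_a, site_b)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    worst_index = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
    worst_gap = abs(diffs[worst_index])
    return {
        "rms": rms,
        "worst_gap": worst_gap,
        "worst_index": worst_index,
        "differs": worst_gap > tolerance,
    }


# Made-up temperatures (deg C) after the same setpoint change at two sites
site_a = [18.0, 19.5, 21.0, 22.0, 22.4, 22.1, 22.0, 22.0]
site_b = [18.0, 18.6, 19.4, 20.3, 21.2, 21.8, 22.0, 22.0]

report = compare_traces(site_a, site_b, tolerance=1.0)
print(report)
```

The same function works for comparing one system against its own earlier trace: pass the older recording as `site_b` and a drift in response time shows up as a growing gap during the transition.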