Google Implements Software Defined PUE, 99.6% Accurate DC Performance Modeling

Google has published a paper on its machine learning application for data center optimization, along with a blog post.

PUE is a topic that Google knows well. I was one of the first to interview Urs Hoelzle on Google’s PUE back in Oct 2008. Over the years Google has shared more and more about its data centers, and its PUE has continued to get better, but there are limits to what people can do.

After a while you reach a point of diminishing returns, and people get tired and frustrated chasing that next 0.01 improvement in PUE.

It seems obvious to use computers to run simulations of operations. Many have tried to build models, but this approach has not been wildly successful because the complexity of data center cooling systems is not easy to model.

Simulation is the imitation of the operation of a real-world process or system over time.[1] The act of simulating something first requires that a model be developed; this model represents the key characteristics or behaviors/functions of the selected physical or abstract system or process. 

Another approach is to discover the model from operations data and let the data define the model. This is what a machine learning model does. A warning: this approach, which is also used in handwriting recognition and OCR, requires training and testing to confirm the model is accurate. Luckily Google, with all its years of tracking PUE, has the data to train a model and a mechanical engineer who had the vision to tackle this problem.
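To make the train-then-test idea concrete, here is a minimal sketch. It is not Google's method: the paper describes a neural network, while this uses plain ordinary least squares on made-up data with a single input (outside air temperature). The names `fit_line` and `mean_abs_error` and every number in the synthetic data are assumptions for illustration only.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mean_abs_error(model, xs, ys):
    """Average absolute gap between predicted and observed values."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic, made-up history: PUE drifts up slightly with outside temperature.
rng = random.Random(0)
temps = [rng.uniform(5.0, 35.0) for _ in range(200)]
pues = [1.08 + 0.002 * t + rng.gauss(0.0, 0.003) for t in temps]

# Train on the first 75% of samples, then test on the held-out 25%.
split = int(len(temps) * 0.75)
model = fit_line(temps[:split], pues[:split])
test_mae = mean_abs_error(model, temps[split:], pues[split:])
print(f"intercept={model[0]:.3f} slope={model[1]:.4f} test MAE={test_mae:.4f}")
```

The point of the held-out test set is that the error is measured on data the model never saw, which is what gives an accuracy number any meaning.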

Jim Gao, an engineer on our data center team, is well-acquainted with the operational data we gather daily in the course of running our data centers. We calculate PUE, a measure of energy efficiency, every 30 seconds, and we’re constantly tracking things like total IT load (the amount of energy our servers and networking equipment are using at any time), outside air temperature (which affects how our cooling towers work) and the levels at which we set our mechanical and cooling equipment. Being a smart guy—our affectionate nickname for him is “Boy Genius”—Jim realized that we could be doing more with this data. He studied up on machine learning and started building models to predict—and improve—data center performance.  

After some trial and error, Jim’s models are now 99.6 percent accurate in predicting PUE. This means he can use the models to come up with new ways to squeeze more efficiency out of our operations. For example, a couple months ago we had to take some servers offline for a few days—which would normally make that data center less energy efficient. But we were able to use Jim’s models to change our cooling setup temporarily—reducing the impact of the change on our PUE for that time period.
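For readers who want the arithmetic, here is a small sketch of the two numbers in that quote. PUE is total facility power divided by IT equipment power. How Google arrives at "99.6 percent accurate" is not spelled out in the blog post, so reading it as 100% minus the mean absolute percentage error is my assumption, and the sample readings below are invented.

```python
def pue(total_facility_kw, it_load_kw):
    """PUE = total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw

def percent_accuracy(predicted, actual):
    """Assumed metric: 100% minus the mean absolute percentage error."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * (1.0 - sum(errors) / len(errors))

# Invented readings: (total facility kW, IT load kW) at three points in time.
actual = [pue(1120.0, 1000.0), pue(1100.0, 1000.0), pue(1090.0, 1000.0)]
# Invented model predictions for the same three points.
predicted = [1.115, 1.105, 1.094]

accuracy = percent_accuracy(predicted, actual)
print(f"accuracy = {accuracy:.2f}%")
```

Under that reading, a 99.6% figure means the predictions miss by only a few thousandths for a PUE near 1.1.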

Why do this?

  1. A machine learning approach leverages the plethora of existing sensor data to develop a mathematical model that understands the relationships between operational parameters and the holistic energy efficiency. 
  2. This type of simulation allows operators to virtualize the DC for the purpose of identifying optimal plant configurations while reducing the uncertainty surrounding plant changes.
  3. Model applications include DC simulation to evaluate new plant configurations, assessing energy efficiency performance, and identifying optimization opportunities.
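Points 2 and 3 can be sketched as a what-if search: ask the model for the predicted PUE of each candidate configuration and pick the lowest, before touching the real plant. The `predict_pue` function below is a hypothetical stand-in with made-up coefficients, not a trained model and not anything from the paper; a real deployment would call the trained model instead.

```python
def predict_pue(it_load_kw, outside_temp_c, cooling_setpoint_c):
    """Hypothetical stand-in for a trained model; all coefficients are made up."""
    base = 1.08 + 0.0015 * outside_temp_c + 20.0 / it_load_kw
    # Made-up assumption: the most efficient setpoint shifts with outside temperature.
    ideal_setpoint = 20.0 + 0.2 * outside_temp_c
    penalty = 0.0004 * (cooling_setpoint_c - ideal_setpoint) ** 2
    return base + penalty

def best_setpoint(it_load_kw, outside_temp_c, candidates):
    """Return the candidate setpoint with the lowest predicted PUE."""
    return min(candidates,
               key=lambda s: predict_pue(it_load_kw, outside_temp_c, s))

# Candidate setpoints from 18.0 C to 28.0 C in 0.5 C steps.
candidates = [18.0 + 0.5 * i for i in range(21)]

# What-if: some servers are offline, so IT load drops from 1000 kW to 800 kW.
choice = best_setpoint(800.0, 22.0, candidates)
print(f"setpoint={choice} predicted PUE={predict_pue(800.0, 22.0, choice):.4f}")
```

The search itself is trivial. The value comes from having a model accurate enough that its predictions can be trusted in place of experiments on live equipment.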

Results: Google predicts PUE with 99.6% accuracy. Google has successfully modeled its mechanical systems using machine learning. Google has implemented a Software Defined PUE, which lets it predict PUE as systems and load change. Who doesn’t want this capability? Almost everyone who has bought DCIM thought they would get this.

There are many other ideas that Google has put together and I plan on writing more posts on what they have shared.  This is just the beginning of applying machine learning and neural networks to data center operations.  There are many other complex interactions that can be modeled.

Small World of Data Centers and Our Dear Olivier Sanche

This last week was a time to catch up with old friends, many of whom were connected to Olivier Sanche. One of the small world stories a friend told me is that his family’s move to the SF Bay Area has gone well and his son has taken up polo. When I hear polo, I think of water polo, not the equestrian sport of polo. So, the first time I heard his story I missed the point.

On the way to a polo match, his wife and son gave a ride to another family, a mother and daughter. In the car, the small talk turned to “What does your husband do?” “He works on data centers.” The single mom said, “Oh, that’s what my husband did until he passed away.” “Oh, so sorry to hear. What was his name?” “Olivier Sanche.” “OMG, your husband was Olivier? My husband knew him and was shocked by the news.”

This last week I caught up with Olivier Sanche’s widow, Karine. Karine told her story of meeting the wife of the unnamed data center executive who knew Olivier, and I gave her more background on him: he was a good friend of Olivier’s, and I know the two of them connected well.

I asked how her daughter, Emilie, is doing. She is doing so well in school, and she is so tall. How tall? 5’ 8 1/2”, and she’ll be 13 in Sept, one week apart from my daughter, who will also be 13 and is maybe 4’ 10”. Olivier and I had talked about our girls meeting each other one of these days, and hopefully they will.

Here are some pictures of Emilie playing polo.

[Images: Emilie playing polo]

3rd Data Center Downtime Social, May 2014 - Friends Relaxing and Laughing

Four years ago, thanks to Jim Grice’s sponsorship, we had a dinner with some data center thought leaders. The people were some of the top in the industry, and any vendor would have paid big money to be at the table, but part of why people came is that it was about friends being able to chat and meet new friends without any sales pressure. Vendors were filtered out, with a bit of an exception for people who worked at companies that are considered vendors. It worked so well that we asked ourselves what to do the next year.

In 2012, we decided to have another social, but invite more people and get away from the sit-down dinner. Vendors could come if they behaved and were considered thought leaders by others attending. The idea was to let people mingle more yet still be able to sit down. We took some ideas Steve Manos started with his Lee Tech on Tap events in Chicago and applied some new ones as well.

[Image: the 2012 event]

[Image: the 2014 event]

No change in venue. It works well. 7x24 Exchange has been a generous sponsor, hosting this gathering of friends.

In 2013 we had people fly in from Seattle, Chicago, Texas, North Carolina, New York, Southern California, and Georgia to spend 3-4 hours drinking a few beers and talking to people who operate and build data centers.

For 2014 it was a smaller event, but still good. Two people flew in and asked what else there was to do. I told them Uptime Symposium was going on, and the two registered for the event as first-time attendees.

Will there be a Data Center Downtime Social in 2015? Most likely, especially when I get comments like this:

I wanted to thank you for your hospitality last week. It's always good to get away from the masses and have more intimate dialogue with key people.

There will be a data center social at 7x24 Exchange hosted by some friends. There was the big data center social in LV last month. A bunch of people are getting together for their annual fishing social in July.

The idea of a data center social where friends can hang out continues to build.