It was predictable that once Google shared its use of machine learning to build a mathematical model of a mechanical system, others would say they can do it too. DCK has a post on Romonet and Vigilent, two other companies applying AI concepts in data centers.
Google made headlines when it revealed that it is using machine learning to optimize its data center performance. But the search giant isn’t the first company to harness artificial intelligence to fine-tune its server infrastructure. In fact, Google’s effort is only the latest in a series of initiatives to create an electronic “data center brain” that can analyze IT infrastructure.
One company that has welcomed the attention around Google’s announcement is Romonet, the UK-based maker of data center management tools.
Vigilent, which uses machine learning to provide real-time optimization of cooling within server rooms.
Google has been using machine learning for a long time and applies it to many other things, such as its Google Prediction API.
What is the Google Prediction API?
Here is a YouTube video from 2011 in which Google shows developers how to use this API.
Learn how to recommend the unexpected, automate the repetitive, and distill the essential using machine learning. This session will show you how you can easily add smarts to your apps with the Prediction API, and how to create apps that rapidly adapt to new data.
So you are all pumped up to get AI into your data center. But here are two things you need to be aware of that can make your project harder to execute.
First, the quality of your data. Everyone has heard "garbage in, garbage out," but when you build machine learning systems, the accuracy of the data is critical. Google's Jim Gao, the company's data center "boy genius," discusses one example.
Catching Erroneous Meter Readings
In Q2 2011, Google announced that it would include natural gas as part of ongoing efforts to calculate PUE in a holistic and transparent manner. This required installing automated natural gas meters at each of Google's DCs. However, local variations in the type of gas meter used caused confusion regarding erroneous measurement units. For example, some meters reported 1 pulse per 1000 scf of natural gas, whereas others reported a 1:1 or 1:100 ratio. The local DC operations teams detected the anomalies when the realtime, actual PUE values exceeded the predicted PUE values by 0.02 - 0.1 during periods of natural gas usage.
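The detection logic described above can be sketched in a few lines: compare actual PUE readings against the model's predictions and flag any interval where the gap exceeds a tolerance. This is a minimal illustration, not Google's implementation; the function name, the sample values, and the 0.02 tolerance (the low end of the range quoted above) are assumptions for the sketch.

```python
def flag_anomalies(predicted, actual, tolerance=0.02):
    """Return indices where actual PUE exceeds predicted PUE by more than tolerance.

    A large persistent gap during gas usage suggests a data problem
    (e.g., a meter reporting in the wrong units) rather than a real
    efficiency change.
    """
    return [i for i, (p, a) in enumerate(zip(predicted, actual)) if a - p > tolerance]

# Hypothetical hourly PUE values; the second actual reading reflects a
# mis-scaled gas meter inflating measured energy use.
predicted_pue = [1.10, 1.11, 1.10, 1.12]
actual_pue    = [1.10, 1.19, 1.11, 1.12]

print(flag_anomalies(predicted_pue, actual_pue))  # → [1]
```

The point is that the model's prediction serves as a sanity check on the sensors themselves, which is exactly how the operations teams caught the bad meters.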
Going through all your data inputs to make sure they are clean is painful. Google used 70% of its data to train the model and 30% to validate it. Are you that disciplined? Do you have a mechanical engineer on staff who can review the accuracy of your mathematical model?
Second, the culture in your company, which many treat as an intangible. But if you have spent time around data center operations staff, you know their habits and methods are not intangible. They are real, and they are what makes so many things happen. Going back to Google's Jim Gao: he had a wealth of subject matter expertise on machine learning and other AI methods within Google, he had help from Google staff in deploying the models, and he had the support of the VP of data centers and the local data center operations teams.
I would like to thank Tal Shaked for his insights on neural network design and implementation. Alejandro Lameda Lopez and Winnie Lam have been instrumental in model deployment on live Google data centers. Finally, this project would not have been possible without the advice and technical support from Joe Kava, as well as the local data center operations teams.
Think about these issues of data quality and data center culture before you attempt an AI project. Once you dig into an automation project, it is rarely as easy as people thought it would be.