Mike Manos posted his comments on the CADE metric presented by McKinsey and Uptime.
Struggling with CADE, McKinsey / Uptime Metric
I guess I should start out this post with the pre-emptive statement that, as a key performance indicator, I support the use of CADE or metrics that tie both facilities and IT into a single metric. In fact, we have used a similar metric internally at Microsoft. But at the end of the day I believe that any such metric must be useful and actionable. Maybe it's because I have to worry about Operations as well. Maybe it's because I don't think you can roll the total complexity of running a facility into one metric. In short, I don't think dictating yet another metric, especially one that doesn't lend itself to action, is helpful.
People at Uptime were quoting Mike as saying to just start measuring something. I want to add a correction: given Mike's experience, he would never measure something at random, which is what other presenters were suggesting as a way to take action. Mike knows he wants effective measurements, and he knows whether a measurement he picks is useful.
The recommendation should be modified to “pick something to measure that you think is useful in the long term, start anywhere you want, pick up a clipboard.”
Measurements need to be thought of as part of a closed-loop feedback system, where the measurements indicate how well you are meeting operational service goals and whether the modifications you make are effective.
The presenter of the McKinsey study at Uptime joked that he had a process that would have you running in circles, and CADE would probably do exactly that: as you keep measuring it, it will have you chasing what you need to do to make it better. The CADE numbers don't behave the way you'd expect, and Mike also points out some flaws.
As you cull dead servers out of your environment, the components of CADE move against one another, and the metric can remain unchanged. The components are not independent: remove dead servers and average server utilization goes up, but data center utilization drops proportionally, so there is no net change; if anything, PUE goes up, which means the metric may actually get worse. Keep in mind that all of these results are only good when kept in context of one another.

Hosting providers like Savvis, Equinix, Dupont Fabros, Digital Realty Trust, and the army of others will be exempt from participating. They will need to report back-of-house numbers to their customers (effectively PUE), but they do not have access to their customers' server information. It seems to me that CADE reporting in hosted environments will be difficult if not impossible. Since the design of their facilities needs to play a large part in the calculation, effective tracking is difficult. Additionally, at what level will overall utilization be measured? If hosters are exempted, CADE has a very limited application, or shelf life. You have to own the whole problem for the metric to be effective. As I mentioned, I think CADE has strong possibilities for firms that own their entire stack, but most of the data centers in the world would probably not fall into that "all-in" bucket.
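To see why the components are coupled, here is a minimal sketch using a simplified form of the metric, CADE ≈ (1 / PUE) × facility utilization × average server utilization. The exact weighting in the McKinsey/Uptime definition may differ, and every number below is invented for illustration only:

```python
def cade(pue, facility_util, avg_server_util):
    """Simplified CADE: facility energy efficiency (1/PUE)
    times facility utilization times average server utilization."""
    return (1.0 / pue) * facility_util * avg_server_util

# Hypothetical "before" state: 1000 servers, 100 of them dead (0% utilized),
# live servers averaging 40% utilization.
servers = 1000
dead = 100
live_util = 0.40
avg_util_before = (servers - dead) * live_util / servers   # 0.36
facility_util_before = 0.80   # fraction of rated IT capacity in use
pue_before = 2.0

# After culling the dead servers: average utilization of what's left rises,
# but the facility is proportionally less occupied, and fixed cooling/power
# overhead pushes PUE up slightly.
avg_util_after = live_util                                          # 0.40
facility_util_after = facility_util_before * (servers - dead) / servers  # 0.72
pue_after = 2.1

before = cade(pue_before, facility_util_before, avg_util_before)
after = cade(pue_after, facility_util_after, avg_util_after)
print(f"CADE before: {before:.3f}")   # 0.144
print(f"CADE after:  {after:.3f}")    # ~0.137 -- a good change looks worse
```

The point of the sketch is not the specific numbers but the coupling: the gain in server utilization is exactly offset by the loss in facility utilization, so any movement in PUE decides which way the composite metric goes.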
This was my first Uptime Institute, and I have dozens of observations I'll be sharing over the next couple of weeks.
McKinsey and Uptime also promoted the idea of an energy czar. I disagree with the idea, and bounced a better one off multiple people at the conference: energy metrics should be integrated into capacity planning and reporting. Data center staff and executives need to get used to seeing power numbers alongside the other key performance indicators for their data centers.
The last thing you want in the data center is an energy czar/nazi who measures their performance on energy savings but does not appreciate how efficiency projects relate to overall operations.
So, even though it was a nice effort to introduce a data center efficiency metric, it will not work as I first thought. A metric like this makes more sense coming from a group like The Green Grid, where the consensus-driven model for publishing content ensures a metric has industry support before it is published.