Uptime Warns Data Center Pros on Using PUE, and a Simple Fix

Matt Stansberry writes about Uptime's seminar warning data center pros about the use of PUE.

Uptime warns data center pros against being benchmarked on PUE


Posted by: Matt Stansberry

Uptime Institute executive director Ken Brill warned panelists at an online seminar today to be wary of very low Power Usage Effectiveness (PUE) ratios touted by some data center operators. “If your management begins to benchmark you against someone else’s data center PUE, you need to be sure what you’re benchmarking against,” Brill said.

Brill said he’s seen companies talking about a PUE of 0.8 — which is physically impossible. “There is a lot of competitive manipulation and gaming going on,” Brill said. “Our network members are tired of being called in by management to explain why someone has a better PUE than they do.”

If you’re going to compare your PUE against another company, you need to know what the measurement means. “You need to know what they’re saying and what they’re not saying,” Brill said. “Are you going to include the lights and humidification system? If you’re using free cooling six months of the year, do you report your best PUE?”

Matt was nice enough to send me this link and ask what I thought.
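To make Brill's questions concrete, here is a minimal sketch (my own illustration with made-up loads, using nothing more than PUE = total facility power divided by IT power) of how much the reported number moves depending on whether loads like lighting and humidification are counted:

```python
# Minimal illustration of how the reported PUE shifts depending on which
# facility loads are counted. All load values below are made up.

def pue(it_kw, facility_overhead_kw):
    """PUE = total facility power / IT equipment power."""
    return (it_kw + facility_overhead_kw) / it_kw

it_load = 1000.0        # kW drawn by servers, storage, and network gear
cooling = 450.0         # kW for chillers, CRAC fans, pumps
power_losses = 120.0    # kW lost in UPS, PDUs, and transformers
lighting = 15.0         # kW
humidification = 25.0   # kW

# Counting only cooling and electrical losses:
print(f"{pue(it_load, cooling + power_losses):.2f}")

# Counting everything, including lights and humidification:
print(f"{pue(it_load, cooling + power_losses + lighting + humidification):.2f}")
```

The gap looks small in this toy case, but across operators the choices of what to count, and when to measure, can easily swamp real efficiency differences.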

Here is a simple fix to the problem: PUE should be reported as a range of numbers, low to high, along with the average calculated over a period of time. This could be shown as a graph. For example, Microsoft shows its PUE with this graph.

[Image: Microsoft's PUE graph showing three years of readings]

This graph shows three years of history and how the numbers have fluctuated, which makes it credible. A static PUE number has little meaning, as it is just one data point with no background.
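A minimal sketch of that reporting style, using invented hourly readings, would publish the low, the high, and the period average rather than a single number:

```python
# Sketch: report PUE as a low/high range plus the average over a period,
# instead of a single number. The hourly readings are invented.

hourly_readings = [
    # (total facility kW, IT kW)
    (1550, 1000), (1490, 980), (1620, 1010), (1700, 1020),
    (1580, 990),  (1450, 970), (1530, 1000), (1660, 1015),
]

pues = [total / it for total, it in hourly_readings]

print(f"PUE range over period  : {min(pues):.2f} to {max(pues):.2f}")
print(f"PUE average over period: {sum(pues) / len(pues):.2f}")
```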

I’ve written before about this issue: PUE is dynamic.

I've been meaning to write about PUE, but have been stumped because it is defined as a metric, and the Green Grid document referenced makes no mention that it is dynamic. In reality, PUE is a dynamic number that changes as the load changes in a room. How ironic would it be if your best PUE number came when all the servers were running near capacity, and shutting down servers to save power increased your PUE? Or if your energy-efficient cooling system used large amounts of water in Southern California, where it is just a matter of time before water shortages cause more environmental issues?

What helped me think of PUE as a dynamic number is to treat it as a quality control metric. The quality of the electrical and mechanical systems and their operation over time are inputs into PUE. As load changes and servers are turned off, the variability of the power and cooling systems influences your PUE. So PUE can have a statistical range of operation given the conditions. This sounds familiar: it's statistical process control.

Statistical Process Control (SPC) is an effective method of monitoring a process through the use of control charts. Much of its power lies in the ability to monitor both the process centre and its variation about that centre. By collecting data from samples at various points within the process, variations in the process that may affect the quality of the end product or service can be detected and corrected, thus reducing waste as well as the likelihood that problems will be passed on to the customer. With its emphasis on early detection and prevention of problems, SPC has a distinct advantage over quality methods, such as inspection, that apply resources to detecting and correcting problems in the end product or service.
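As a rough sketch of applying SPC to PUE (my own illustration, not a prescribed method, with invented values), you can compute a centre line and control limits from a stable baseline period and then flag readings that fall outside them:

```python
# Sketch: treat daily PUE readings as a process and compute Shewhart-style
# control limits (centre line +/- 3 standard deviations) from a stable
# baseline. All values are invented.
from statistics import mean, stdev

baseline = [1.58, 1.61, 1.57, 1.60, 1.59, 1.62, 1.58, 1.60, 1.57, 1.59]

centre = mean(baseline)
sigma = stdev(baseline)
ucl = centre + 3 * sigma  # upper control limit
lcl = centre - 3 * sigma  # lower control limit

print(f"centre line {centre:.3f}, control limits {lcl:.3f} to {ucl:.3f}")

# New readings are checked against the baseline limits; a point outside them
# signals the power/cooling process has changed, not just normal variation.
for day, value in enumerate([1.60, 1.58, 1.74, 1.61], start=1):
    flag = "OUT OF CONTROL - investigate" if not lcl <= value <= ucl else "in control"
    print(f"day {day}: PUE {value:.2f} -> {flag}")
```

A reading outside the limits says the power and cooling process itself has changed rather than the load simply wobbling, which is exactly the context a single static number cannot give.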

At Data Center Dynamics Seattle, Microsoft’s Mike Manos said the average PUE for Microsoft data centers is 1.6, and his team is driving for 1.3 within two years. When Mike hits a PUE of 1.3, I am sure he’ll show us a graph to prove Microsoft has hit it.


The Virtual Data Center idea, VMware is Listening

I wrote my post on The Virtual Data Center idea on July 31, 2008.

Add up the management change to Paul Maritz, 120,000 of data center space, and VMware's upcoming announcements, and my guess is that VMware will announce its Virtual Data Center cloud computing initiative.

When I was checking my web metrics, I could see this post got VMware’s attention.

[Image: web analytics report showing traffic from VMware]

I’ve written 8 posts on VMware (NYSE:VMW) over the past month, and I got the bump in VMware traffic right after my Virtual Data Center post.


Living Data Center, Skanska's Tool

A great book to read for thinking about Green Data Center principles is The Living Company. It has 5 stars from 21 customer reviews on Amazon.com.

Here is an excerpt from a BusinessWeek post.

After all of our detective work, we found four key factors in common:

1. Long-lived companies were sensitive to their environment. Whether they had built their fortunes on knowledge (such as DuPont's technological innovations) or on natural resources (such as the Hudson Bay Company's access to the furs of Canadian forests), they remained in harmony with the world around them. As wars, depressions, technologies, and political changes surged and ebbed around them, they always seemed to excel at keeping their feelers out, tuned to whatever was going on around them. They did this, it seemed, despite the fact that in the past there were little data available, let alone the communications facilities to give them a global view of the business environment. They sometimes had to rely for information on packets carried over vast distances by portage and ship. Moreover, societal considerations were rarely given prominence in the deliberations of company boards. Yet they managed to react in timely fashion to the conditions of society around them.

2. Long-lived companies were cohesive, with a strong sense of identity. No matter how widely diversified they were, their employees (and even their suppliers, at times) felt they were all part of one entity. One company, Unilever, saw itself as a fleet of ships, each ship independent, yet the whole fleet stronger than the sum of its parts. This sense of belonging to an organization and being able to identify with its achievements can easily be dismissed as a "soft" or abstract feature of change. But case histories repeatedly showed that strong employee links were essential for survival amid change. This cohesion around the idea of "community" meant that managers were typically chosen for advancement from within; they succeeded through the generational flow of members and considered themselves stewards of the longstanding enterprise. Each management generation was only a link in a long chain. Except during conditions of crisis, the management's top priority and concern was the health of the institution as a whole.

3. Long-lived companies were tolerant. At first, when we wrote our Shell report, we called this point "decentralization." Long-lived companies, as we pointed out, generally avoided exercising any centralized control over attempts to diversify the company. Later, when I considered our research again, I realized that seventeenth-, eighteenth-, and nineteenth-century managers would never have used the word decentralized; it was a twentieth-century invention. In what terms, then, would they have thought about their own company policies? As I studied the histories, I kept returning to the idea of "tolerance." These companies were particularly tolerant of activities on the margin: outliers, experiments, and eccentricities within the boundaries of the cohesive firm, which kept stretching their understanding of possibilities.

4. Long-lived companies were conservative in financing. They were frugal and did not risk their capital gratuitously. They understood the meaning of money in an old-fashioned way; they knew the usefulness of having spare cash in the kitty. Having money in hand gave them flexibility and independence of action. They could pursue options that their competitors could not. They could grasp opportunities without first having to convince third-party financiers of their attractiveness.

I've been meaning to write about some of these ideas, and yesterday at Data Center Dynamics Seattle I met Jakob Carnemark from Skanska, who embraces them. Skanska is developing tools for the Green Data Center that enable a Living Data Center, built on the same principles used in "The Living Company".

Skanska's approach of continuous process improvement fits what I have written in the past: being Green is not a binary decision, but a commitment.

I am going to follow up with Jakob to get more details on how they create metrics, monitoring, and modeling: his 3 M's versus my M3 (monitoring, metering, managing).


Microsoft's Christian Belady, "Energy Efficiency is More a Behavior Problem, not a Technical one"

Christian Belady has posted on Changing Data Center Behavior Based on Chargeback Metrics. Here are the main points and results from the post.

Changing the Charging Model

In my presentation, I described how Microsoft now charges for data center services based on a function of kW used. If someone upgrades to a high-density blade server, they do not reduce their costs unless they also save power. This change created a significant shift in thinking among our customers, together with quite a bit of initial confusion, requiring us to answer the stock question “You’re charging for WHAT?” with “No, we’re charging for WATTS!”

Recording the Changes

From our perspective, our charging model is now more closely aligned with our costs. By getting our customers to consider the power that they use rather than space, then power efficiency becomes their guiding light. This new charging model has already resulted in the following changes:

  • Optimizing the data center design
    • Implement best practices to increase power efficiency.
    • Adopt newer, more power efficient technologies.
    • Optimize code for reduced load on hard disks and processors.
    • Engineer the data center to reduce power consumption.
  • Sizing equipment correctly
    • Drive to eliminate Stranded Compute by:
      • Increasing utilization by using virtualization and power management technologies.
      • Selecting servers based on application throughput per watt.
      • Right-sizing the number of processor cores and memory chips for the application needs.
    • Drive to eliminate stranded power and cooling: ensure that the total capacity of the data center is used. Another name for this is data center utilization, and it means that you had better be using all of your power capacity before you build your next data center. Otherwise, why did you have the extra power or cooling capacity in the first place? These are all costs you didn't need.

I will be discussing the concepts of stranded compute, power, and cooling in greater detail in later posts.

Christian predicts electricity-based chargebacks will change the behavior of the industry to think in terms of processing capability per kilowatt used.
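A hedged sketch of that arithmetic (the $/kW rate, server power draws, and throughput figures below are invented, not Microsoft's) shows why a high-density blade only helps the bill if it also lowers watts, and why throughput per watt becomes the comparison that matters:

```python
# Sketch of a kW-based chargeback plus a throughput-per-watt comparison.
# The $/kW rate, power draws, and throughput numbers are all invented.

MONTHLY_RATE_PER_KW = 200.0  # hypothetical facility charge, $ per kW per month

def monthly_charge(avg_kw):
    """Chargeback is a function of power drawn, not rack space occupied."""
    return avg_kw * MONTHLY_RATE_PER_KW

servers = {
    # name: (average watts, application transactions per second)
    "older 2U server":    (450, 900),
    "high-density blade": (350, 1200),
    "low-power server":   (250, 950),
}

for name, (watts, tps) in servers.items():
    print(f"{name}: {tps / watts:.2f} tx/s per watt, "
          f"${monthly_charge(watts / 1000):.2f} per month")
```

Under this model, upgrading to denser hardware changes nothing on the invoice unless average power draw actually falls.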

Moving the Goalposts

I think it will take quite a bit of time for manufacturers to realize that the goalposts have moved. At present, it is quite difficult to get the answer to questions such as “What is the processing capacity of your servers per kilowatt of electricity used?” However, I do believe this change will come, which will drive rapid innovation along an entirely different vector, where system builders compete to create the most energy efficient designs. The benchmarking body, SPEC, has already started down this path with their SPECpower benchmark, but this needs to be done with applications.


$2,600,000 UC San Diego Energy Efficient Computing Project, Heavy Instrumentation & Monitoring to Calculate Performance/Watt

UC San Diego has an article about their new Energy Efficient Computing project, GreenLight.

UC San Diego’s GreenLight Project to Improve Energy Efficiency of Computing

July 28, 2008

By Doug Ramsey

The information technology industry consumes as much energy and has roughly the same carbon “footprint” as the airline industry. Now scientists and engineers at the University of California, San Diego are building an instrument to test the energy efficiency of computing systems under real-world conditions – with the ultimate goal of getting computer designers and users in the scientific community to re-think the way they do their jobs.

Photo of Sun Datacenter

This Sun Modular Datacenter deployed on the UC San Diego campus will be instrumented for the GreenLight project to offer full-scale processing and storage in order to test how to make computing more energy-efficient.

The National Science Foundation will provide $2 million over three years from its Major Research Instrumentation program for UC San Diego’s GreenLight project. An additional $600,000 in matching funds will come from the UCSD division of the California Institute for Telecommunications and Information Technology (Calit2) and the university’s Administrative Computing and Telecommunications (ACT) group.

The GreenLight project gets its name from its plan to connect scientists and their labs to more energy-efficient ‘green’ computer processing and storage systems using photonics – light over optical fiber.

The goal of GreenLight is to understand computational performance per watt.

The GreenLight Instrument will enable an experienced team of computer-science researchers to make deep and quantitative explorations in advanced computer technologies, including graphics processors, solid-state disks, photonic networking, and field-programmable gate arrays (FPGAs). Jacobs School of Engineering computer science professor Rajesh Gupta and his team will explore alternative computing fabrics from array processors to custom FPGAs and their respective models of computation to devise architectural strategies for efficient computing systems.

“Computing today is characterized by a very large variation in the amount of effective work delivered per watt, depending upon the choice of the architecture and organization of functional blocks,” said Gupta. “The project seeks to discover fundamental limits of computing efficiency and devise organizing principles that will enable future system builders to architect machines that are orders-of-magnitude more efficient than modern-day machines, from embedded systems to high-performance supercomputers.”

The computing and systems research will yield new quantitative data to support engineering judgments on comparative “computational work per watt” across full-scale applications running on full-scale computing platforms.
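As a simplified illustration of what "computational work per watt" means in practice (not the GreenLight instrumentation itself; the sample values are invented), you integrate sampled power over a benchmark run and divide the work done by the energy used:

```python
# Sketch: estimate computational work per watt from power readings sampled
# while a benchmark runs. The sample values are invented.

power_samples_w = [310, 325, 340, 338, 322, 315]  # watts, one sample every 10 s
sample_interval_s = 10
work_units_completed = 4_800                      # e.g. benchmark iterations

avg_power_w = sum(power_samples_w) / len(power_samples_w)
energy_joules = sum(power_samples_w) * sample_interval_s  # rough integration

print(f"average power : {avg_power_w:.1f} W")
print(f"energy used   : {energy_joules / 1000:.1f} kJ")
print(f"work per watt : {work_units_completed / avg_power_w:.1f} units/W")
print(f"work per joule: {work_units_completed / energy_joules:.3f} units/J")
```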

This is a big win for Sun Microsystems and their containers.

“Using the Sun Modular Datacenter as a core technology and making all measurements  available as open data will form a unique, Internet-accessible resource that will have a dramatic impact on academic, government and private-sector computing,” said Emil J. Sarpa, Director of External Research at Sun Microsystems, Inc. “By placing experimental hardware configurations alongside traditional rack-mounted servers and then running a variety of computational loads on this infrastructure, GreenLight will enable a new level of insight and inference about real power consumption and energy savings.”

According to DeFanti, the project decided to build the GreenLight Instrument around the Sun Modular Datacenter because, “it’s the fastest way to construct a controlled experimental facility for energy research purposes.” The modular structure also means the GreenLight Instrument can be cloned – unlike bricks-and-mortar computer rooms that cannot be ordered through purchasing.

Photo of Sun Modular Datacenter

Interior of the Sun Modular Datacenter prior to deployment of up to 280 servers and other equipment that will turn the shipping container into the GreenLight Instrument.

And to make things a bit sexy, they plan on using a virtual environment to visualize the inside of the containers.

Rather than give scientists physical access to the GreenLight Instrument, OptIPortal tiled display systems will serve as visual termination points – allowing researchers to “see” inside the instrument. Users will also be able to query and visualize all sensor data in real time and correlate it interactively and collaboratively in this immersive, multi-user environment.

Once a virtual environment of the system has been created, scientists will be able to walk into a 360-degree virtual reality version in Calit2’s StarCAVE. Users will be able to zoom into the racks of clusters as well as see and hear the power and heat, from whole clusters of computers down to the smallest instrumented components, such as computer processing and graphics processing chips.
