Microsoft Research Publishes Study on Saving Disk Energy in a Microsoft Data Center

I am adding another Microsoft Research paper, on saving energy by turning off Windows Live servers, to the one below about disk energy.

Appended Feb 4, 2008

Microsoft Research has been applying some of its resources to Microsoft data centers. The latest public information is being presented at the Usenix FAST conference in February 2008. If you want to read their paper, you can find it here.

Power consumption is a major problem for data centers of all sizes; it limits server density and raises the total cost of ownership. This is causing changes in data center configuration and management. Some components already support power management features: for example, server CPUs support dynamic clock and voltage scaling, which significantly reduces power requirements during idle periods. Storage subsystems, however, lack power management and consume a significant amount of power in the data center. Modern enterprise-grade disks require approximately 10W when idle. As storage requirements grow, the number of disks in data centers is increasing proportionally.
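To put the 10W idle figure in perspective, here is a back-of-the-envelope sketch. It assumes every disk idles at about 10W for the whole year and ignores cooling overhead; the function name and the 1,000-disk population are hypothetical.

```python
# Back-of-the-envelope idle energy for a disk population.
# Assumption: each disk draws the ~10 W idle figure quoted above, all year round,
# and cooling overhead is ignored.
IDLE_WATTS_PER_DISK = 10.0
HOURS_PER_YEAR = 24 * 365


def annual_idle_kwh(num_disks: int, watts_per_disk: float = IDLE_WATTS_PER_DISK) -> float:
    """Energy in kWh used by num_disks disks idling continuously for one year."""
    return num_disks * watts_per_disk * HOURS_PER_YEAR / 1000.0


# Hypothetical example: 1,000 disks idling for a year.
print(annual_idle_kwh(1000))  # 87600.0
```

In other words, 1,000 idle disks draw 10 kW around the clock, or 87,600 kWh a year, before any cooling is counted.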

Based on one-week traces of core servers in our data center, we have found that there are significant periods of idle time during which disks can be spun down, and even longer “write-only” periods during which all I/O operations are writes. Based on this, we have developed a technique called “write off-loading,” which allows disks to stay spun down during these write-only periods by temporarily off-loading write requests to other volumes in the data center. Our results show that this provides power savings of 45% to 60%. This work will be presented at the Usenix FAST conference in February 2008.
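To make the idea concrete, here is a deliberately simplified, single-process toy model of write off-loading. It is not the paper's implementation; the class and method names are hypothetical, and it assumes the simplest possible policy: while the home disk is spun down, writes go to a log on another volume, reads that hit the log are served from it, and any other read spins the disk up and copies the off-loaded blocks back home.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OffloadingVolume:
    """Toy model of write off-loading for one 'home' volume (hypothetical)."""
    home: Dict[int, bytes] = field(default_factory=dict)         # blocks on the home disk
    offload_log: Dict[int, bytes] = field(default_factory=dict)  # latest off-loaded version per block
    spun_up: bool = True
    spin_ups: int = 0

    def spin_down(self) -> None:
        self.spun_up = False

    def _spin_up(self) -> None:
        if not self.spun_up:
            self.spun_up = True
            self.spin_ups += 1
            # Reclaim: write the off-loaded blocks back to the home volume.
            self.home.update(self.offload_log)
            self.offload_log.clear()

    def write(self, block: int, data: bytes) -> None:
        if self.spun_up:
            self.home[block] = data
        else:
            # The disk stays spun down; the write is off-loaded instead.
            self.offload_log[block] = data

    def read(self, block: int) -> Optional[bytes]:
        if not self.spun_up and block in self.offload_log:
            return self.offload_log[block]  # served without spinning up
        self._spin_up()                     # any other read needs the home disk
        return self.home.get(block)


vol = OffloadingVolume()
vol.write(1, b"a")
vol.spin_down()
vol.write(2, b"b")   # off-loaded; the disk stays down
print(vol.read(2), vol.spin_ups)  # b'b' 0
print(vol.read(1), vol.spin_ups)  # b'a' 1
```

The point the sketch illustrates is that a write-only period costs no spin-ups at all; only a read of data that was never off-loaded forces the disk back on.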

We believe that write off-loading is a viable technique for saving energy in enterprise storage. In order to use write off-loading, a system administrator needs to manage the trade-off between energy and performance. We are designing tools to help administrators decide how to save the most energy with the least performance impact.
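One way to picture that trade-off is a sketch that, for a list of idle gaps taken from a trace, estimates how much energy a given spin-down timeout saves and how many requests would have to wait for a spin-up. Everything here is an assumption for illustration: the function name, the 1W standby draw (only the 10W idle figure comes from the text above), and the idle gaps themselves. It also ignores the extra energy spent spinning back up.

```python
def spin_down_tradeoff(idle_periods_s, timeout_s, idle_w=10.0, standby_w=1.0):
    """Estimate energy saved (Wh) and spin-ups for one spin-down timeout.

    idle_periods_s: lengths in seconds of idle gaps between I/Os (e.g. from a trace).
    A disk spins down after timeout_s of inactivity and stays down for the rest
    of the gap, so every gap longer than the timeout ends in one spin-up, i.e. one
    request that waits for the disk. standby_w is an ASSUMED spun-down power draw;
    the extra energy spent spinning back up is ignored in this sketch.
    """
    standby_s = 0.0
    spin_ups = 0
    for gap in idle_periods_s:
        if gap > timeout_s:
            standby_s += gap - timeout_s
            spin_ups += 1
    saved_wh = standby_s * (idle_w - standby_w) / 3600.0
    return saved_wh, spin_ups


# Hypothetical idle gaps (seconds) and candidate timeouts.
gaps = [30, 600, 3600, 5]
for timeout in (60, 300, 1000):
    print(timeout, spin_down_tradeoff(gaps, timeout))
```

With these made-up gaps, raising the timeout from 60 s to 1,000 s cuts spin-ups from two to one but also lowers the estimated savings from about 10.2 Wh to 6.5 Wh, which is the kind of trade-off an administrator has to weigh.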

Appended Feb 3, 2008.

Based on this post, we can expect more content from the Microsoft Research group regarding data center technologies.

Dynamic PUE real world use

I've been meaning to write about PUE, but have been stumped because it is defined as a single metric, and the referenced Green Grid document makes no mention of it being dynamic. In reality, PUE is a dynamic number that changes as the load in a room changes. How ironic would it be if your best PUE occurred when all the servers were running near capacity, and shutting down servers to save power increased your PUE? Or if your energy-efficient cooling system used large amounts of water in Southern California, where it is just a matter of time before water shortages cause more environmental issues?
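The irony is easy to see with a little arithmetic. PUE, as defined by The Green Grid, is total facility power divided by IT equipment power. The sketch below assumes a hypothetical facility whose overhead (cooling, power distribution) has a fixed part plus a part proportional to IT load; the numbers are illustrative only.

```python
def pue(it_kw: float, fixed_overhead_kw: float, proportional_overhead: float) -> float:
    """PUE = total facility power / IT equipment power.

    Hypothetical facility model: overhead = a fixed part (fixed_overhead_kw)
    plus a part proportional to the IT load.
    """
    total_kw = it_kw + fixed_overhead_kw + proportional_overhead * it_kw
    return total_kw / it_kw


# Same facility at two load levels (illustrative numbers only).
full = pue(it_kw=500, fixed_overhead_kw=200, proportional_overhead=0.3)
reduced = pue(it_kw=250, fixed_overhead_kw=200, proportional_overhead=0.3)
print(round(full, 2), round(reduced, 2))  # 1.7 2.1
```

Halving the IT load drops total facility power from 850 kW to 525 kW, yet PUE gets worse (1.7 to 2.1) because the fixed overhead is now spread over less IT load.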

What helped me think of PUE as a dynamic number is to treat it as a quality control metric. The quality of the electrical and mechanical systems, and of their operation over time, are inputs into PUE. As load changes and servers are turned off, the variability of the power and cooling systems influences your PUE. So PUE has a statistical range of operation given the conditions. This sounds familiar: it's statistical process control.

Statistical Process Control (SPC) is an effective method of monitoring a process through the use of control charts. Much of its power lies in the ability to monitor both process centre and its variation about that centre. By collecting data from samples at various points within the process, variations in the process that may affect the quality of the end product or service can be detected and corrected, thus reducing waste as well as the likelihood that problems will be passed on to the customer. With its emphasis on early detection and prevention of problems, SPC has a distinct advantage over quality methods, such as inspection, that apply resources to detecting and correcting problems in the end product or service.
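Applied to PUE, the simplest control chart is an individuals (X-mR) chart: compute a centre line and 3-sigma limits from a baseline of readings, then flag new readings outside the limits. This is a sketch, assuming one PUE reading per period; the function names and the baseline values are hypothetical. Sigma is estimated from the average moving range divided by the standard constant d2 = 1.128.

```python
from statistics import mean


def individuals_chart_limits(baseline):
    """Centre line and 3-sigma limits for an individuals (X-mR) control chart.

    Sigma is estimated as the average moving range of consecutive points
    divided by d2 = 1.128 (the constant for moving ranges of size two).
    """
    centre = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = mean(moving_ranges) / 1.128
    return centre - 3 * sigma, centre, centre + 3 * sigma


def out_of_control(samples, limits):
    """Indices of samples that fall outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, x in enumerate(samples) if x < lcl or x > ucl]


baseline = [1.80, 1.82, 1.79, 1.81, 1.80, 1.83, 1.78, 1.81]  # hypothetical hourly PUE
limits = individuals_chart_limits(baseline)
print(out_of_control([1.81, 1.79, 1.95], limits))  # [2]
```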

For example, a breakfast cereal packaging line may be designed to fill each cereal box with 500 grams of product, but some boxes will have slightly more than 500 grams, and some will have slightly less, in accordance with a distribution of net weights. If the production process, its inputs, or its environment changes (for example, the machines doing the manufacture begin to wear), this distribution can change. For example, as its cams and pulleys wear out, the cereal filling machine may start putting more cereal into each box than specified. If this change is allowed to continue unchecked, more and more product will be produced that falls outside the tolerances of the manufacturer or consumer, resulting in waste. While in this case the waste is in the form of "free" product for the consumer, waste typically consists of rework or scrap.
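A slow drift like the worn filler may never push a single box outside the 3-sigma limits, so control charts are usually paired with run rules. The sketch below implements one Western Electric rule, eight consecutive points on the same side of the centre line; the function name and the box weights are hypothetical.

```python
def same_side_runs(samples, centre, run_length=8):
    """Indices at which `run_length` or more consecutive points have fallen on
    the same side of the centre line (a Western Electric run rule for detecting
    a sustained shift). Points exactly on the centre line break the run."""
    alarms = []
    run, side = 0, 0
    for i, x in enumerate(samples):
        s = (x > centre) - (x < centre)  # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            run += 1
        else:
            run, side = (1, s) if s != 0 else (0, 0)
        if run >= run_length:
            alarms.append(i)
    return alarms


# Hypothetical box weights (grams) from a filler that starts to over-fill.
weights = [499.8, 500.3, 499.9, 500.1, 500.4, 500.6,
           500.2, 500.5, 500.7, 500.3, 500.8, 500.6]
print(same_side_runs(weights, centre=500.0))  # [10, 11]
```

None of these weights is far from 500 grams, but the rule fires once the eighth consecutive heavy box appears, which is the early warning SPC is meant to give.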

By observing at the right time what happened in the process that led to a change, the quality engineer or any member of the team responsible for the production line can troubleshoot the root cause of the variation that has crept into the process and correct the problem.

This last point, observing at the right time what happened in the process that led to a change, is ultimately what needs to be achieved with a dynamic PUE system. Without such a system and mindset, you wouldn't know how to fix PUE problems, which is what I think is wrong with a static PUE mindset. You need closed-loop feedback to monitor PUE and see whether it is performing as expected given the operating conditions and load.
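One way to get "as expected given the load" is to learn, from historical readings, how facility power tracks IT power, and then compare each new PUE reading with the PUE that model predicts at the current load. The sketch below assumes a simple straight-line model (facility power = a + b × IT power) fitted by least squares; the function names, the 0.05 tolerance, and the readings are hypothetical, and a real facility would also depend on outside temperature and other conditions.

```python
def fit_overhead_model(it_kw, facility_kw):
    """Least-squares fit of facility_kw = a + b * it_kw from historical readings."""
    n = len(it_kw)
    mx = sum(it_kw) / n
    my = sum(facility_kw) / n
    sxx = sum((x - mx) ** 2 for x in it_kw)
    sxy = sum((x - mx) * (y - my) for x, y in zip(it_kw, facility_kw))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def check_pue(it_now, facility_now, model, tolerance=0.05):
    """Compare the actual PUE with the PUE the model expects at this IT load."""
    a, b = model
    expected = (a + b * it_now) / it_now
    actual = facility_now / it_now
    return actual, expected, abs(actual - expected) > tolerance


# Hypothetical history: IT load and total facility power, both in kW.
model = fit_overhead_model([200, 300, 400, 500], [460, 590, 720, 850])
print(check_pue(250, 600, model))  # actual 2.4 vs. expected ~2.1 -> flagged
```

A high PUE at low load is then judged against what the facility normally does at that load, rather than against a single static target.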

Note: the point about breakfast cereal reminds me of Microsoft's Mike Manos, Sr. Director, Data Center Services, whose first job was in Rice-A-Roni operations, where he learned process control. That is probably why he has invested in software from OSIsoft to help monitor PUE. Cornell uses the same software as well. For more details, see Microsoft's Jeff O'Reilly presentation or Cornell's Jason Banfelder presentation.

Power Monitoring Equipment Review - Smart-Watt and a visit to Microsoft

I've had a chance to play with the Smart-Watt device that I mentioned last month. I am still experimenting with the device, and waiting for my new office/lab space to do more thorough testing. My first impression is that I like Smart-Watt for the following reasons:

  1. It's easy to use as an inline device.  More devices are being developed for 3-phase power and power strip monitoring.
  2. Instead of investing in a display and UI on the device, all the controls are on the PC, and data is written to a SQL Express database on your data collection PC/server, with the .NET Framework as the development platform. You can set up an FTP server to share the data as well.
  3. A separate network, Smart-Net, using RJ-11 connectors makes daisy chaining devices easy, and does not need the approval of the network administrators to install.
  4. Temperature and humidity can be collected as well, and the network leaves room for expansion, so other devices can add sensors to it.
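Whatever the logging device, raw power readings usually need to be turned into energy before they are useful for PUE or cost figures. I don't know Smart-Watt's database schema, so this sketch assumes the readings have already been pulled out as hypothetical (timestamp in seconds, watts) pairs sorted by time, and integrates them with the trapezoid rule.

```python
def energy_kwh(readings):
    """Integrate (timestamp_seconds, watts) samples into kWh (trapezoid rule).

    Assumes the readings are sorted by time; the sample format is hypothetical,
    not Smart-Watt's actual schema.
    """
    joules = 0.0
    for (t0, w0), (t1, w1) in zip(readings, readings[1:]):
        joules += (w0 + w1) / 2 * (t1 - t0)  # average power x interval length
    return joules / 3.6e6                    # 1 kWh = 3.6 million joules


# Hypothetical readings: one server sampled every 30 minutes for an hour.
print(round(energy_kwh([(0, 200), (1800, 220), (3600, 240)]), 4))  # 0.22
```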

Today Dan Dieso from Smart-Works and I visited Microsoft to talk to some people who run lab environments. It was good to hear that the green/energy efficiency effort is expanding at Microsoft. We heard stories of facilities recently upgraded because of power issues, and people are interested in setting up labs to support Green Data Center projects. Everyone we met at Microsoft was interested in Smart-Watt. One of the good discussions was with Scott Gaskins. Scott has a unique perspective: he is a software development manager for microsoft.com's IT operations tools, and he worked for Pacific Gas & Electric for 13 years. So he understands electricity and the issues of power conservation in the data center. As a result, microsoft.com is one of the most energy-efficient properties in Microsoft's data centers, and it is an innovation leader, already deploying Windows Server 2008, which has power management turned on by default.

Another great connection was with Grant BlahaErath, a Technical Evangelist in Microsoft's ISV partner labs, who has an adaptive cooling system with air-side economizers in his server lab. This could allow us to calculate a PUE (power usage effectiveness) for his lab, and he is interested in turning off devices when they are not needed.

As Dan and I make more progress with Microsoft's use of Smart-Watt, we hope they'll give us permission to write about their experience.

Cisco's Green Guru proposes Cisco be the router of Energy Data

Cisco's new Green Guru, Paul Marcoux, proposes Cisco be the router of energy data in the data center. 

Using open standards, the company wants to get server and storage vendors to collect and share information about their equipment and send it to Cisco routers and switches. The data could include power consumption, operating temperature and more. It's becoming a critical job, and because the network touches all IT resources across the enterprise, data collection should happen there, according to Paul Marcoux, vice president of green engineering.

I think Cisco may find the vendors have other plans for this data than giving it to Cisco. Burton Group raises this issue as well in the article.

Cisco's proposal would represent a whole new role for networks beyond communications, said Burton Group analyst Dave Passmore. Server vendors might go along with the plan, but Cisco can't count on smooth sailing, he said. Centralized power regulation would play a role in overall management of the datacenter, an area where Cisco is attempting to make inroads with other initiatives as well.

"Who controls virtualization in the data center is going to be the new battleground," Passmore said.