Mike Manos discusses data center site selection: you need to “kick the dirt” to find what is real

At Gartner’s Data Center Conference, Mike Manos made an excellent point that “75% of the data center costs are affected by site selection.” Great architecture is designed around a site’s characteristics. But the status quo is to design data centers based on past experience. Green data centers need to be designed to fit the characteristics of their site.

Mike wrote a post on site selection.

Kickin’ Dirt

December 21, 2009 by mmanos

mikeatquincy

I recently got an interesting note from Joel Stone, the Global Operations Chief at Global Switch.  As some of you might know Joel used to run North American Operations for me at Microsoft.  I guess he was digging through some old pictures and found this old photo of our initial site selection trip to Quincy, Washington.

As you can see, the open expanse of farmland behind me ultimately became Microsoft’s showcase facilities in the Northwest.  In fact you can even see some farm equipment just behind me.   It got me reminiscing about that time and how exciting and horrifying that experience can be.

Kicking the Dirt.

Many people I speak to at conferences generally think that the site selection process is largely academic.   Find the right intersection of a few key criteria and locate areas on a map that seem to fit those requirements.   In fact, the site selection strategy that we employed took many different factors into consideration each with its own weight leading ultimately to a ‘heat map’ in which to investigate possible locations.

Even with some of the brightest minds, and substantial research being done, it’s interesting to me that ultimately the process breaks down into something I call ‘Kickin Dirt’.   Those ivory tower exercises ultimately help you narrow down your decisions to a few locations, but the true value of the process is when you get out to the location itself and ‘kick the dirt around’.   You get a feel for the infrastructure, local culture, and those hard to quantify factors that no modeling software can tell you.

Mike makes an excellent point about the site selection decision itself.

Once you have gone out and kicked the dirt, it’s decision time.  The decision you make, backed by all the data and process in the world, backed by personal experience of the locations in question, ultimately nets out to someone making a decision.   My experience is that this is something that rarely works well if left up to committee.  At some point someone needs the courage and conviction, and in some cases outright insanity to make the call.

Are you willing to take a risk in site selection? Most aren’t. But the leaders are, and they are the ones who go first where others haven’t and end up with lower costs. Mike has said the land was a great deal because no one else had thought of it as a data center site. Google is another company that has this down.


Carbon Neutral Google focuses on methane gas reduction projects

Google had a blog post on its carbon offsets as part of its carbon neutral commitment.

Carbon offsets at Google

12/17/2009 06:38:00 PM

As leaders from around the world meet in Copenhagen to address global climate change this month, we thought it was a good time to reflect on our own carbon footprint. In 2007, we committed to become a carbon neutral company. We know that it isn't possible to write a check and eliminate the environmental impact of our operations. So what does “carbon neutrality” mean to us?

The 2nd paragraph after this introduction discusses Google’s data center work.

First, we aggressively pursue reductions in our energy consumption through energy efficiency, innovative infrastructure design and operations and on-site renewable energy. Our Google designed data centers use half the energy of typical facilities.

With Google’s resources, it was interesting to see that they have the same problem we all have: which carbon offsets are the right ones to buy?

Here at Google, we have set a very high bar to ensure that our investment makes an actual difference in reducing greenhouse gas emissions by purchasing offsets that are real, verifiable, permanent and additional.

Based on its research, Google has a primary focus on methane gas projects.

To date, we have selected high quality carbon offsets from around the world that reduce greenhouse gas emissions — ranging from landfill gas projects in Caldwell County, NC, and Steuben County, NY, to animal-waste management systems in Mexico and Brazil. Our funding helps make it possible for equipment to be installed that captures and destroys the methane gas produced as the waste decomposes. Methane, the primary component in natural gas, is a significant contributor to global warming. We chose to focus on landfill and agricultural methane reduction projects because methane's impact on warming is very well understood, it's easy to measure how much methane is captured and the capture wouldn't happen without our financing (for the projects we're investing in, they couldn't make enough money selling the gas).

One area I want to investigate further is Google.org’s carbon offset projects.

We need fundamental changes to global energy and transportation infrastructure to stabilize greenhouse gas emissions over the long term. In the meantime, the projects to which we contribute offer measurable emissions reductions and allow us to take responsibility for our carbon footprint. To that end, we're always looking for good emissions-reduction projects to support. If you have a landfill gas or agricultural methane carbon offset project you think we should consider, please visit this page for more information about how to participate in our latest carbon-offset procurement round.

The submission page states that January 6, 2010 is the deadline for submissions.

Request for Proposals

Carbon Offsets

In order to participate in Google's Carbon Offset procurement round, please submit the following web form by 12:00pm PST on January 6, 2010. After we receive your information, we will send a link to our standard Non-Disclosure Agreement (NDA). Once we have your signed NDA, you will receive our Request for Proposal (RFP) document. NDA acceptance along with any questions related to the RFP are due by 12:00pm PST on January 13, 2010. All responses to the RFP are due by 12:00pm PST on February 1, 2010.


Ex-Microsoft Security Evangelist works for AWS; shouldn’t he have been transferred to Azure instead of being laid off?

A new AWS technical evangelist has a blog entry.

Hello, world!

Good day, everyone. I'm Steve Riley. In July 2009 I joined the AWS evangelism team. I spent my first few months absorbing information about all our offerings and am now getting back on the road again, speaking at various events and user groups and meeting with customers. I came from Microsoft, where I was in the telecommunications consulting practice for three years and in the Trustworthy Computing group for seven. I was a global security evangelist there and also worked closely with our chief security officer and enterprise security architect communities. I'm continuing that work here at Amazon Web Services, concentrating on enterprise deployment of cloud computing, all things cloud security, and of course the Windows Server aspects of our offerings.

I'm very excited to be part of AWS. The cloud is the future, and I look forward to meeting many of you and working together. As with all of us on the team, I'm here to help you succeed. More information in the links below.

Steve has a nice map of Amazon EC2, S3, and CloudFront from one of his presentations, which is available on his presentations page.

image

What I found interesting is that Steve Riley was laid off from Microsoft’s security group, Trustworthy Computing.

Good bye, and good luck

ghost_light

Friends, as a part of Microsoft’s second round of restructuring, my position was eliminated yesterday and my employment with Microsoft has ended.

Shouldn’t Steve have been transferred to Windows Azure instead of being laid off and then hired by Amazon Web Services?


The Story of Cap & Trade videos – Annie Leonard and Warren Buffett

Annie Leonard has a new video on the story of cap and trade.

http://storyofcapandtrade.org - The Story of Cap & Trade is a fast-paced, fact-filled look at the leading climate solution being discussed at Copenhagen and on Capitol Hill. Host Annie Leonard introduces the energy traders and Wall Street financiers at the heart of this scheme and reveals the "devils in the details" in current cap and trade proposals: free permits to big polluters, fake offsets and distraction from what's really required to tackle the climate crisis. If you've heard about Cap & Trade, but aren't sure how it works (or who benefits), this is the film for you.

And here is another from Warren Buffett, who discusses cap and trade as a regressive tax.


Ex-Intel engineers at Microsoft share processor secrets, optimize performance per watt

Microsoft’s Dileep Bhandarkar and Kushagra Vaid published a paper on rightsizing servers for cost and power savings, both of which are important in a green data center strategy.  To put things in context, both Dileep and Kushagra are ex-Intel processor engineers.  Let’s start with the summary from their paper.

In conclusion, the first point to emphasize is that there is more to performance than just speed. When your definition of performance includes cost effectiveness, you also need to consider power. The next point is that in many cases processor speed has outpaced our ability to consume it. It’s difficult to exploit CPU performance across the board. This platform imbalance presents an opportunity to rightsize your configurations. The results will offer a reduction in both power and costs, with power becoming an increasingly important factor in the focus on total cost of ownership.

It is also important to remember that industry benchmarks may not reflect your environment. We strongly recommend that IT departments do their own workload characterization, understand the behavior of the applications in their own world, and then optimize for that.

Dileep and Kushagra are going out on a limb, sharing details most wouldn’t.  Intel’s and the server manufacturers’ goal is to maximize revenue per unit (chips or servers).  If you buy high-performance chips in the belief that you are buying high-performance-per-watt systems, they make more money.  But the truth is that many times you don’t need the high-performance processors.  Many server manufacturers are selling big data center companies high-performance-per-watt systems built with low-cost processors.

Dileep has a blog post that goes along with the paper.

Before I came to Microsoft to manage server definition and purchases I worked on the other side of the fence. For 17 years I focused on processor architecture and performance at Digital Equipment Corporation, and then worked for 12 years at Intel, focusing on performance, architecture, and strategic planning. It’s interesting how now that I’m a hardware customer, the word “performance” encompasses cost effectiveness almost as much as it does throughput and response time. As my colleague Kushagra Vaid and I point out in our paper, when you look up performance in the dictionary it is defined as “how well something performs the functions for which it’s intended”.

Why should you read this paper?  Because, as Dileep points out, the vast majority of people are making purchasing decisions based on processor benchmarks run on configurations that are unrealistic for their environments.

Figure: Three-year total cost of ownership of a basic 1U server

It also surprises me that so many IT groups base their purchasing decisions on published benchmark data about processors, even though that data is often generated using system configurations that are completely unrealistic when compared to real-world environments. Most folks sit up and take note when I display the facts about these topics, because the subject is important.

Rightsizing can clearly reduce the purchase price and the power consumption of a server. But the benefits go beyond the savings in capital expenditure. The lower power consumption has a big impact on the Total Cost of Ownership as shown in the Figure.
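The figure itself isn’t reproduced here, but a back-of-the-envelope calculation shows why power looms so large over three years. The numbers below (server price, wattage, PUE, electricity rate) are illustrative assumptions, not figures from the paper or from the figure above.

# Back-of-the-envelope 3-year TCO for a 1U server.
# All inputs are illustrative assumptions, not data from the paper.

server_price = 2500.0      # purchase price, USD
avg_power_w = 250.0        # average draw at the wall, watts
pue = 1.8                  # facility overhead multiplier (cooling, power distribution)
electricity_rate = 0.07    # USD per kWh
years = 3

hours = years * 365 * 24
energy_kwh = avg_power_w / 1000.0 * pue * hours
power_cost = energy_kwh * electricity_rate
tco = server_price + power_cost

print(f"Energy over {years} years: {energy_kwh:,.0f} kWh")
print(f"Power cost: ${power_cost:,.0f} ({power_cost / tco:.0%} of server + power cost)")
print(f"3-year TCO (server + power only): ${tco:,.0f}")

Even with these modest assumptions, power is roughly a quarter of the server-plus-power total, which is why a lower-power configuration moves the TCO needle.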

So, let’s start diving into the secrets in Dileep and Kushagra’s paper.  Here is the background.

Introduction
How do you make sure that the servers you purchase and deploy are most efficient in terms of cost and energy? In the Microsoft Global Foundation Services organization (GFS)—which builds and manages the company’s datacenters that house tens of thousands of servers—we do this by first performing detailed analysis of our internal workloads. Then, by implementing a formal analysis process to rightsize the servers we deploy, immediate and long-term cost savings can be realized. GFS finds that testing on actual internal workloads leads to much more useful comparison data versus published benchmark data. In rightsizing our servers we balance systems to achieve substantial savings. Our analysis and experience shows that it usually makes more sense to use fewer and less expensive processors because the bottleneck in performance is almost invariably the disk I/O portion of the platform, not the CPU.

What benchmarks?  SPEC CPU2006.  Understand the conditions of the test.

One of the most commonly used benchmarks is SPEC CPU2006. It provides valuable insight into performance characteristics of different microprocessor central processing units (CPUs) running a standardized set of single-threaded integer and floating-point benchmarks. A multi-threaded version of the benchmark is CPU2006_rate, which provides insight into throughput characteristics using multiple running instances of the CPU2006 benchmark.

But important caveats need to be considered when interpreting the data provided by the CPU2006 benchmark suite. Published benchmark results are almost always obtained using very highly tuned compilers that are rarely if ever used in code development for production systems. They often include settings for code optimization switches uncommon in most production systems. Also, while the individual benchmarks that make up the CPU2006 suite represent a very useful and diverse set of applications, these are not necessarily representative of the applications running in customer production environments. Additionally, it is very important to consider the specifics of the system setup used for obtaining the benchmarking data (e.g., CPU frequency and cache size, memory capacity, etc.) while interpreting the benchmark results since the setup has an impact on results and needs to be understood before making comparisons for product selection.

and TPC.

Additionally, the system configuration is often highly tuned to ensure there are no performance bottlenecks. This typically means using an extremely high performing storage subsystem to keep up with the CPU subsystem. In fact, it is not uncommon to observe system configurations with 1,000 or more disk drives in the storage subsystem for breakthrough TPC-C or TPC-E results. To illustrate this point, a recent real-world example involves a TPC-C result for a dual-processor server platform that has an entry level price a little over $3,000 (Source: http://www.tpc.org). The result from the published benchmark is impressive: more than 600,000 transactions per minute. But the total system cost is over $675,000. That’s not a very realistic configuration for most companies. Most of the expense comes from employing 144 GB of memory and over a thousand disk drives.
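Using just the figures cited in that quote, the cost-per-performance arithmetic is easy to sketch (a quick illustration, not an analysis from the paper):

# Cost-per-performance arithmetic for the quoted TPC-C example.
# The dollar and transaction figures come from the quote above; nothing else is assumed.

benchmark_system_cost = 675_000.0   # fully configured benchmark system, USD
benchmark_tpm = 600_000.0           # published transactions per minute
server_entry_price = 3_000.0        # entry-level price of the bare server, USD

print(f"Benchmark $/tpm: {benchmark_system_cost / benchmark_tpm:.2f}")
print(f"Share of system cost outside the base server: "
      f"{(benchmark_system_cost - server_entry_price) / benchmark_system_cost:.1%}")

About 99.6 percent of the benchmark system’s cost sits in memory and storage rather than in the server itself, which is the point the authors are making.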

Both of these tests are generally set up to show the performance of CPUs, but as Dileep and Kushagra say, few production systems look like those configurations.  So what do you do?  Rightsize the system, which usually means don’t buy the highest-performing CPU, because the CPU is not the bottleneck.  Keep in mind these are ex-Intel processor engineers saying this.

CPU is typically not your bottleneck: Balance your systems accordingly
So how should you look at performance in the real world? First you need to consider what the typical user configuration is in your organization. Normally this will be dictated either by the capability or by cost constraints. Typically your memory sizes are smaller than what you see in published benchmarks, and you have a limited amount of disk I/O. This is why CPU utilization throughout the industry is very low: server systems are not well balanced. What can you do about it? One option is to use more memory so there are fewer disk accesses. This adds a bit of cost, but can help you improve performance. The other option—the one GFS likes to use—is to deploy balanced servers so that major platform resources (CPU, memory, disk, and network) are sized correctly.

So, what happens if you don’t rightsize?

If memory or disk bandwidth is under-provisioned for a given application, the CPU will remain idle for a significant amount of time, wasting system power. The problem gets worse with multicore CPUs on the technology roadmap, offering further increases in CPU pipeline processing capabilities. A common technique to mitigate this mismatch is to increase the amount of system memory to reduce the frequency of disk accesses.
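A simple way to reason about this imbalance is to estimate which resource saturates first for a given workload. The sketch below uses hypothetical per-request costs and hardware limits; none of the numbers come from the paper.

# Estimate the bottleneck resource for a hypothetical transactional workload.
# Per-request costs and hardware capacities are illustrative assumptions.

cpu_ms_per_req = 2.0       # CPU time per request, milliseconds
ios_per_req = 4.0          # random disk I/Os per request
cores = 8
iops_per_drive = 180.0     # random IOPS per drive (assumed)
drives = 8

max_req_by_cpu = cores * 1000.0 / cpu_ms_per_req          # requests/sec the CPUs can sustain
max_req_by_disk = drives * iops_per_drive / ios_per_req   # requests/sec the disks can sustain

throughput = min(max_req_by_cpu, max_req_by_disk)
cpu_util = throughput / max_req_by_cpu

print(f"CPU limit:  {max_req_by_cpu:,.0f} req/s")
print(f"Disk limit: {max_req_by_disk:,.0f} req/s")
print(f"Expected throughput: {throughput:,.0f} req/s, CPU roughly {cpu_util:.0%} busy")

With these assumptions the disks cap throughput at 360 requests per second while the CPUs idle at under 10 percent utilization, exactly the kind of imbalance the paper describes.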

The old rule was to buy the highest-performing processors you could afford.  Why not?  Because it wastes money and increases your power costs.

Another aspect to consider is shown in Figure 2 below. If you look at performance as measured by frequency for any given processor, typically there is a non-linear effect. At the higher frequency range, the price goes up faster than the frequency. To make matters worse, performance does not typically scale linearly with frequency. If you’re aiming for the highest possible performance, you’re going to end up paying a premium that’s out of proportion with the performance you’re going to get. Do you really need that performance, and is the rest of your system really going to be able to use it? It’s very important from a cost perspective to find the sweet spot you’re after.

image
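Figure 2 is not reproduced here, but the non-linearity is easy to illustrate with made-up SKU numbers. The prices, frequencies, and the frequency-to-performance scaling factor below are assumptions, not the paper’s data.

# Illustrative price/performance comparison across hypothetical CPU SKUs.
# Prices, frequencies, and the scaling factor are assumptions, not paper data.

skus = [
    # (model, GHz, price in USD)
    ("mid-bin", 2.33, 450.0),
    ("fast-bin", 2.66, 800.0),
    ("top-bin", 3.00, 1500.0),
]

base_ghz, base_price = skus[0][1], skus[0][2]
scaling = 0.7  # assume only ~70% of a frequency increase shows up as performance

for model, ghz, price in skus:
    perf_gain = 1.0 + scaling * (ghz / base_ghz - 1.0)
    price_ratio = price / base_price
    print(f"{model:9s} perf x{perf_gain:.2f}  price x{price_ratio:.2f}  "
          f"$/perf x{price_ratio / perf_gain:.2f}")

In this made-up example the top bin delivers about 20 percent more performance for more than three times the processor price, which is the premium the authors warn about.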

What is the relationship between system performance, CPU utilization, and disk count?

Figure 5 shows CPU utilization increasing with disk count as the result of the system being disk limited. As you increase the number of disk drives, the number of transactions per second goes up because you’re getting more I/O and consequently more throughput. With only eight drives CPU utilization is just 5 percent. At 24 drives CPU utilization goes up to 20 percent. If you double the drives again, utilization goes up to about 25 percent. What that says is that you’re disk I/O limited, so you don’t need to buy the most expensive, fastest processor. This kind of data allows us to rightsize the configuration, reducing both power and cost.

image
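Once workload data like this shows you are disk-limited at low CPU utilization, the processor choice becomes a straightforward cost comparison. A rough fleet-level sketch follows; the fleet size, price premium, wattage delta, PUE, and electricity rate are all assumptions rather than figures from the paper.

# If the workload is disk-limited, a cheaper/slower CPU changes cost and power,
# not throughput. All inputs below are illustrative assumptions.

servers = 1000               # servers in the deployment
premium_per_server = 1000.0  # extra purchase price of top-bin CPUs, USD per server
extra_watts = 40.0           # extra power draw of top-bin CPUs, watts per server
pue = 1.8
electricity_rate = 0.07      # USD per kWh
years = 3

hours = years * 365 * 24
extra_energy_kwh = extra_watts / 1000.0 * pue * hours * servers
extra_cost = premium_per_server * servers + extra_energy_kwh * electricity_rate

print(f"Extra energy for top-bin CPUs: {extra_energy_kwh:,.0f} kWh")
print(f"Extra 3-year fleet cost with no throughput gain: ${extra_cost:,.0f}")

Under these assumptions the faster processors add roughly $1.1 million over three years for a thousand-server fleet while the disk-limited workload runs no faster.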

The paper goes on to discuss web servers, where a faster processor does help if content is cached.

image

To share the blame, two RAID controllers are examined: one with 256 MB of cache and another with 512 MB.

But when we looked at the results from our ETW workload analysis, we found that most of the time our queue depth never goes beyond 8 I/Os. So in our operational area, there is no difference in performance between the two RAID controllers. If we didn’t have the workload analysis and just looked at those curves, we might have been impressed by the 10-15 percent performance improvement at the high end of the scale, and paid a premium for performance we would never have used.

image
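In the same spirit, here is a minimal sketch of how queue-depth data from a trace can settle the controller choice. The trace samples and the crossover depth below are made up for illustration; Microsoft’s actual analysis used ETW traces.

# Decide whether the pricier RAID controller is worth it by checking how often
# the workload reaches the queue depths where its larger cache wins.
# The trace samples and crossover depth are illustrative assumptions.

queue_depth_samples = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 3, 2, 5, 4, 6, 2, 1, 3, 4]
crossover_depth = 16   # assumed depth above which the 512 MB controller pulls ahead

deep_samples = sum(1 for depth in queue_depth_samples if depth > crossover_depth)
share_deep = deep_samples / len(queue_depth_samples)

print(f"Max observed queue depth: {max(queue_depth_samples)}")
print(f"Share of samples above depth {crossover_depth}: {share_deep:.0%}")
if share_deep < 0.05:
    print("The cheaper controller is good enough for this workload.")

With a maximum observed depth of 8, the premium controller’s advantage never comes into play, mirroring the conclusion in the quote.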
