When will solid state memory servers be an option in AWS instances?

I was having another stimulating conversation in Silicon Valley last night, and one of the ideas that made sense is solid state memory servers becoming part of the cloud computing options.  It's just a matter of time.  Amazon currently divides its instance offerings along lines of compute performance and memory.

Standard Instances

Instances of this family are well suited for most applications.

Small Instance (default)*

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage (150 GB plus 10 GB root partition)
32-bit platform
I/O Performance: Moderate

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage (2×420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage (4×420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High

High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High

High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
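
To make the split between compute and memory in those shapes concrete, here is a minimal sketch (plain Python, using only the figures listed above) that works out memory per EC2 Compute Unit for each instance type. The API names in parentheses are the commonly used identifiers for these shapes; everything else is straight arithmetic on the published specs.

```python
# Memory-to-compute ratio for the EC2 instance types listed above.
# (RAM in GB, EC2 Compute Units) come straight from the listing;
# this is illustrative arithmetic only, not an AWS API call.
instances = {
    "Small (m1.small)":                 (1.7, 1),
    "Large (m1.large)":                 (7.5, 4),
    "Extra Large (m1.xlarge)":          (15.0, 8),
    "High-Memory 2XL (m2.2xlarge)":     (34.2, 13),
    "High-Memory 4XL (m2.4xlarge)":     (68.4, 26),
    "High-CPU Medium (c1.medium)":      (1.7, 5),
    "High-CPU Extra Large (c1.xlarge)": (7.0, 20),
}

for name, (mem_gb, ecu) in instances.items():
    print(f"{name:34s} {mem_gb / ecu:5.2f} GB per ECU")
```

The high-memory family tops out around 2.6 GB per ECU; a solid state memory tier would let Amazon stretch that ratio much further without paying for DRAM.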

But, as Virident's offering shows, you can get higher performance with large memory addressing if you are running MySQL or memcached, resulting in higher performance per watt, which should translate into higher performance per dollar.

GreenCloud Server for MySQL

The GreenCloud Server for MySQL delivers extreme performance improvement over industry standard servers using disk arrays or SSDs, including high-performance PCIe SSDs, on Web 2.0 workloads. Virident optimized versions of MyISAM and InnoDB storage engines directly access datasets stored in the storage class memory tier to eliminate I/O bottlenecks. GreenCloud servers sustain significantly higher query rates, dramatically lower the cost of scaling to larger datasets, and simplify the replication and sharding processes usually employed for scaling. The extreme performance additionally makes it possible to obtain new insights into data and deliver new services by running complex operations such as multi-table joins, which are beyond the reach of traditional servers.

  • 50-70x performance versus industry standard servers with hybrid disk/DRAM configuration on third-party benchmarks.
  • 5-7x versus the fastest PCIe-based SSD systems.
  • Binary compatible with existing InnoDB and MyISAM databases.
  • 30-35x improvement in QPS/Watt.
  • 10-15x improvement in QPS/$.
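
Virident doesn't publish the underlying numbers behind those multipliers, so the sketch below uses invented baseline figures; only the 50x QPS and 30x QPS/Watt claims come from the bullets above. It just shows how the claims compound into a server-count and power reduction.

```python
# Hypothetical baseline for an industry standard MySQL server; the
# absolute numbers are invented, only the 50x QPS and 30x QPS/Watt
# multipliers come from the Virident bullets above.
baseline_qps   = 1_000      # QPS of a standard disk/DRAM MySQL server
baseline_watts = 400        # wall power of that server
target_qps     = 50_000     # workload to be served

std_servers  = target_qps / baseline_qps                # 50 servers
std_power_kw = std_servers * baseline_watts / 1_000     # 20 kW

gc_qps_per_watt = (baseline_qps / baseline_watts) * 30  # claimed 30x
gc_power_kw     = target_qps / gc_qps_per_watt / 1_000  # one 50x server

print(f"standard servers: {std_servers:.0f} servers, {std_power_kw:.1f} kW")
print(f"GreenCloud:       1 server,  {gc_power_kw:.2f} kW")
```

On those assumptions, one GreenCloud box replaces the whole 50-server tier and cuts power from 20 kW to well under 1 kW, which is where the performance-per-watt and performance-per-dollar argument comes from.
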
GreenCloud Server for Memcached

The Virident GreenCloud Server for Memcached delivers a new standard of high-performance and cache size scaling for the popular distributed caching application. These servers can deliver 250K object gets per second with low and predictable latencies and support caches with up to 3 billion objects, increasing performance by up to 4x and the available cache memory by up to 8x versus industry standard servers. These performance and scaling benefits permit larger key spaces to be supported by a single server and decrease cache miss rates thereby reducing load on backend database servers.

  • Industry-leading performance
    ▫ Up to 250K object gets per second with an average object size of 200-300 bytes
    ▫ Supports a larger object cache – up to 3 billion objects
  • Higher cache hit rates due to larger caches – up to 8x versus industry standard servers
    ▫ Lowers the backend database load by up to 50%
  • 50-70% decrease in TCO
    ▫ GreenCloud servers can replace 4 or more traditional servers in a server consolidation project
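
The "up to 50%" backend figure falls straight out of the miss-rate arithmetic. Here is a small sketch with an assumed request rate and assumed hit rates; only the larger-cache claim is taken from the bullets above.

```python
# Backend database load is driven by cache misses: every miss becomes
# a database query. Request rate and hit rates are assumed for
# illustration; only the "larger cache" idea comes from the text above.
requests_per_sec = 200_000

def backend_qps(hit_rate: float) -> float:
    return requests_per_sec * (1.0 - hit_rate)

small_cache_hit_rate = 0.90   # cache no longer holds the hot key space
large_cache_hit_rate = 0.95   # a much larger cache holding far more keys

print(f"small cache: {backend_qps(small_cache_hit_rate):,.0f} DB queries/sec")
print(f"large cache: {backend_qps(large_cache_hit_rate):,.0f} DB queries/sec")
# Halving the miss rate (10% down to 5%) halves the backend load, which
# is where an "up to 50%" reduction comes from.
```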

I would expect AWS is evaluating this, and it will be here by the summer.

Read more

Nielsen says Xbox 360 most-used console; maybe the online experience requiring lots of data center resources is a factor

Our family has an Xbox 360 and a Wii, and I would agree with the CNET News article.

Xbox 360 is most-used game console, Nielsen says

by Don Reisinger

Nintendo Wii

The Wii is not the most-used console, but it has attracted female gamers.

(Credit: Nintendo)

As the game console wars rage on, new findings from Nielsen may give Xbox 360 fans a little more fodder for their bragging rights.

According to the market researcher, Microsoft's Xbox 360 is the most-used console when measured by its share of total usage minutes, capturing 23.1 percent of gaming time. It is followed by the PlayStation 2 with 20.4 percent of usage time and the Nintendo Wii with 19 percent. Surprisingly, the PlayStation 3 didn't make the top-three list.

My kids actually play games on their iPod Touches more than the Wii, and the cost is significantly less for games on the iPod, so I am not complaining. 

One of the comments on the CNET article made me think of data centers.

This comes as no real surprise to me. I'm a PC gamer primarily but I also own a Wii. It sits in the corner and gathers dust. The superior online features of the 360 keep people coming back. Much more than can be said about the Wii's garbage online multiplayer.

I have a friend who works in Xbox 360 online operations, and I don't think the Nintendo or Sony data center operations teams come close to the scale of Xbox 360's.  I don't recall running into any big news about Nintendo's or Sony's data centers, so the information is hard to find; in general, data center operations are probably an overhead cost for Nintendo and Sony, as opposed to the revenue stream they support for Xbox 360.

Read more

ex-Intel engineers at Microsoft share processor secrets, optimize performance per watt

Microsoft's Dileep Bhandarkar and Kushagra Vaid published a paper on rightsizing servers for cost and power savings, which are important in a green data center strategy.  To put things in context, both Dileep and Kushagra are ex-Intel processor engineers.  Let's start with the summary from their paper.

In conclusion, the first point to emphasize is that there is more to performance than just speed. When your definition of performance includes cost effectiveness, you also need to consider power. The next point is that in many cases processor speed has outpaced our ability to consume it. It’s difficult to exploit CPU performance across the board. This platform imbalance presents an opportunity to rightsize your configurations. The results will offer a reduction in both power and costs, with power becoming an increasingly important factor in the focus on total cost of ownership.

It is also important to remember that industry benchmarks may not reflect your environment. We strongly recommend that IT departments do their own workload characterization, understand the behavior of the applications in their own world, and then optimize for that.

Dileep and Kushagra are going out on a limb sharing details most wouldn't.  Intel's and the server manufacturers' goal is to maximize revenue per unit (chips or servers).  If you buy high-performance chips in the belief that you are buying high-performance-per-watt systems, then they make more money.  But the truth is that many times you don't need the high-performance processors.  Many server manufacturers are selling big data center companies high-performance-per-watt systems built with low-cost processors.

Dileep has a blog post that goes along with the paper.

Before I came to Microsoft to manage server definition and purchases I worked on the other side of the fence. For 17 years I focused on processor architecture and performance at Digital Equipment Corporation, and then worked for 12 years at Intel, focusing on performance, architecture, and strategic planning. It’s interesting how now that I’m a hardware customer, the word “performance” encompasses cost effectiveness almost as much as it does throughput and response time. As my colleague Kushagra Vaid and I point out in our paper, when you look up performance in the dictionary it is defined as “how well something performs the functions for which it’s intended”.

Why should you read this paper? Because, as Dileep points out, the vast majority of people are making purchase decisions based on processor benchmarks run on unrealistic configurations.

Figure: Three-year total cost of ownership of a basic 1U server

It also surprises me that so many IT groups base their purchasing decisions on published benchmark data about processors, even though that data is often generated using system configurations that are completely unrealistic when compared to real-world environments. Most folks sit up and take note when I display the facts about these topics, because the subject is important.

Rightsizing can clearly reduce the purchase price and the power consumption of a server. But the benefits go beyond the savings in capital expenditure. The lower power consumption has a big impact on the Total Cost of Ownership as shown in the Figure.
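
The figure itself isn't reproduced here, but the point it makes is easy to recreate: over three years, the electricity to run and cool a 1U server rivals its purchase price. A rough sketch with assumed prices follows; none of these numbers come from the paper.

```python
# Rough three-year TCO for a basic 1U server; every number here is an
# assumption for illustration, not a figure from the paper.
purchase_price = 2_500    # USD
server_watts   = 300      # average draw at the wall
pue            = 1.8      # facility overhead for cooling and distribution
usd_per_kwh    = 0.10
hours_per_year = 8_760

energy_cost = server_watts / 1_000 * pue * usd_per_kwh * hours_per_year * 3
print(f"3-year energy cost: ${energy_cost:,.0f}")
print(f"3-year TCO:         ${purchase_price + energy_cost:,.0f}")
# About 0.54 kW of facility load for 3 years is roughly $1,400 of
# electricity, already more than half the purchase price, which is why
# a cheaper, rightsized CPU shows up so strongly in TCO.
```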

So, let’s start diving into the secrets in Dileep and Kushagra’s paper.  Here is the background.

Introduction
How do you make sure that the servers you purchase and deploy are most efficient in terms of cost and energy? In the Microsoft Global Foundation Services organization (GFS)—which builds and manages the company's datacenters that house tens of thousands of servers—we do this by first performing detailed analysis of our internal workloads. Then, by implementing a formal analysis process to rightsize the servers we deploy, an immediate and long-term cost savings can be realized. GFS finds that testing on actual internal workloads leads to much more useful comparison data versus published benchmark data. In rightsizing our servers we balance systems to achieve substantial savings. Our analysis and experience shows that it usually makes more sense to use fewer and less expensive processors because the bottleneck in performance is almost invariably the disk I/O portion of the platform, not the CPU.

What benchmarks?  SPEC CPU2006.  Understand the conditions of the test.

One of the most commonly used benchmarks is SPEC CPU2006. It provides valuable insight into performance characteristics for different microprocessor central processing units (CPUs) running a standardized set of single-threaded integer and floating-point benchmarks. A multi-threaded version of the benchmark is CPU2006_rate, which provides insight into throughput characteristics using multiple running instances of the CPU2006 benchmark.

But important caveats need to be considered when interpreting the data provided by the CPU2006 benchmark suite. Published benchmark results are almost always obtained using very highly tuned compilers that are rarely if ever used in code development for production systems. They often include settings for code optimization switches uncommon in most production systems. Also, while the individual benchmarks that make up the CPU2006 suite represent a very useful and diverse set of applications, these are not necessarily representative of the applications running in customer production environments. Additionally, it is very important to consider the specifics of the system setup used for obtaining the benchmarking data (e.g., CPU frequency and cache size, memory capacity, etc.) while interpreting the benchmark results since the setup has an impact on results and needs to be understood before making comparisons for product selection.

and TPC.

Additionally, the system configuration is often highly tuned to ensure there are no performance bottlenecks. This typically means using an extremely high performing storage subsystem to keep up with the CPU subsystem. In fact, it is not uncommon to observe system configurations with 1,000 or more disk drives in the storage subsystem for breakthrough TPC-C or TPC-E results. To illustrate this point, a recent real-world example involves a TPC-C result for a dual-processor server platform that has an entry level price a little over $3,000 (Source: http://www.tpc.org). The result from the published benchmark is impressive: more than 600,000 transactions per minute. But the total system cost is over $675,000. That's not a very realistic configuration for most companies. Most of the expense comes from employing 144 GB of memory and over a thousand disk drives.
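
It is worth actually doing the division on that example. A quick sketch using only the numbers quoted above:

```python
# Numbers quoted in the paper's TPC-C example above.
total_system_cost = 675_000   # USD, fully configured benchmark system
entry_price       = 3_000     # USD, the server's entry-level list price
tpmc              = 600_000   # transactions per minute (TPC-C)

print(f"cost per tpmC: ${total_system_cost / tpmc:.2f}")
print(f"benchmark config is {total_system_cost / entry_price:.0f}x "
      f"the entry-level server price")
# About $1.13 per tpmC on a system costing 225x its $3,000 entry price;
# the 144 GB of memory and roughly 1,000 disk drives dominate the cost,
# not the CPU.
```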

Both of these tests are in general set up to show the performance of CPUs, but as Dileep and Kushagra say, few systems are used in these configurations.  So what do you do?  Rightsize the system, which usually means don't buy the highest-performing CPU, because the CPU is not the bottleneck.  Keep in mind these are ex-Intel processor engineers.

CPU is typically not your bottleneck: Balance your systems accordingly
So how should you look at performance in the real world? First you need to consider what the typical user configuration is in your organization. Normally this will be dictated either by the capability or by cost constraints. Typically your memory sizes are smaller than what you see in published benchmarks, and you have a limited amount of disk I/O. This is why CPU utilization throughout the industry is very low: server systems are not well balanced. What can you do about it? One option is to use more memory so there are fewer disk accesses. This adds a bit of cost, but can help you improve performance. The other option—the one GFS likes to use—is to deploy balanced servers so that major platform resources (CPU, memory, disk, and network) are sized correctly.
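
The "do your own workload characterization" advice doesn't require anything exotic. Here is a minimal sketch, assuming a Linux host running your real workload and the third-party psutil package, that samples the four resources the authors call out (CPU, memory, disk, and network) so you can see which one saturates first.

```python
# Minimal workload sampler: log CPU, memory, disk, and network usage
# once a second so you can see which resource saturates first.
# Requires the third-party psutil package (pip install psutil).
import psutil

disk_prev = psutil.disk_io_counters()
net_prev = psutil.net_io_counters()

for _ in range(60):                        # sample for one minute
    cpu = psutil.cpu_percent(interval=1)   # percent busy over the last second
    mem = psutil.virtual_memory().percent
    disk = psutil.disk_io_counters()
    net = psutil.net_io_counters()
    disk_mb = (disk.read_bytes + disk.write_bytes
               - disk_prev.read_bytes - disk_prev.write_bytes) / 1e6
    net_mb = (net.bytes_sent + net.bytes_recv
              - net_prev.bytes_sent - net_prev.bytes_recv) / 1e6
    disk_prev, net_prev = disk, net
    print(f"cpu {cpu:5.1f}%  mem {mem:5.1f}%  "
          f"disk {disk_mb:6.1f} MB/s  net {net_mb:6.1f} MB/s")
```

If the disk column is pegged while the CPU column stays in the single digits, you are in exactly the situation the paper describes, and a cheaper processor would not cost you any throughput.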

So, what happens if you don’t rightsize?

If memory or disk bandwidth is under-provisioned for a given application, the CPU will remain idle for a significant amount of time, wasting system power. The problem gets worse with multicore CPUs on the technology roadmap, offering further increases in CPU pipeline processing capabilities. A common technique to mitigate this mismatch is to increase the amount of system memory to reduce the frequency of disk accesses.

The old rule was to buy the highest-performing processors you could afford.  Why not keep doing that?  Because it wastes money and increases your power costs.

Another aspect to consider is shown in Figure 2 below. If you look at performance as measured by frequency for any given processor, typically there is a non-linear effect. At the higher frequency range, the price goes up faster than the frequency. To make matters worse, performance does not typically scale linearly with frequency. If you’re aiming for the highest possible performance, you’re going to end up paying a premium that’s out of proportion with the performance you’re going to get. Do you really need that performance, and is the rest of your system really going to be able to use it? It’s very important from a cost perspective to find the sweet spot you’re after.

[Figure 2: processor price versus frequency]
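
The sweet-spot argument is easy to see with numbers. The sketch below uses made-up SKU prices and a made-up sublinear scaling exponent; neither comes from the paper, it only illustrates the shape of the curve.

```python
# Hypothetical price list for one processor family; performance is
# assumed to scale sublinearly with frequency (exponent 0.8).
skus = [  # (GHz, USD) pairs, invented for illustration only
    (2.0,  200),
    (2.4,  300),
    (2.8,  500),
    (3.2, 1000),
]

base_ghz, base_price = skus[0]
for ghz, price in skus:
    rel_perf = (ghz / base_ghz) ** 0.8      # sublinear scaling assumption
    print(f"{ghz:.1f} GHz: {rel_perf:4.2f}x perf, "
          f"${price / rel_perf:6.0f} per unit of performance")
```

In this toy example the top frequency bin costs roughly 3.4 times as much per unit of delivered performance as the bottom bin, which is exactly the premium the paper warns you may never be able to use.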

What is the relationship between system performance, CPU utilization, and disk count?

Figure 5 below shows CPU utilization increasing with disk count as the result of the system being disk limited. As you increase the number of disk drives, the number of transactions per second goes up because you're getting more I/O and consequently more throughput. With only eight drives CPU utilization is just 5 percent. At 24 drives CPU utilization goes up to 20 percent. If you double the drives again, utilization goes up to about 25 percent. What that says is that you're disk I/O limited, so you don't need to buy the most expensive, fastest processor. This kind of data allows us to rightsize the configuration, reducing both power and cost.

[Figure 5: CPU utilization versus number of disk drives]

The paper goes on to discuss web servers, where a faster processor does help if the content is cached in memory.

[Figure: web server performance results]

To share the blame, the paper also looks at two RAID controllers: one with 256 MB and another with 512 MB of cache.

But when we looked at the results from our ETW workload analysis, we found that most of the time our queue depth never goes beyond 8 I/Os. So in our operational area, there is no difference in performance between the two RAID controllers. If we didn’t have the workload analysis and just looked at those curves, we might have been impressed by the 10-15 percent performance improvement at the high end of the scale, and paid a premium for performance we would never have used.

[Figure: RAID controller performance versus I/O queue depth]
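
The same measure-your-own-workload logic applies to the controller choice. Here is a sketch of the queue-depth check, with an invented sample distribution standing in for the ETW trace.

```python
# Stand-in for an ETW I/O trace: outstanding-I/O counts sampled over
# time. These values are invented for illustration only.
sampled_queue_depths = [1, 2, 2, 3, 1, 4, 6, 2, 1, 8, 3, 2, 5, 1, 2,
                        7, 2, 3, 1, 2, 4, 2, 1, 3, 2]

threshold = 8   # depth where the larger controller cache starts to win
at_or_below = sum(1 for depth in sampled_queue_depths if depth <= threshold)
fraction = at_or_below / len(sampled_queue_depths)

print(f"{fraction:.0%} of samples have queue depth <= {threshold}")
# If nearly all samples sit at or below the threshold, the extra cache
# only helps in a region your workload never reaches.
```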

Read more

Container data center forms: a silo cylinder or a shipping container box

CLUMEQ has a new supercomputer that repurposes the silo of its decommissioned Van de Graaff particle accelerator.

The Quebec site is on the campus of Université Laval inside a renovated van de Graaf silo, with an innovative cylindrical layout for the data center. This cluster will feature upwards of 12,000 processing elements. Compute racks will be distributed among three floors of concentrical rings with a total surface area of 2,700 sq.ft. with an IT capacity of approximately 600 kW.

DataCenterKnowledge picked up the news.

Wild New Design: Data Center in A Silo

December 10th, 2009 : Rich Miller


A diagram of the design of the CLUMEQ Colossus supercomputer, from a recent presentation by Marc Parizeau of CLUMEQ.

Here's one of the most unusual data center designs we've seen. The CLUMEQ supercomputing center in Quebec has worked with Sun Microsystems to transform a huge silo into a data center. The cylindrical silo, which is 65 feet high and 36 feet wide with two-foot thick concrete walls, previously housed a Van de Graaff particle accelerator. When the accelerator was decommissioned, CLUMEQ decided to convert the facility into a high-performance computing (HPC) cluster known as Colossus.

Here is the YouTube video.

This idea may seem strange, but it is part of connecting the building to the IT equipment.  Microsoft just did something similar, showing their Windows Azure containers with the cooling system integrated into the container.


Sun has its own page on CLUMEQ.

When supercomputing consortium CLUMEQ designed its high-performance computing (HPC) system in Quebec, it was able to house it in the silo of a former particle accelerator on the Université Laval campus. The structure's 3-level cylindrical floor plan was ideal for cooling the 56 standard-size racks, and enabled the university to retain a treasured landmark.

Background

CLUMEQ is a supercomputing consortium of universities in the province of Quebec, Canada. It includes McGill University, Université Laval, and all nine components of the Université du Québec network. CLUMEQ supports scientific research in disciplines such as climate and ecosystems modeling, high energy particle physics, cosmology, nanomaterials, supramolecular modeling, bioinformatics, biophotonics, fluid dynamics, data mining and intelligent systems.

Read more

Univ of Illinois NCSA facility drops UPS for energy efficiency and cost savings, building cost $3 million per MW

Below are a lot of different pieces of what the Univ of Illinois's NCSA facility is building to host the IBM Blue Waters supercomputer.  I've seen lots of people talk about energy efficiency and cost savings.  But the things that got my attention are that this facility dropped the UPS entirely and that it is being built for $3 million per MW for a 24 MW facility.

How can this be done?  I think a key contributor is that IBM's computer architects were involved to help make sure the building was designed to Blue Waters' needs.

Maybe one of these days I can visit the facility in Urbana-Champaign, but I can learn a lot just from knowing where to look for information on the web.

CNET News has an article on IBM's Blue Waters supercomputer at the University of Illinois National Center for Supercomputing Applications (NCSA), but the article doesn't have many details about the building.  I've had a few discussions with IBM's supercomputing folks, and I know they have put a lot of work into the buildings, but it can sometimes be hard to get the information.  The good thing is that, since the project is run by the Univ of Illinois, there is public information you can get to, like here.

William Kramer, Deputy Project Director, Blue Waters

By William Kramer
Deputy Project Director, Blue Waters

The computational science and engineering community requires five attributes from the systems they use and the facilities that provide those systems. These attributes deliver systems that efficiently and productively enhance the scientists' ability to achieve novel results. They are performance, effectiveness, reliability, consistency, and usability (which I refer to as the PERCU method). This is a holistic, user-based approach to developing and assessing computing systems, in particular HPC systems. The method enables organizations to use flexible metrics to assess the features and functions of HPC systems and, if they choose to purchase systems, assess them against the requirements negotiated with the vendor.

[Images: the Petascale Computing Facility raised floor]

Here is a video of the raised floor above being built out.

[Video: the raised floor being built out]

But wanting more, I dug around for details about the site.  Here they are.  Note the last bullet: no UPS.

Energy efficiency is an integral part of the Blue Waters project and the Petascale Computing Facility. The facility will:

  • Achieve LEED Silver certification, with LEED Gold as the goal.
  • Rely heavily on more efficient water cooling for the systems it houses.
  • Take advantage of an on-site tower to chill water for cooling the compute systems. This will reduce energy consumption by using the outside air to chill water during the cold winter months.
  • Take advantage of the campus' highly reliable electricity supply, avoiding the need for the standard back-up Uninterruptible Power Supply (UPS). Eliminating the UPS saves equipment costs, minimizes floor space used, and increases energy efficiency because systems that employ a UPS convert AC to DC and back, incurring substantial energy losses.
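
Here is a rough estimate of what the double conversion would have cost, using assumed efficiency and electricity prices; only the 24 MW figure comes from the article, and it is an upper bound since not all of that load would normally sit behind a UPS.

```python
# Back-of-the-envelope cost of UPS double-conversion losses at the
# facility's 24 MW scale. Efficiency and price are assumptions; only
# the 24 MW figure comes from the article, and treating all of it as
# UPS-protected load is an upper bound.
facility_load_mw = 24.0
ups_efficiency   = 0.92     # typical double-conversion UPS, assumed
usd_per_kwh      = 0.06     # assumed industrial electricity rate
hours_per_year   = 8_760

losses_mw = facility_load_mw * (1 - ups_efficiency)
annual_cost = losses_mw * 1_000 * hours_per_year * usd_per_kwh
print(f"conversion losses:     {losses_mw:.1f} MW")
print(f"annual cost of losses: ${annual_cost:,.0f}")
# Roughly 1.9 MW of loss and about $1M a year at these rates, before
# counting the capital cost and floor space of the UPS gear itself.
```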

Also, Blue Waters brings water cooling directly to the IT equipment.

And how does IBM keep this dense collection of ultrafast processors cool? In a word, water. "We actually went a bit further environmentally," said Ed Seminaro, an IBM Fellow who is involved with the University of Illinois project. "We took a lot of the infrastructure that's typically inside of the computer room for cooling and powering and moved the equivalent of that infrastructure right into that same cabinet with the server, storage, and interconnect hardware."

Seminaro continued: "The whole rack is water-cooled. We actually water-cool the processor directly to pull the heat out. We take it right to water, which is very power efficient," he said.

John Melchi in the video below discusses the building and how it was designed to have efficient power and cooling systems.  Here is a transcript of his conversation. 

One of the things you don’t think about when you look at a facility like this is the fact that the computer architect has been involved in the design of the building. So IBM has just been a tremendous partner and collaborator in helping Illinois and NCSA ensure that the Petascale Computing Facility will meet the needs of Blue Waters.

Specifically, we’ve made sure there’s enough space, power, and cooling. At the level of Blue Waters, you’re talking about substantial amounts of infrastructure to make a computer and a project like this work.

From the beginning the U of I and NCSA intended to build a data center that was a multi‐use facility. We have the ability to provide 5,400 tons of chilled water to the building. We have 24 megawatts of power coming in. That’s substantially more than the Blue Waters system is going to need. So we’re very well positioned to bring in new air‐cooled systems to the Petascale Computing Facility that will enable U of I researchers and researchers across the country to do their science.
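
For readers not used to "tons" of cooling, the conversion is worth a quick check; one ton of refrigeration is about 3.5 kW of heat removal, so the chilled-water plant roughly matches the electrical feed.

```python
# Convert the quoted chilled-water capacity into MW of heat removal.
# One ton of refrigeration is 3.517 kW (standard definition); the
# 5,400 tons and 24 MW figures come from the transcript above.
tons = 5_400
kw_per_ton = 3.517

cooling_mw = tons * kw_per_ton / 1_000
print(f"chilled-water capacity: {cooling_mw:.1f} MW of heat removal")
print(f"electrical feed:        24.0 MW")
```

About 19 MW of heat removal against a 24 MW feed, consistent with the statement that the building has substantially more capacity than Blue Waters itself will need.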

But it's not just the building that is being changed to accommodate Blue Waters; the applications are as well.

The Blue Waters staff is now working with about 20 large science teams to start revising their application codes to take full advantage of the Blue Waters features. Much of the work will enable codes to run well and at large scale on Blue Waters, but the work can also be applied to other systems in the future. We are doing this with simulation of the machine itself, application and system performance modeling with premier modeling groups, and early access to prototype systems and software. Over time, we will engage with other science areas as they are allocated time on Blue Waters.

CNET News's article:

IBM: Envisioning the world's fastest supercomputer

IBM will release a radical new chip next year that will go into a University of Illinois supercomputer in a quest to build what may become the world's fastest supercomputer.

That university's supercomputer center is a storied place, home to both famous fictional and real supercomputers. The notorious HAL 9000 sentient supercomputer in "2001: A Space Odyssey" was built in Urbana, Illinois, presumably on the University of Illinois Urbana-Champaign campus.


The Power7 chip die.

(Credit: IBM)

Though not aspiring to artificial intelligence, the IBM Blue Waters project supercomputer, like the HAL 9000 series, will be able to do massively complex calculations in an instant and, like HAL, be built in Urbana-Champaign. It is being housed in a special building on the Urbana-Champaign campus specifically for the computer that will theoretically be capable of achieving 10 petaflops, about 10 times as fast as the fastest supercomputer today. (A petaflop is 1 quadrillion floating point operations per second, a key indicator of supercomputer performance.)

Part of the National Center for Supercomputing Applications (NCSA) at the University of Illinois, it will be the largest publicly accessible supercomputer in the world when it's turned on sometime in 2011.

The data center for Blue Waters will look like this:

Artist rendering of University of Illinois center that will house IBM's Blue Waters supercomputer

(Credit: University of Illinois)

Read more