Mike Manos’s presentation at the 2009 Gartner Data Center Conference

Gartner’s Data Center Conference is coming up and I am building my agenda.  Mike Manos’s presentation will be an interesting one to watch.

Regulation. It's Real. It's Coming. It's Expensive.

Wednesday, 02 December 2009
01:45 PM-02:45 PM

Speaker: Mike Manos
Location: Octavius 2
Session Type: Solution Provider Session

Energy regulation is coming. The US House of Representatives has already passed its Cap and Trade legislation and the Senate has a bill in committee. In Europe it already exists. The operational and cost impact on datacenters in today's regulatory environment is substantial. In this presentation Mr. Manos will provide a detailed overview of the pending industry-impacting legislation and what you will need to do to negate its impact.

I’ve seen Mike present many times and it is always entertaining.  But what I am most interested in is the crowd that attends Mike’s session and whether they get it.


Google positions itself #1 in Green Data Centers, hosts Secretary of Energy

CNET News has a post on U.S. Secretary of Energy Steven Chu's visit with Google CEO Eric Schmidt.

Google's warm reception for secretary of energy

by Tom Krazit

Google CEO Eric Schmidt (left) and U.S. Secretary of Energy Steven Chu at Google headquarters Monday.

(Credit: James Martin/CNET)

MOUNTAIN VIEW, Calif.--For a bunch of search engineers, Google employees care an awful lot about energy and the environment.

Google hosted an event for employees Monday featuring Steven Chu, the U.S. secretary of energy under President Obama and a man Chief Executive Eric Schmidt said "may become one of the most influential scientists of our generation, if he isn't already." Chu took about an hour to speak to a packed room of Google employees following his announcement of $151 million in funding for new energy-related projects as part of the ARPA-E program.

Part of the format has Schmidt interviewing Chu.

Schmidt, who serves as an adviser to the administration on President Obama's Council of Advisers on Science and Technology, asked Chu what it's like being the senior scientist in the government. He's actually the first scientist to hold the secretary of energy position, and won the Nobel Prize in Physics in 1997.

"It's funny in a macabre sort of way. I don't think Congress treats me like your average cabinet member," Chu said with a wry chuckle. He said he's spent much of his first year on the job talking to Congress about the problems with energy use and the environment, and that legislators are receptive, for the most part.

"I think the president has made it very clear that science plays such an integral role in the decisions we have to make," Chu said. He was preaching to the choir at the Googleplex.

On a regular basis I hear that Green IT is a fad and not important.  Google has done a great job of giving its staff a way to work together to use less energy for Google services.

What those who think Green IT is a fad miss is that having your staff focus on making things greener means you have benchmarked your performance and are continually evaluating new ways to reduce energy consumption and your carbon footprint.  This saves money over the long haul and makes it easier to provide new services.

The winners in internet services are going to be those with the highest performance per watt.  Google is in a race, and many think the race isn’t worth the effort.  Amazon gets it. Who else?
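To see why performance per watt translates into cost, here is a rough Python sketch. The request rates, power draws, and electricity price below are made-up numbers for illustration only, not data from Google, Amazon, or anyone else.

```python
# Back-of-the-envelope comparison of two hypothetical service providers.
# All figures are illustrative assumptions, not measured data.

def fleet_numbers(requests_per_sec, watts_per_server, servers, price_per_kwh=0.07):
    """Return performance per watt and annual electricity cost for a fleet."""
    total_watts = watts_per_server * servers
    perf_per_watt = (requests_per_sec * servers) / total_watts   # req/s per watt
    kwh_per_year = total_watts / 1000 * 24 * 365
    return perf_per_watt, kwh_per_year * price_per_kwh

# Provider A runs more efficient servers than provider B.
for name, (rps, watts) in {"A": (500, 200), "B": (400, 300)}.items():
    ppw, cost = fleet_numbers(rps, watts, servers=10_000)
    print(f"Provider {name}: {ppw:.2f} req/s per watt, ${cost:,.0f}/year in electricity")
```

Doing the same work at a higher performance per watt means a smaller power bill, which is the whole cost argument in miniature.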

I bet you Eric Schmidt is helping the federal gov’t understand how much more efficient it would be to host services in the Google cloud vs federal data centers.

Can Google be the lowest cost utility for data center services?  Who is competing with Google to be the lowest cost?  The lowest cost provider will be the one that uses energy most efficiently.

Being the greenest is another way to say you are the lowest cost provider of IT services.

Still think Green IT will be a fad?


Simplicity and the Data Center, a path to a Happier Data Center?

One way I gauge how good a data center designer is, is whether they talk about simplicity in the data center.  I can think of people at Google and Microsoft who regularly use simplicity as a design goal, and there are many others.  Why simplicity is important is articulated well in this post by Matthieu Ricard, who discusses simplicity as an approach to life; and for many companies, data centers are their life.  If data centers suffer, then the company suffers.

In praise of simplicity

Friday 27 March 2009

« Simplify, simplify, simplify… » These refreshing words written by Henry Thoreau remind us that much of our suffering comes from adding unnecessary and disturbing complications in our lives. We seem to be continually weaving elaborate conceptual webs around even straightforward events. We distort reality and shroud it with complications by superimposing fabricated mental constructs. This distortion invariably leads to mental states and behaviors that undermine our inner peace and that of others.

How many human enterprises and noble causes have failed due to such unnecessary complications! We need to simplify our thoughts, simplify our words, and simplify our actions. We need to avoid falling into circular mental rumination, pointless chatter, and vain activities that waste our precious time and engender all kinds of dysfunctional situations.

Having a simple mind is not the same as being simple-minded. Simplicity of mind is reflected in lucidity, inner strength, buoyancy, and a healthy contentment that withstands the tribulations of life with a light heart. Simplicity reveals the nature of the mind behind the veil of restless thoughts. It reduces the exacerbated feeling of self-importance and opens our heart to genuine altruism.

Who is Matthieu Ricard?  A really smart guy who earned his Ph.D. in cell genetics at the renowned Institut Pasteur under Nobel laureate Francois Jacob, but figured out he wanted to do more with his life and decided to become a Buddhist monk, so he spends a lot of time thinking about ways to live a happier life.  And maybe there are things to learn from him about how there could be better data centers.

Since 1989, Matthieu has served as the French interpreter for the Dalai Lama. He is a board member of the Mind and Life Institute, an organization dedicated to collaborative research between scientists and Buddhist scholars and meditators. He is engaged in the research on the effect of mind training and meditation on the brain at various universities in the USA (Madison, Princeton, and Berkeley), Europe (Zurich) and Hong Kong.

For an entertaining talk, watch this video.  One of the funny parts is when he pokes fun at his fellow French and at intellectuals at time mark 1:40.

He has figured out how to be the happiest person in the world.

He has been dubbed the "happiest person in the world" by scientists.[2] Matthieu Ricard was a volunteer subject in the University of Wisconsin–Madison's testing of happiness, scoring -0.45, which was off the scale compared to hundreds of other volunteers, whose scores ranged between +0.3 (indicating depression) and -0.3 (denoting great happiness).[3]

Another way to interpret the need for simplicity is the desire for cloud computing.  This post by Joe McKendrick on ZDNET references material written for Database Trends and Applications.

Paradox 5: Complexity Increases Simplicity. There is pressure on data centers to provide more services, scalability and availability than ever before. That’s why cloud computing approaches are gaining in popularity—companies can ramp up capabilities by hiding away the complexity. “We do not see the concept of the data center disappearing; instead, we see the concept of data centers becoming more amorphous,” says Martin Schneider, director of product marketing at SugarCRM. “The emerging trend of cloud computing kind of ties all of the major trends around data centers, in that it enables companies to run far simpler data centers, if not obviating the need for them in some instances.”

Do you think of simplicity in your data center design?  Or are you one of those who believes adding another feature will solve your data center problems?

Maybe we need a happiness metric for data centers?  I bet there are plenty of data centers we could add to the list of suffering data centers.  How many are happy?


OpenSolaris Green Home Server – low power and small

Sun employee Constantin Gonzalez Schmitz has a post on his technical decisions for a green OpenSolaris home server. His requirements of ECC memory and power efficiency make sense for a reliable, low-power server.

A Small and Energy-Efficient OpenSolaris Home Server

In an earlier entry, I outlined my most important requirements for an optimal OpenSolaris Home Server. It should:

  1. Run OpenSolaris in order to fully leverage ZFS,
  2. Support ECC memory, so data is protected at all times,
  3. Be power-efficient, to help the environment and control costs,
  4. Use a moderate amount of space and be quiet, for some extra WAF points.

He admits his wife works for AMD, but qualifies his decision for an AMD processor based on price, performance, and energy efficiency.

Disclosure: My wife works for AMD, so I may be slightly biased. But I think the following points are still very valid.

AMD on the other hand has a number of attractive points for the home server builder:

  • AMD consumer CPUs use the same microarchitecture as their professional CPUs (currently, it's the K10 design). They only vary by number of cores, cache size, number of HT channels, TDP and frequency, which are all results of the manufacturing process. All other microarchitecture features are the same. When using an AMD consumer CPU, you essentially get a "smaller brother" of their high end CPUs.
  • This means you'll also get a built-in memory-controller that supports ECC.
  • This also means less chips to build a system (no Northbridge needed) and thus lower power-consumption.
  • AMD has been using the HyperTransport Interconnect for quite a while now. This is a fast, scalable interconnect technology that has been on the market for quite a while, so chipsets are widely available, proven and low-cost.

So it was no surprise that even low-cost AMD motherboards at EUR 60 or below are perfectly capable of supporting ECC memory, which gives you an important server feature at an economical cost.

My platform conclusion: Due to ECC support, low power consumption and good HyperTransport performance at low cost, AMD is an excellent platform for building a home server.

To keep things small he uses 2.5” drives.

While looking for alternatives, I found a nice solution: The Scythe Slot Rafter fits into an unused PCI slot (taking up the breadth of two) and provides space for mounting four 2.5" disks at just EUR 5. These disks are cheap, good enough and I had an unused one lying around anyway, so that was a perfect solution for me.

And, being concerned about reliability, he adds a second NIC.

Extra NIC: The Asus M3A78-CM comes with a Realtek NIC and some people complained about driver issues with OpenSolaris. So I followed the advice on the aforementioned Email thread and bought an Intel NIC which is well supported, just in case.

Constantin was able to achieve a 45W idle power consumption.

The Result

And now for the most important part: How much power does the system consume? I did some testing with one boot disk and 4GB of ECC RAM and measured about 45W idle. While stressing CPU cores, RAM and the disk with multiple instances of sysbench, I could not get the system to consume more than 80W. All in all, I'm very pleased with the numbers, which are about half of what my old system used to consume. I didn't do any detailed performance tests yet, but I can say that the system feels very responsive and compile runs just rush along the screen. CPU temperature won't go beyond the low 50Cs on a hot day, despite using the lowest fan speed, so cooling seems to work well, too.
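To put the 45W idle figure in perspective, here is a quick back-of-the-envelope energy and cost estimate. Only the 45W idle and ~80W loaded numbers come from Constantin's post; the duty cycle and electricity price are my own assumptions.

```python
# Rough annual energy estimate for the home server described above.
# Idle and loaded power come from the post; duty cycle and price are assumptions.

IDLE_W, LOAD_W = 45, 80            # measured values from the post
HOURS_LOADED_PER_DAY = 2           # assumption: a home server is mostly idle
PRICE_PER_KWH_EUR = 0.20           # assumption: rough European electricity price

avg_watts = (LOAD_W * HOURS_LOADED_PER_DAY + IDLE_W * (24 - HOURS_LOADED_PER_DAY)) / 24
kwh_per_year = avg_watts / 1000 * 24 * 365
print(f"Average draw: {avg_watts:.1f} W")
print(f"Per year: {kwh_per_year:.0f} kWh, roughly EUR {kwh_per_year * PRICE_PER_KWH_EUR:.0f}")
# At about half the old system's consumption, the difference adds up to a
# meaningful saving over the life of the machine.
```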

It will be interesting to see what follow up posts Constantin writes.


Google’s Secret to efficient Data Center design – ability to predict performance

DataCenterKnowledge has a post on Google’s future, envisioning 10 million servers.

Google Envisions 10 Million Servers

October 20th, 2009 : Rich Miller

Google never says how many servers are running in its data centers. But a recent presentation by a Google engineer shows that the company is preparing to manage as many as 10 million servers in the future.

Google’s Jeff Dean was one of the keynote speakers at an ACM workshop on large-scale computing systems, and discussed some of the technical details of the company’s mighty infrastructure, which is spread across dozens of data centers around the world.

In his presentation (link via James Hamilton), Dean also discussed a new storage and computation system called Spanner, which will seek to automate management of Google services across multiple data centers. That includes automated allocation of resources across “entire fleets of machines.”

Going to Jeff Dean’s presentation, I found a Google secret.

[slide image]

Designs, Lessons and Advice from Building Large Distributed Systems

Designing Efficient Systems
Given a basic problem definition, how do you choose the "best" solution?
• Best could be simplest, highest performance, easiest to extend, etc.
Important skill: ability to estimate performance of a system design
– without actually having to build it!

What is Google’s assumption of where computing is going?

[slide image]

Thinking like an information factory, Google describes the machinery as servers, racks, and clusters.  This approach supports the idea of information production.  Google introduces the idea of a data center being like a computer, but I find a more accurate analogy is to think of data centers as information factories.  The IT equipment is the machinery of the factory, consuming large amounts of electricity to power and cool the IT load.

[slide image]

Located in a data center like the one in The Dalles, OR.

[slide image]

With all that equipment things must break.  And, yes they do.

Reliability & Availability
• Things will crash. Deal with it!
– Assume you could start with super reliable servers (MTBF of 30 years)
– Build computing system with 10 thousand of those
– Watch one fail per day
• Fault-tolerant software is inevitable
• Typical yearly flakiness metrics
– 1-5% of your disk drives will die
– Servers will crash at least twice (2-4% failure rate)
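The arithmetic behind "watch one fail per day" is worth making explicit. A quick sketch of the slide's estimate:

```python
# Even with very reliable servers (MTBF of 30 years), a large fleet fails constantly.
MTBF_YEARS = 30
SERVERS = 10_000

failures_per_day = SERVERS / (MTBF_YEARS * 365)
print(f"Expected failures per day: {failures_per_day:.2f}")   # ~0.9, about one a day
```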

The Joys of Real Hardware
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packetloss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external vips for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for dns
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.
Long distance links: wild dogs, sharks, dead horses, drunken hunters, etc.
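As a rough exercise, you can turn a few of those yearly events into an expected downtime figure. The event counts come from the slide; the cluster size and the midpoints I picked for duration and machines affected are my own assumptions.

```python
# Expected machine-hours of downtime per year from a subset of the events above.
# Tuples are (events per year, machines affected, hours down); midpoints assumed.
CLUSTER_SIZE = 2_000              # assumption; the slide does not give a cluster size
events = [
    (0.5,  CLUSTER_SIZE, 36),     # overheating: most machines down, ~1-2 days to recover
    (1,    750,           6),     # PDU failure: ~500-1000 machines, ~6 hours
    (20,   60,            3.5),   # rack failures: 40-80 machines, 1-6 hours
    (1000, 1,             8),     # individual machine failures (repair time assumed)
]

machine_hours = sum(rate * machines * hours for rate, machines, hours in events)
per_machine = machine_hours / CLUSTER_SIZE
print(f"~{machine_hours:,.0f} machine-hours lost/year, "
      f"~{per_machine:.0f} h per machine (~{per_machine / 8760 * 100:.1f}% unavailability)")
```

Even a fraction of a percent of unavailability spread randomly across thousands of machines is why the previous slide concludes that fault-tolerant software is inevitable.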

[slide image]

Monitoring is how you know your estimates are correct.

Add Sufficient Monitoring/Status/Debugging Hooks
All our servers:
• Export HTML-based status pages for easy diagnosis
• Export a collection of key-value pairs via a standard interface
– monitoring systems periodically collect this from running servers
• RPC subsystem collects sample of all requests, all error requests, all
requests >0.0s, >0.05s, >0.1s, >0.5s, >1s, etc.
• Support low-overhead online profiling
– cpu profiling
– memory profiling
– lock contention profiling
If your system is slow or misbehaving, can you figure out why?
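The key-value export mentioned on the slide is easy to picture with a small sketch. This is not Google's actual interface, just a minimal Python illustration of the pattern: each server exposes its counters over HTTP so a monitoring system can scrape them periodically.

```python
# Minimal sketch of a key-value status endpoint (illustrative, not Google's interface).
from http.server import BaseHTTPRequestHandler, HTTPServer
import time

START = time.time()
COUNTERS = {"requests_total": 0, "errors_total": 0}

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        COUNTERS["requests_total"] += 1
        status = {**COUNTERS, "uptime_seconds": int(time.time() - START)}
        body = "\n".join(f"{key} {value}" for key, value in status.items())
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    # A monitoring system would periodically GET http://host:8080/ and store the values.
    HTTPServer(("", 8080), StatusHandler).serve_forever()
```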

Many people have quoted the idea that “you can’t manage what you don’t measure.”  But a more advanced concept that Google discusses is: “If you don’t know what’s going on, you can’t do decent back-of-the-envelope calculations!”

Know Your Basic Building Blocks
Core language libraries, basic data structures,
protocol buffers, GFS, BigTable,
indexing systems, MySQL, MapReduce, …
Not just their interfaces, but understand their
implementations (at least at a high level)
If you don’t know what’s going on, you can’t do
decent back-of-the-envelope calculations!
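Here is the kind of back-of-the-envelope calculation Dean is advocating, applied to a simple question: how long does it take to read 1 GB from memory, from local disk, or over a 1 Gbps link? The throughput numbers are my own rough assumptions for 2009-era hardware, not figures from the presentation.

```python
# Back-of-the-envelope: time to read 1 GB sequentially from different places.
# Throughput figures are rough assumptions for illustration.
GB = 1 << 30
throughput_bytes_per_sec = {
    "memory":         10 * 1024**3,   # ~10 GB/s assumed
    "local disk":    100 * 1024**2,   # ~100 MB/s sequential assumed
    "1 Gbps network": 125 * 1024**2,  # wire speed, ignoring protocol overhead
}

for medium, bps in throughput_bytes_per_sec.items():
    print(f"Read 1 GB from {medium:>14}: {GB / bps:6.2f} s")
# Knowing these magnitudes lets you reject a design (say, one that re-reads its
# data from disk on every request) long before you build anything.
```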

These ideas are being discussed by a software architect, but they apply just as much to data center design.  And the advantage Google has is that all of its IT and development staff think this way.

[slide image]

And here is another secret to great design: say no to features.  But what the data center design industry wants to do is get you to say yes to everything, because it makes the data center building more expensive, increasing profits.

[slide image]

So what is the big design problem Google is working on?

[slide image]

Jeff Dean did a great job of putting a lot of good ideas in his presentation, and it was nice Google let him present some secrets we could all learn from.
