Google’s Secret to efficient Data Center design – ability to predict performance

DataCenterKnowledge has a post on Google (Public, NASDAQ:GOOG) envisioning a future with 10 million servers.

Google Envisions 10 Million Servers

October 20th, 2009 : Rich Miller

Google never says how many servers are running in its data centers. But a recent presentation by a Google engineer shows that the company is preparing to manage as many as 10 million servers in the future.

Google’s Jeff Dean was one of the keynote speakers at an ACM workshop on large-scale computing systems, and discussed some of the technical details of the company’s mighty infrastructure, which is spread across dozens of data centers around the world.

In his presentation (link via James Hamilton), Dean also discussed a new storage and computation system called Spanner, which will seek to automate management of Google services across multiple data centers. That includes automated allocation of resources across “entire fleets of machines.”

Going to Jeff Dean’s presentation, I found a Google secret.


Designs, Lessons and Advice from Building Large Distributed Systems

Designing Efficient Systems
Given a basic problem definition, how do you choose the "best" solution?
• Best could be simplest, highest performance, easiest to extend, etc.
Important skill: ability to estimate performance of a system design
– without actually having to build it!

What is Google’s assumption of where computing is going?


Thinking like an information factory: Google describes the machinery as servers, racks, and clusters.  This approach supports the idea of information production.  Google introduces the idea of the data center as a computer, but I find a more accurate analogy is to think of data centers as information factories.  The IT equipment makes up the machines in the factory, consuming large amounts of electricity to power and cool the IT load.


Located in a data center like the one in The Dalles, OR


With all that equipment, things must break.  And yes, they do.

Reliability & Availability
• Things will crash. Deal with it!
– Assume you could start with super reliable servers (MTBF of 30 years)
– Build computing system with 10 thousand of those
– Watch one fail per day
• Fault-tolerant software is inevitable
• Typical yearly flakiness metrics
– 1-5% of your disk drives will die
– Servers will crash at least twice (2-4% failure rate)
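
The "one fail per day" figure on the slide is easy to check by hand. Here is a minimal sketch of that arithmetic, assuming failures are independent and spread evenly over the MTBF; the function name and the 365-day year are my own choices, not anything from the presentation.

```python
# Back-of-the-envelope check of the "watch one fail per day" claim.
# Assumption: failures are independent and spread evenly over the MTBF.

DAYS_PER_YEAR = 365


def expected_failures_per_day(num_servers: int, mtbf_years: float) -> float:
    """Expected number of server failures per day across the whole fleet."""
    mtbf_days = mtbf_years * DAYS_PER_YEAR
    return num_servers / mtbf_days


# Numbers from the slide: 10 thousand servers, MTBF of 30 years each.
rate = expected_failures_per_day(num_servers=10_000, mtbf_years=30)
print(f"{rate:.2f} expected failures per day")  # prints "0.91 expected failures per day"
```

Even with servers far more reliable than anything you can buy, the fleet as a whole loses roughly one machine a day, which is why the slide calls fault-tolerant software inevitable.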

The Joys of Real Hardware
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packetloss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external vips for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for dns
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.
Long distance links: wild dogs, sharks, dead horses, drunken hunters, etc.
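
To get a feel for what this list means in lost capacity, here is a rough sketch that totals machine-hours for the three events where the slide gives both a machine count and a duration. The table layout and function name are hypothetical, and the other events are left out because the slide does not give enough numbers to size them.

```python
# Rough machine-hours lost in a cluster's first year, using only the events
# for which the slide gives both a machine count and a duration.
# Tuple layout (my own): (events per year, machines low, machines high,
#                         hours low, hours high)
EVENTS = {
    "PDU failure": (1, 500, 1000, 6, 6),
    "rack move": (1, 500, 1000, 6, 6),
    "rack failures": (20, 40, 80, 1, 6),
}


def machine_hours_range(events):
    """Return (low, high) total machine-hours lost across all listed events."""
    low = sum(n * m_lo * h_lo for n, m_lo, _m_hi, h_lo, _h_hi in events.values())
    high = sum(n * m_hi * h_hi for n, _m_lo, m_hi, _h_lo, h_hi in events.values())
    return low, high


low, high = machine_hours_range(EVENTS)
print(f"{low} to {high} machine-hours lost")  # prints "6800 to 21600 machine-hours lost"
```

The range is wide, but that is the point of a back-of-the-envelope estimate: it tells you the order of magnitude before you build anything.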


Monitoring is how you know your estimates are correct.

Add Sufficient Monitoring/Status/Debugging Hooks
All our servers:
• Export HTML-based status pages for easy diagnosis
• Export a collection of key-value pairs via a standard interface
– monitoring systems periodically collect this from running servers
• RPC subsystem collects sample of all requests, all error requests, all
requests >0.0s, >0.05s, >0.1s, >0.5s, >1s, etc.
• Support low-overhead online profiling
– cpu profiling
– memory profiling
– lock contention profiling
If your system is slow or misbehaving, can you figure out why?
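
The RPC sampling bullet is the kind of hook that is simple to sketch. The code below is not Google's implementation; it is a minimal illustration, assuming a bounded random sample per bucket (reservoir sampling) and a plain dictionary as the "key-value pairs" export. The class name, the cap of 100 samples, and the key format are all hypothetical; only the latency thresholds come from the slide.

```python
import random
from collections import defaultdict

# Latency thresholds in seconds, taken from the slide.
LATENCY_THRESHOLDS_S = [0.0, 0.05, 0.1, 0.5, 1.0]
# Hypothetical cap on how many sampled requests each bucket retains.
MAX_SAMPLES_PER_BUCKET = 100


class RequestSampler:
    """Keeps a bounded random sample of requests per bucket."""

    def __init__(self, max_samples=MAX_SAMPLES_PER_BUCKET, seed=None):
        self.max_samples = max_samples
        self.samples = defaultdict(list)  # bucket name -> sampled records
        self.seen = defaultdict(int)      # bucket name -> total requests seen
        self._rng = random.Random(seed)

    def _keep(self, bucket, record):
        # Reservoir sampling: every request seen so far in this bucket has
        # an equal chance of being in the retained sample.
        self.seen[bucket] += 1
        kept = self.samples[bucket]
        if len(kept) < self.max_samples:
            kept.append(record)
        else:
            slot = self._rng.randrange(self.seen[bucket])
            if slot < self.max_samples:
                kept[slot] = record

    def record(self, name, latency_s, error=False):
        """Record one finished request in every bucket it belongs to."""
        rec = (name, latency_s, error)
        self._keep("all", rec)
        if error:
            self._keep("errors", rec)
        for threshold in LATENCY_THRESHOLDS_S:
            if latency_s > threshold:
                self._keep(f">{threshold}s", rec)

    def export(self):
        """Counters as key-value pairs for a monitoring system to collect."""
        return {f"requests_seen[{b}]": n for b, n in sorted(self.seen.items())}


sampler = RequestSampler(seed=0)
sampler.record("Search", 0.02)
sampler.record("Search", 0.7, error=True)
stats = sampler.export()
print(stats)
```

With samples kept per latency bucket, the question "why was this request slow?" can be answered from a live server instead of being reconstructed after the fact.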

Many people have quoted the idea “you can’t manage what you don’t measure.”  But a more advanced concept that Google discusses is “If you don’t know what’s going on, you can’t do decent back-of-the-envelope calculations!”

Know Your Basic Building Blocks
Core language libraries, basic data structures,
protocol buffers, GFS, BigTable,
indexing systems, MySQL, MapReduce, …
Not just their interfaces, but understand their
implementations (at least at a high level)
If you don’t know what’s going on, you can’t do
decent back-of-the-envelope calculations!
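
Here is what a back-of-the-envelope calculation can look like in practice: comparing two designs for reading a batch of items from disk before writing either one. Every constant below (seek time, read rate, item count, item size) is an illustrative assumption of mine rather than a figure quoted in this post, and the function names are hypothetical.

```python
# Comparing two designs on paper. All constants are illustrative
# assumptions, not measurements quoted in this post.
SEEK_S = 0.010              # assumed time for one disk seek, in seconds
READ_BYTES_PER_S = 30e6     # assumed sequential read rate, in bytes/second


def serial_read_time(n_items: int, item_bytes: int) -> float:
    """Design A: read the items one after another from a single disk."""
    return n_items * (SEEK_S + item_bytes / READ_BYTES_PER_S)


def parallel_read_time(n_items: int, item_bytes: int) -> float:
    """Design B: assume every item sits on a different disk and all
    reads are issued at once, so total time is that of a single read."""
    return SEEK_S + item_bytes / READ_BYTES_PER_S


serial = serial_read_time(30, 256_000)
parallel = parallel_read_time(30, 256_000)
print(f"serial:   {serial * 1000:.0f} ms")
print(f"parallel: {parallel * 1000:.0f} ms")
```

The estimate only works if you know that a disk read costs a seek plus a transfer, which is exactly the slide's point about understanding implementations and not just interfaces.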

The ideas being discussed come from a software architect, but they apply just as much to data center design.  And the benefit Google has is that all of its IT and development staff think this way.


And here is another secret to great design: say no to features.  But what the data center design industry wants is for you to say yes to everything, because that makes the data center building more expensive, which increases its profits.


So what is the big design problem Google is working on?


Jeff Dean did a great job of putting a lot of good ideas in his presentation, and it was nice that Google let him present some secrets we could all learn from.