Google shares its 10-20% server performance improvement technique, analyzing the microarchitecture of AMD and Intel servers

If you told someone in the data center industry you could get a 10-20% performance gain, they probably wouldn't believe you. If you said you had a new processor, memory, storage, or network architecture, you would have a better chance of being believed. Would you believe someone who told you that, at the microarchitecture level of existing servers, designing the software to access local memory rather than non-local memory could give you a 10-20% performance gain? Well, Google has shared this information and is deploying the solution in its data centers.

This indicates that a simple NUMA-aware scheduling can already yield sizable benefits in production for those platforms. Based on our findings, NUMA-aware thread mapping is implemented and in the deployment process in our production WSCs.
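
To make "NUMA-aware thread mapping" a bit more concrete, here is a minimal sketch of the general idea on a Linux server: find the CPUs that belong to one NUMA node and pin a process to them, so that under Linux's default local allocation policy its memory tends to come from that same node. This is not Google's implementation, which the paper does not show. The helper names (`parse_cpulist`, `node_cpus`, `pin_to_node`) are my own, and the sketch assumes a Linux system that exposes `/sys/devices/system/node/`.

```python
import os

def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or trailing commas
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def node_cpus(node):
    """Read the CPUs of one NUMA node from sysfs (Linux only, assumed layout)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())

def pin_to_node(node, pid=0):
    """Restrict a process (pid 0 means the caller) to the CPUs of one NUMA node."""
    cpus = node_cpus(node)
    os.sched_setaffinity(pid, cpus)  # available on Linux builds of Python 3.3+
    return cpus
```

On a two-socket Linux server, calling `pin_to_node(0)` early in a worker process would keep it on socket 0's cores. Pinning CPUs does not by itself bind memory, so this only approximates what a tool like `numactl` does, and the paper's point about the tradeoff with cache contention still applies.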

Here is the Google paper, published in 2013. Warning: this is not an easy paper to read if you are not familiar with operating systems and hardware. But I hope it gives an appreciation of another way to green a data center by making some changes in software.

Optimizing Google's Warehouse Scale Computers: The NUMA Experience

Abstract: Due to the complexity and the massive scale of modern warehouse scale computers (WSCs), it is challenging to quantify the performance impact of individual microarchitectural properties and the potential optimization benefits in the production environment. As a result of these challenges, there is currently a lack of understanding of the microarchitecture-workload interaction, leaving potentially significant performance on the table.

This paper argues for a two-phase performance analysis methodology for optimizing WSCs that combines both an in-production investigation and an experimental load-testing approach. To demonstrate the effectiveness of this two-phase methodology, and to illustrate the challenges, methodologies, and opportunities in optimizing modern WSCs, this paper investigates the impact of non-uniform memory access (NUMA) for several Google's key web-service workloads in large-scale production WSCs. Leveraging a newly-designed metric and continuous large-scale profiling in live datacenters, our production analysis demonstrates that NUMA has a significant impact (10-20%) on two important webservices: Gmail backend and search frontend. Our carefully designed load-test further reveals surprising tradeoffs between optimizing for NUMA performance and reducing cache contention.

 

 

Ahh, now I get it: Google uses Clusters the way others use Containers

Containers work if you want to have a unit of deployment with up to 2,000 servers. Google used containers early on, but doesn't use them anymore. One of the biggest users of containers is Microsoft's data center group. DCD covers Microsoft discussing how containers contain outages.

 “In the electrical and mechanical design of this data center, we considered each container as a discrete failure domain and modeled the availability of power and cooling with the expectation that maintenance events and unplanned outages would occur in the environment,” Gauthier writes. Failures would also be compartmentalized in a standard and predictable way.

I was looking at this presentation of Google's cluster system. Note how the network and power topology are deployed to support a cluster.

NewImage

Mike Manos and I talked a long time ago about how containers encapsulate compute, network, storage, power, and cooling, but you can get the same encapsulation if your data center applies the same principles to support a cluster of functionality.

Google achieves the same containment of power, cooling, compute, storage, and network as Microsoft does in a container, but without the physical container.

What is nice to see is that the SW team knows they have a big role in saving energy.

NewImage

Fixing the method of Triumvirate organization to make it more useful

Last year I was lucky to get some time to chat with RISD's President John Maeda after he spoke at GigaOm Roadmap. We chatted about typography and his presentation. Then I shared the idea I am working on with two other business partners, and how we set up the company with three executives. He instantly recognized the structure as a triumvirate.

triumvirate (from Latin, "triumvirātus") is a political regime dominated by three powerful individuals, each a triumvir (pl. triumviri). The arrangement can be formal or informal, and though the three are usually equal on paper, in reality this is rarely the case.

John continued by saying that the beauty of a triumvirate is that as long as two agree, you move forward.

One of the more famous triumvirates now is Google's three executives.

Eric Schmidt, CEO of Google has referred to himself, along with founders Larry Page and Sergey Brin as part of a triumvirate, stating, "This triumvirate has made an informal deal to stick together for at least 20 years"

I then told John we modified the Triumvirate method by requiring unanimous support for a decision to be made, and that ownership of the company is divided into thirds. John's response: "But doesn't that make you slower?"

It may slow things down a bit, but it makes sure that every person's opinion is heard, and that for the overall success of the company we consider each other's views.

The problem being addressed is illustrated by the precogs in Minority Report, where only two votes are required to convict someone of a crime. The two male precogs could overrule the female precog and move forward, which made the establishment happy while ignoring the issue that the decision was wrong.

Anderton seeks the advice of Dr. Iris Hineman (Lois Smith), the lead researcher of the PreCrime technology. She explains to Anderton that sometimes the three precogs see different visions of the future, in which case the system only provides data on the two reports which agree; the "minority report", reflecting the potential future where a predicted killer would have done something different, is discarded. According to Dr. Hineman, the female precog Agatha is most likely to be the precog that witnesses the minority report.
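
The difference between the classic two-out-of-three rule and our modified unanimous rule can be written down as a tiny sketch. The function names here are just for illustration, not anything we actually run the company with.

```python
def majority_rule(votes):
    """Classic triumvirate: move forward if more than half of the votes agree."""
    return sum(votes) * 2 > len(votes)

def unanimous_rule(votes):
    """Modified triumvirate: move forward only if every vote agrees."""
    return all(votes)

# A 2-1 split, like two precogs agreeing and one dissenting.
split = [True, True, False]
print(majority_rule(split))   # True: the minority report is discarded
print(unanimous_rule(split))  # False: the decision goes back for more discussion
```

Under the unanimous rule a 2-1 split is not a rejection. It just means the decision is not made yet and the dissenting view has to be dealt with first.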

After two years of using this modified Triumvirate, we have established a higher level of trust and understanding within our partnership. Sometimes we debate an issue, and we work together to come up with something that works for all. Think of it as a peer review for decisions. We all want the company to succeed, and even though you are in the minority it doesn't mean you are wrong. The majority may be wrong. Sometimes a point is made, and then the person says, "It really doesn't make that much difference to me, I just wanted to bring up an issue. I trust you guys to make the right decision."

Having three minds think about customers, technology, and the other things needed to run the business is something we have gotten so used to that it is hard to think of having a typical hierarchical structure. Oh yeah, we don't have any backseat drivers from Angels or VCs either. They would upset the equal balance of power. Can you imagine a VC putting his money in and we tell him, "You get a vote, but your vote is no better than anyone else's"?

There are many things that we don't need to have a consensus on. The industry relationships/partnerships and operations are my responsibility. One guy focuses on the technologies and operations. Another focuses on analytics, operations, and finance. We are all concerned with operations, which I guess is the glue that pulls everything together and the thing we can measure alternatives against.

I am writing this post to share the idea of a modified triumvirate and maybe one of these days I'll run into another company that uses the same structure, but I am not holding my breath.

The power of two founders is well known: Apple, Hewlett-Packard, Microsoft, and Google.

BTW, there was an attempt at a third founder at Apple: someone to settle the disputes between Wozniak and Jobs.

Apple's lost founder: Jobs, Woz and Wayne

Updated:   07/26/2010 03:59:17 PM PDT
...

He was present at the birth of cool on April Fool's Day, 1976: Co-founder — along with Steve Jobs and Steve Wozniak — of the Apple Computer Inc., Wayne designed the company's original logo, wrote the manual for the Apple I computer, and drafted the fledgling company's partnership agreement.

That agreement gave him a 10 percent ownership stake in Apple, a position that would be worth about $22 billion today if Wayne had held onto it.

...

"It was at that point he said, 'Let's form a company,' " Wayne recalls. Like a quarterback drawing a play in the dirt, Jobs came up with the idea of giving himself and Wozniak each 45 percent, the final 10 percent going to Wayne, who would mediate disputes between his headstrong partners. "That would resolve any problems forever and ever," says Wayne, who drew up the contract on a typewriter. There was no such thing as a word processor yet. They were about to invent it.

One way to view what is important to Google's data center group: look at the Google search UI

It is an interesting thought experiment to be at the low level in data centers discussing sites, power, cooling, etc., and then sometimes pop your head up to get a big-picture view of what you can see.

I was chatting with a Google person last week, thanking him for a response to an e-mail I sent a year ago to check the data center calculations on their web site. I said, "Thanks. X data centers is what you have, and Y data centers will come online within the next 12 months." He was surprised I knew. I told him I can count the number of announcements made over the past year. It's easy to see when you know where to look.

When I was at another data center conference, someone asked me what Google is going to do with all that capacity. I don't know.

Then I saw the below graphic when I was playing around with some ideas of a Triumvirate.

And then it was simple to say: Google is focused, in order of priority, on web, images, maps, shopping, videos, news, books, blogs, flights, discussions, recipes, applications, and patents.

I have a friend who is starting a company about recipes with plenty of funding. The fact that Google is looking at recipes may mean a potential business model for her is to be like Waze: build a huge following and sell your company to Google for cash.  :-)

NewImage