Google's Server Environment is not as homogeneous as you think: up to 5 microarchitectures

There is a common belief that Google, Facebook, Twitter, and the other newer Web 2.0 companies have it easier than a typical enterprise because their environments are homogeneous. Well, Google has a paper that discusses how its supposedly homogeneous warehouse-scale computers (WSCs) are actually heterogeneous, and that there is an opportunity for performance improvements of up to 15%.

In this table, Google lists the number of microarchitectures in 10 different data centers. Google now has 13 WSCs, so the analysis may have been run a while ago (maybe 2-3 years), or it may be more recent with 3 data centers left out of the table. The 13th came online only within the past year and probably would not have enough data.


The issue pointed out in the paper is that the job manager assumes the cores are homogeneous.


When in fact they are not.


Here is the results summary.

Results Summary: This paper shows that there is a significant performance opportunity when taking advantage of emergent heterogeneity in modern WSCs. At the scale of modern cloud infrastructures such as those used by companies like Google, Apple, and Microsoft, gaining just 1% of performance improvement for a single application translates to millions of dollars saved. In this work, we show that large-scale web-service applications that are sensitive to emergent heterogeneity improve by more than 80% when employing Whare-Map over heterogeneity-oblivious mapping. When evaluating Whare-Map using our testbed composed of key Google applications running on three types of production machines commonly found co-existing in the same WSC, we improve the overall performance of an entire WSC by 18%. We also find a similar improvement of 15% in our benchmark testbed and in our analysis of production data from WSCs hosting live services.

Here are the three different microarchitectures used in the paper. Table 3 is the production set and Table 4 is the test bed.


Here is the range in performance for the three different microarchitectures.


The performance opportunity calculation from the paper is now part of Google's WSC monitoring, and here are the results.


Figure 11 shows the calculated performance improvement when using Whare-Map over the currently deployed mapping in 10 of Google’s active WSCs. Even though some major applications are already mapped to their best platforms through manual assignment, we have measured significant potential improvement of up to 15% when intelligently placing the remaining jobs. This performance opportunity calculation based on this paper is now an integral part of Google’s WSC monitoring infrastructure. Each day the number of ‘wasted cycles’ due to inefficiently mapping jobs to the WSC is calculated and reported across each of Google’s WSCs world wide.
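To make the idea of a "performance opportunity" concrete, here is a toy sketch in Python. It is not the Whare-Map algorithm from the paper. The job names, machine types A/B/C, and performance numbers are all made up for illustration, and the "best" mapping is found by brute force, which only works at toy sizes. It just compares a heterogeneity-oblivious assignment against the best possible one and reports the gap.

```python
from itertools import permutations

# Hypothetical normalized performance of each job on each machine type
# (1.0 = performance on the slowest platform). All values are invented.
PERF = {
    "search":   {"A": 1.00, "B": 1.10, "C": 1.40},
    "storage":  {"A": 1.00, "B": 1.05, "C": 1.05},
    "encoding": {"A": 1.00, "B": 1.30, "C": 1.35},
}

# One free slot per machine type, in the order an oblivious job manager
# happens to hand them out, and the jobs in arrival order.
SLOTS = ["C", "B", "A"]
JOBS = ["storage", "search", "encoding"]


def total_perf(jobs, slots):
    """Sum of per-job performance when jobs[i] runs on slots[i]."""
    return sum(PERF[job][slot] for job, slot in zip(jobs, slots))


def oblivious_mapping(jobs, slots):
    """Pair jobs with slots in arrival order, ignoring machine type."""
    return list(zip(jobs, slots))


def best_mapping(jobs, slots):
    """Try every slot ordering and keep the one with the highest total."""
    best = max(permutations(slots), key=lambda p: total_perf(jobs, p))
    return list(zip(jobs, best))


def opportunity(jobs, slots):
    """Fractional improvement of the best mapping over the oblivious one."""
    base = total_perf(jobs, slots)
    best = total_perf(jobs, [slot for _, slot in best_mapping(jobs, slots)])
    return (best - base) / base


if __name__ == "__main__":
    print("oblivious:", oblivious_mapping(JOBS, SLOTS))
    print("best:     ", best_mapping(JOBS, SLOTS))
    print(f"opportunity: {opportunity(JOBS, SLOTS):.1%}")
```

With these invented numbers the oblivious mapping gives the fastest machine to the job that benefits least from it, and the gap to the best mapping works out to about 17%. A real WSC has far too many jobs and machines for brute force, which is why the paper needs something smarter.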

There is more in the paper I need to digest, but I need to finish this post as it is long enough already.