Counting Servers Is Easy; a Lot of Other Things Are Much Harder

James Hamilton has a post saying that it is hard to count servers.

At the Microsoft World-Wide Partners Conference, Microsoft CEO Steve Ballmer announced that “We have something over a million servers in our data center infrastructure. Google is bigger than we are. Amazon is a little bit smaller. You get Yahoo! and Facebook, and then everybody else is 100,000 units probably or less.”

That’s a surprising data point for a variety of reasons. The most surprising is that the data point was released at all. Just about nobody at the top of the server world chooses to boast with the server count data point. Partly because it’s not all that useful a number but mostly because a single data point is open to a lot of misinterpretation by even skilled industry observers. Basically, it’s pretty hard to see the value of talking about server counts and it is very easy to see the many negative implications that follow from such a number.

What is hard is figuring out how many cores these servers have. How old are the servers? Is the oldest 4 years old? Or 3? At what rate is new data center capacity being added, and how does that relate to the overall growth in cores and storage?

The one advantage Microsoft has in making a statement on server count is that the other companies will not say what theirs is.

The first question when thinking about this number is where does the comparative data actually come from?  I know for sure that Amazon has never released server count data. Google hasn’t either although estimates of their server footprint abound. Interestingly the estimates of Google server counts 5 years ago was 1,000,000 servers whereas current estimates have them only in the 900k to 1m range.

We'll see if others speak up on server count or not.  

The US Census Bureau has for years conducted a survey of manufacturing capacity.


Quarterly Survey of Plant Capacity Utilization (QPC)

The Survey of Plant Capacity Utilization provides statistics on the rates of capacity utilization for the U.S. manufacturing and publishing sectors.

  • The Federal Reserve Board (FRB) and The Department of Defense (DOD) co-fund the survey.
  • The survey collects data on actual, full, and emergency production levels.
  • Data are obtained from manufacturing and publishing establishments by means of a mailed questionnaire.
  • Respondents are asked to report actual production, an estimate of their full production capability, and an estimate of their national emergency production.
  • From these reported values, full and emergency utilization rates are calculated.
  • The survey produces full and emergency utilization rates for the manufacturing and publishing sectors defined by the North American Industrial Classification System (NAICS).
  • Final utilization rates are based on information collected from survey respondents.
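The calculation described in the list above can be sketched in a few lines. This is only an illustration, assuming each utilization rate is actual production divided by the corresponding capability estimate and expressed as a percentage. The function name, parameter names, and sample figures are hypothetical and are not taken from the survey itself.

```python
def utilization_rates(actual, full_capability, emergency_capability):
    """Return (full_rate, emergency_rate) as percentages.

    Assumption: each rate is actual production divided by the reported
    capability estimate. All names here are illustrative only.
    """
    if full_capability <= 0 or emergency_capability <= 0:
        raise ValueError("capability estimates must be positive")
    full_rate = 100.0 * actual / full_capability
    emergency_rate = 100.0 * actual / emergency_capability
    return full_rate, emergency_rate


# Made-up figures for a single respondent, in arbitrary units.
full_rate, emergency_rate = utilization_rates(
    actual=750.0, full_capability=1000.0, emergency_capability=1250.0
)
print(f"full utilization: {full_rate:.1f}%")
print(f"emergency utilization: {emergency_rate:.1f}%")
```

The same three reported values (actual, full, and emergency) could in principle be collected for data center capacity, which is why the arithmetic is worth seeing spelled out.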

Wouldn't it be useful for the FRB and DOD to understand data center capacity and utilization?  It is hard to assess, but that doesn't mean it shouldn't be done.