Data Center Capacity Infrastructure Pattern

The concept of software patterns is well established.  

In software engineering, a design pattern is a general reusable solution to a commonly occurring problem within a given context in software design. A design pattern is not a finished design that can be transformed directly into source or machine code. It is a description or template for how to solve a problem that can be used in many different situations. Patterns are formalized best practices that the programmer must implement themselves in the application.

Over 9 years ago I tried to figure out infrastructure patterns in the same way that software patterns are used.  About 2-3 years ago, I finally understood how to develop infrastructure patterns, and it has taken me an additional 2-3 years to test some of the ideas and get to the point of writing them up.

The following is going to be a riff of ideas, and I'll most likely clean it up with the help of some of my friends who are good at writing up patterns, so read the following as a rough draft.

I was talking to a software guy who now works in a data center deploying some of the more complex IT equipment.  His background is software so he gets patterns. We had a brief conversation the other day and I explained the following pattern of site design and capacity.  One of the most important things in defining a pattern is to identify what problem you want to solve.

The problem I am going to discuss is how to add data center capacity in a region like a city.

The typical method is to identify the current use.  Let's say there is a need for 100kW in a facility in a city.  The team who acquires capacity knows how difficult it can be to add space and add to an existing cage, so they decide to quadruple the requirement and look for 400kW.  To start, they'll use 25% of the capacity and grow into it over a 10-year period.  They set up the lease to have one fee for reserving the capacity and another set of fees for actual use.

The flaw in this method is the assumption that the space must be contiguous, in one cage area.  That assumption is logical from a real estate perspective.

Proposed method:  Pick a unit of power that is the most cost effective in a facility given the power infrastructure.  Let's say 140kW, enough to handle the 100kW requirement plus 40% headroom.  The fear is that the business could rapidly need more space.  The key to picking this first space is that it should have high connectivity to other spaces in the building (not necessarily adjacent) and to other buildings that can support the growth of the company.  As the business outgrows the original 140kW, the data center group has already identified other candidate spaces to add for growth.  The strategy is to start with two spaces that are on different power, cooling, and network infrastructure, then continue to add more in a mesh of 3-5 sites.  The trade-off of adding smaller units of expansion that can be fully loaded and optimized is that it forces an isolation of compute, which can be useful.
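To make the comparison concrete, here is a rough sketch of how much capacity each method holds as demand grows.  The 400kW reservation and the 140kW unit come from the example above; the demand curve (100kW growing by 30kW a year to 400kW in year 10) and the function names are my own assumptions for illustration, not measured data.

```python
import math

def upfront_capacity_kw(reserved_kw=400):
    """Typical method: the full reservation is held from day one."""
    return reserved_kw

def incremental_capacity_kw(demand_kw, unit_kw=140):
    """Proposed method: hold only as many units as current demand needs."""
    units = max(1, math.ceil(demand_kw / unit_kw))
    return units * unit_kw

def utilization(demand_kw, capacity_kw):
    """Fraction of held capacity that is actually in use."""
    return demand_kw / capacity_kw

# Hypothetical demand curve: 100 kW in year 0, growing 30 kW per year.
demand_by_year = [100 + 30 * year for year in range(11)]

for year, demand in enumerate(demand_by_year):
    upfront = upfront_capacity_kw()
    incremental = incremental_capacity_kw(demand)
    print(
        f"year {year:2d}: demand {demand:3d} kW | "
        f"up front {upfront} kW ({utilization(demand, upfront):.0%} used) | "
        f"incremental {incremental} kW ({utilization(demand, incremental):.0%} used)"
    )
```

Under these assumptions the up-front reservation starts at 25% utilization, while the first 140kW unit starts at roughly 71%.  The sketch ignores lead time for acquiring the next unit, which is exactly why the candidate spaces need to be identified ahead of time.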

For example, by the time you get to the 4th unit it is highly likely the 1st unit is in need of a hardware refresh across most of the IT gear.  As you power up the 4th unit, you can be working on decommissioning the 1st site, completely replacing the gear to support future growth.  If you had one contiguous space, it is highly likely the 1st deployments would be so intertwined with the next 3 years of deployments that the upgrade process would be extremely complex.  If each unit of expansion is meant to be isolated in a mesh, then the dependencies are reduced and a unit is easier to take offline.
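The refresh rotation can be sketched the same way.  The deployment cadence (a new unit every 2 years) and the 5-year refresh age below are hypothetical numbers picked only to show the idea; real cadences depend on the business and the gear.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    deployed_year: int

def units_due_for_refresh(units, current_year, refresh_age_years=5):
    """Return the names of units whose gear is at least refresh_age_years old."""
    return [
        unit.name
        for unit in units
        if current_year - unit.deployed_year >= refresh_age_years
    ]

# Hypothetical cadence: one new isolated unit every 2 years.
units = [
    Unit("unit-1", deployed_year=0),
    Unit("unit-2", deployed_year=2),
    Unit("unit-3", deployed_year=4),
    Unit("unit-4", deployed_year=6),
]

# In the year the 4th unit powers up, check which units can be taken offline.
print(units_due_for_refresh(units, current_year=6))
```

With these numbers, only the 1st unit is due when the 4th comes online, so it can be decommissioned and rebuilt as a whole without touching the other three.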

Issues: It is overly simplistic to treat data center space as if it were office space that needs to be in one building on adjacent floors.  Can you imagine if the corporate real estate group let the office groups be on the 3rd, 8th, and 15th floors of a building, with another team in another building 1/2 mile away?  But guess what: with the right network infrastructure, bits going from floor to floor, or to another building, is not an issue.

Examples:  When you look at Google, Facebook, and Microsoft's data centers, they build additional buildings to add capacity to a site.  They did not build a building 4 times bigger than what they needed and grow into it over 10 years.  Modular data centers by Dell, HP, and Compass data centers allow those who feel they need to have buildings to use this same approach.  Once you jump off the top-of-rack switch, it can make little difference whether you are going 5 ft, 500 ft, 5,000 ft, or 50,000 ft.