Lack of Redundancy in Bridge Design Causes I-5 Outage

It is amazing how there can be single points of failure in data centers even though they were sold as highly available designs.  Some make the mistake of assuming that just because something hasn't failed in the past and a lot of money was paid for it, failure is unlikely.

My dad was a civil engineer with CalTrans (California's Department of Transportation) Bridge Division, which includes overpasses (CA has way more overpasses than bridges), so whenever I read about civil engineering it reminds me of a possible conversation with my dad.  Unfortunately, my dad died of colon cancer 19 years ago, so I need to imagine the conversations.

In the state of Washington, I-5 has an outage: a bridge collapsed when a truck's load hit the structure.

 

What is the cause of the bridge collapse, an outage of I-5 between Seattle and Vancouver, BC?  One hit from a truck and it collapses?  Sounds like a Jenga design: knock out this one block and the whole thing falls.

Here is a view from Google Maps of what the bridge looked like before the collapse.

[Image: Google Maps view of the bridge before the collapse]

The WSJ has an answer to the outage: a single point of failure, the lack of redundancy in the design.

"This is not the sign of deteriorating infrastructure, this is a sign of vulnerable infrastructure," said Abolhassan Astaneh-Asl, a civil-engineering professor at the University of California, Berkeley.

"This original design in those days was fine," he said of bridges lacking redundancy, "but today we should invest in getting these…out of the system."

...

The bridge has what is known as a "fracture-critical" design, which means that if any part fails, the whole bridge could fail, said Mr. Astaneh-Asl. "A fracture critical bridge is like a chain," he said. "Any link in this chain you cut, it's going to fail."

Data Center Capacity Infrastructure Pattern

The concept of software patterns is well established.  

In software engineering, a design pattern is a general reusable solution to a commonly occurring problem within a given context in software design. A design pattern is not a finished design that can be transformed directly into source or machine code. It is a description or template for how to solve a problem that can be used in many different situations. Patterns are formalized best practices that the programmer must implement themselves in the application.

Over 9 years ago I tried to figure out infrastructure patterns in the same way that software patterns are used.  About 2-3 years ago, I finally understood how to develop infrastructure patterns, and it has taken me an additional 2-3 years to test some of the ideas and get to the point of writing them up.

The following is going to be a riff of ideas, and I'll most likely clean it up with the help of some of my friends who are good at writing up patterns, so read the following as a rough draft.

I was talking to a software guy who now works in a data center deploying some of the more complex IT equipment.  His background is in software, so he gets patterns.  We had a brief conversation the other day and I explained the following pattern of site design and capacity.  One of the most important things in defining a pattern is to identify what problem you want to solve.

The problem I am going to discuss is how to add data center capacity in a region like a city.

The typical method is to identify the current use.  Let's say there is a need for 100kW in a facility in a city.  The team that acquires capacity knows how difficult it can be to add space to an existing cage, so they decide to quadruple the requirement and look for 400kW.  To start they'll use 25% of the capacity and grow into it over a 10-year period.  They set up the lease to have one fee for reserving the capacity and another set of fees for actual use.
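As a quick sketch, the typical lease structure above can be put into numbers.  The reserve and usage rates and the linear ramp below are illustrative assumptions, not actual market pricing.

```python
# Hypothetical cost model for the "quadruple the requirement and grow into
# it" lease. The $/kW-month rates and the linear ramp are assumed figures.

RESERVED_KW = 400          # 4x the initial 100 kW need
RESERVE_FEE = 40.0         # $/kW-month to hold the capacity (assumed)
USAGE_FEE = 110.0          # $/kW-month for capacity actually energized (assumed)
YEARS = 10

def monthly_cost(used_kw):
    """Reserve fee applies to all 400 kW; usage fee only to the live load."""
    return RESERVED_KW * RESERVE_FEE + used_kw * USAGE_FEE

# Start at 25% utilization (100 kW) and ramp linearly to 100% over 10 years.
for year in range(0, YEARS + 1, 2):
    used = 100 + (400 - 100) * year / YEARS
    print(f"year {year:2d}: {used:5.0f} kW used, "
          f"${monthly_cost(used):,.0f}/month, {used / RESERVED_KW:.0%} utilization")
```

What the model makes visible is that the reserve fee on the unused 300 kW is paid from day one, long before the load exists.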

The flaw in this method is the assumption that the space needs to be contiguous in one cage area.  Logical from a real estate perspective.

Proposed method: pick a unit of power that is the most cost effective in a facility given the power infrastructure.  Let's say 140kW: enough to handle the 100kW requirement with 40% headroom.  The fear is that the business could rapidly need more space.  The key in picking this first space is that it should have high connectivity to other spaces in the building (not necessarily adjacent) and to other buildings that can support the growth of the company.  As the business outgrows the original 140kW, the data center group will have identified other candidate spaces to add for growth.  The strategy is to start with two spaces that are on different power, cooling, and network infrastructure, then continue adding more in a mesh of 3-5 sites.  The trade-off of adding smaller units of expansion that can be fully loaded and optimized is that it forces an isolation of compute that can be useful.

For example, by the time you get to the 4th unit it is highly likely the 1st unit is in need of a hardware refresh across most of the IT gear.  As you power up the 4th unit, you can be working on decommissioning the 1st site, completely replacing the gear to support future growth.  If you had one contiguous space, it is highly likely the 1st deployments are so intertwined with the next 3 years of deployments that the upgrade process becomes extremely complex.  If each unit of expansion is meant to be isolated in a mesh, the dependencies are reduced and units are easier to take offline.
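A minimal sketch of that expansion-and-refresh cycle, assuming one 140kW unit comes online per year and the oldest unit is pulled for refresh once a 4th unit is live (the cadence and the refresh rule are my illustrative assumptions; only the 140kW unit size comes from the text):

```python
# Unit-based mesh expansion: add isolated 140 kW units instead of one
# 400 kW contiguous cage. Cadence and refresh rule are illustrative.

UNIT_KW = 140
active = []  # commissioning year of each unit currently in service

for year in range(1, 6):      # bring one new unit online each year
    active.append(year)
    if len(active) > 3:       # 4th unit online: oldest is due for a refresh
        retired = active.pop(0)
        print(f"year {year}: taking the year-{retired} unit offline for refresh")
    print(f"year {year}: {len(active)} units live, {len(active) * UNIT_KW} kW capacity")
```

Because each unit sits on its own power, cooling, and network infrastructure, pulling the oldest one out is a clean operation rather than untangling years of intertwined deployments.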

Issues: it is overly simplistic to treat data centers as if they were office space that needs to be in one building on adjacent floors.  Can you imagine if the corporate real estate group put office groups on the 3rd, 8th, and 15th floors of a building, and another team in a building a half mile away?  But guess what: with the right network infrastructure, bits going from floor to floor, or to another building, are not an issue.

Examples: when you look at Google, Facebook, and Microsoft's data centers, they build additional buildings to add capacity to a site.  They did not build a building 4 times bigger than what they needed and grow into it over 10 years.  Modular data centers by Dell, HP, and Compass Data Centers allow those who feel they need buildings to use this same approach.  Once you jump off the top-of-rack switch, it can make little difference whether you are going 5 ft, 500 ft, 5,000 ft, or 50,000 ft.
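A back-of-envelope check on those distances, using the common rule of thumb of roughly 1.5 ns of propagation delay per foot of fiber (an approximation, not a measured figure):

```python
# Rough one-way propagation delay over fiber for the distances above.
NS_PER_FOOT = 1.5  # approximate rule of thumb for light in fiber

for feet in (5, 500, 5_000, 50_000):
    one_way_us = feet * NS_PER_FOOT / 1000
    print(f"{feet:>6,} ft -> ~{one_way_us:.3f} us one-way")
```

Even at 50,000 ft the one-way propagation delay is only about 75 microseconds, which is negligible for most workloads next to switching, serialization, and application latency.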

2011 data center presentation slides

I found this presentation just sitting on the University of Utah's website for a 2011 data center project.

[Image: 2011 data center presentation slide]

I found it interesting that this was a 4,000 kW facility with 2,400 kW of critical load for a 1.7 PUE in 2011.
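For reference, PUE is total facility power divided by the IT (critical) load, so the slide's numbers are easy to check:

```python
# PUE = total facility power / IT critical load, using the slide's figures.
total_kw = 4000
critical_kw = 2400
pue = total_kw / critical_kw
print(f"PUE = {total_kw} / {critical_kw} = {pue:.2f}")  # ~1.67, i.e. the quoted 1.7
```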

[Image: power and PUE slide from the presentation]

Here are pictures of the data center space during construction.

I can't recall ever seeing Generac generators at a data center site before.

Generators

Generators to provide power in the event of a power outage

Brand new facility, and the hot/cold aisle containment is not that impressive.  I guess that would explain the 1.7 PUE at full load.  You've got to think the PUE is closer to 2.0 during the early phases.
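To see why the PUE would be worse early on, here is a toy model that splits the facility's 1,600 kW of overhead into a fixed part and a load-proportional part.  The 50/50 split is purely my assumption; only the 2,400 kW critical load comes from the slides.  It shows how a 1.7 design PUE at full load can become 2.0 at half load:

```python
# Toy partial-load PUE model. The fixed vs proportional overhead split is
# an assumption made for illustration.

CRITICAL_KW = 2400           # design IT load from the slides
FIXED_KW = 800               # assumed load-independent overhead (fans, losses)
PROP = 800 / CRITICAL_KW     # assumed overhead per kW of live IT load

def pue(it_kw):
    return (it_kw + FIXED_KW + PROP * it_kw) / it_kw

for frac in (0.25, 0.5, 1.0):
    it = CRITICAL_KW * frac
    print(f"{frac:.0%} load: PUE ~ {pue(it):.2f}")
```

Under these assumptions the same facility runs around a 2.7 PUE at 25% load, 2.0 at 50%, and the quoted ~1.7 only when fully loaded.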

Finished aisle

Once the racks were installed, airflow zones were created to channel cold air to the racks and contain and vent the hot air from the data center. Power is delivered to each row of racks from an overhead distribution system.

Peek at Panel Discussion with Revlon and NetApp CIOs at GigaOm Structure

GigaOm Structure is less than a month away, and I am moderating a panel with the Revlon and NetApp CIOs.

HOW INFRASTRUCTURE CAN TRANSFORM BUSINESS SUCCESS

 

In this session we focus in on how the right IT infrastructure can create significant competitive advantage. Understanding that IT’s job is to make systems work for people, rather than people working for systems, Revlon sought to align IT to the business with the successful implementation of a private cloud. Their resulting infrastructure turned 3.6PB of data into a business driver and runs more than 500 applications in a virtualized environment. Their initiative has demonstrated clear ROI.

Moderated by: Dave Ohara - Founder, GreenM3 and Analyst, GigaOM Research
Speakers: David Giambruno - SVP and CIO, Revlon
Cynthia Stoddard - SVP and CIO, NetApp

To give you an idea of what will be presented, here is a Forbes article with Revlon CIO David Giambruno.

Revlon CIO: Simplification Equals Speed. Speed Provides Agility.

 

David Giambruno

“Six years ago, Revlon IT was seen as an impediment to the business. My first task was simply to get IT out of the way of the business.”

—David Giambruno,
Revlon Senior VP and CIO
There are three main points made by David.

This transformation included three important milestones for Revlon:

  1. Platform simplification
  2. Global cloud deployment
  3. Global cloud production

This video discusses the transformation of Revlon IT.

VMware makes the smart move, Wholesale Lease vs. Build and Own

One of my good friends introduced me to a SW company that was going to build its first data center.  I told them: why build when it's your first one?  Just lease three 2MW wholesale data center spaces.  You'll get a great price given your brand name recognition, and you'll have three places to start consolidating the dozens of co-lo facilities you have around the world.  No, we've spent a lot of time with Gartner, we know what we are doing, we are going to build our own.  One year later I run into the consulting firm bragging that they are building a 7.5 MW data center.  Yeah, yeah, you're building a small one when they already have over 27 MW of space.  One year later they finally pick a site.  Another year later the IT exec leaves the SW company, and the data center is still not built.

I tell this story because it is amazing what people will do to convince themselves that building their own data center is the right move, without seriously considering wholesale in multiple markets.  There will be plenty of consultants, site selection experts, and analysts who will tell you why it is good to build a data center.  But some of the smartest guys I know have figured out it is lower cost and faster time to market to lease wholesale than to build their own data center.

Case in point: VMware's announced 4-city cloud environment in Santa Clara, Las Vegas, Dallas, and Sterling will be in wholesale space.

VMware is also in a poor position to compete by building ultra-modern data centers, as Facebook did in Prineville, Ore., and Forest City, N.C., and then offering low-cost compute cycles out of such infrastructure. On the contrary, VMware won't build anything. It will lease space from wholesale data center builders. It will then wheel in racks of servers, most likely from its Virtual Computing Environment (VCE) subsidiary, based on partner Cisco's converged compute and networking infrastructure, and throw on the switch.

With VMware going with a wholesale strategy, there may be more who understand that leasing wholesale can be more cost effective than building.  Maybe the folks at VMware, after being in dozens and dozens of data centers with their users, have done their own market survey of what is cost effective.  The VMware guys are able to get 4 sites up and running in a fraction of the time and cost compared to the executive I mentioned at the beginning who thought building a data center was the right answer.

One simple way to think about why this works: if you are a brand-name company with a pretty good footprint, you can become an anchor tenant in a wholesale space.  Even though there are agreements that the client list is kept confidential, almost everyone who is connected to the inner circles finds out who is in the space.  It's no different than when famous people move into a high-end apartment in NY.  With a stamp of endorsement from the big brand name, lesser-known companies can be charged more to be in a space that was good enough for the rich and famous.  This is no different than the cachet Apple brings by opening a retail store in a mall.  You know Apple is paying the lowest cost per sq ft, given they drive shopper traffic to the mall.

VMware will be adding staff to run the data centers; here are some of the job posts.

Data Center Supervisor Job Las Vegas, NV, US May 4, 2013
Data Center Supervisor Job Reston, VA, US May 9, 2013
Data Center Engineer (Night Shift) Job Las Vegas, NV, US May 4, 2013
Data Center Engineer Job Las Vegas, NV, US May 7, 2013
Data Center Engineer Job Reston, VA, US May 9, 2013
Data Center Engineer Job Reston, VA, US May 9, 2013
Data Center Engineer Job Reston, VA, US May 9, 2013
Data Center Engineer Job Reston, VA, US May 4, 2013
Data Center Engineer Job Reston, VA, US May 4, 2013
Data Center Engineer Job Reston, VA, US May 4, 2013

VMware's first wholesale space was with Sabey.

VMWare POD 3 - Sabey Data Center

Wenatchee, Washington

 

Hermanson completed Full Mechanical construction for this fast-tracked installation of a new data center in shelled out space. The total project area is approximately 21,000 SF and consists of 15,000 SF of “data hall” space for IT/Lab equipment, plus electrical rooms and a grey water equipment room. The scope of work includes the installation of new self-contained package units that will include outside air economizer capability and evaporative cooling, a new grey water system, construction of hot aisle containment, with new diesel powered standby generators with exhaust pipe risers on the exterior of the building and a new diesel fuel tank. The generator room requires additional fresh air intake louvers in the wall of an existing penthouse and additional radiator discharge louvers in the existing tilt-up concrete exterior walls.