Reducing Human Error in the Data Center, checklist manifesto

Domenic Alcaro, VP of Sales at Schneider Electric, presented to a full-room breakout session on Human Error in the Data Center.  Domenic shared his presentation, and here it is for your viewing with his permission.

image

Breakout B: Case Study - Eradicating Human Error: Lessons Learned from the US Nuclear Navy

Human error continues to be cited as a leading cause of data center downtime. The goal of eradicating this blight from the data center can be advanced by studying the US Nuclear Navy. In fact, the similarities between a mission critical data center and a mission critical nuclear propulsion plant are striking and many. This presentation will demonstrate the operational methodologies utilized by the US Nuclear Navy to reduce human error drawing comparison to a modern day data center every step of the way.

Domenic Alcaro, Vice President, Enterprise Sales, Schneider Electric

I was able to get access to Domenic's presentation ahead of time and shared it with some other people, and we started discussing human error in the data center.  One slide I especially liked is this one.

image

Note the last line: "The Checklist Manifesto" by Atul Gawande is a book suggested by a data center executive, and I passed the suggestion on to Domenic.  Here is a web site for the book too.

image

The book’s main point is simple: no matter how expert you may be, well-designed checklists can improve outcomes (even for Gawande’s own surgical team). The best-known use of checklists is by airplane pilots. Among the many interesting stories in the book is how this dedication to checklists arose among pilots.

Can the USN submarine procedures be applied?  Here are Domenic's points on what can be done and on the obstacles.

image

image

Solar Flares/Storms effect on Data Center is not known, an answer with data collection

One of the great talks at 7x24 Exchange was given by Alex Young on a subject few have thought about.

NASA - The Influence of Solar Flares and Solar Storms: Why We Should Care About Space Weather

The Sun produces solar storms in the form of intense radiation and fast moving material. These storms can interact with the Earth to create electric currents in our atmosphere. The study of space weather developed to predict solar storms and understand their impact on our technology. The world's electrical grids-that fundamental technology enabling modern society-are vulnerable to these currents. While most days the sun's impacts are minimal, large solar storms have the potential to have a devastating impact on mission critical systems. This talk will present an overview of Space Weather to help your business begin to prepare for worst-case scenarios.

C. Alex Young, Ph.D., Solar Astrophysicist at NASA's Goddard Space Flight Center with ADNET Systems Inc. and the SOHO/STEREO Science Team

Here is a video of Alex discussing the Solar Flare on June 7, 2011.

The potential effects on the infrastructure are shown here.

image

So what?  Check out this picture of what happened to a $10 Million power transformer in 1989.

image

image

And here is the effect on the electrical grid.

image

So, other than the risk of a power outage, what is the risk to a data center?

Luckily I sat with Alex at the speaker dinner and had a chance to chat much more.  Another data center executive joined in the discussion on what you could do about a solar storm that could last for days.

One choice we discussed is that you could hope the solar storm arrives when it is night time at your site and strikes the other side of the earth, but some storms last for days.  You could turn off the servers, which is a strategy used by some satellites, but that is not a top choice.

So what could we do?  Here was my idea.  Why doesn't NASA notify a data center run by a company that is fanatical about data collection and tell them the exact time when a solar storm will arrive at the data center site?  The data center operator then shares information back to NASA on error statistics that are potentially caused by the electromagnetic radiation storm.  Keep running this experiment to get data to answer the question of what happens to a data center during a solar storm.
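Here is a rough sketch of what the operator's side of that experiment might look like: count errors per hour during the storm window NASA reports and compare that with the hours just before it.  Everything in it is an assumption for illustration only: the function names, the 24-hour baseline, and the made-up timestamps are hypothetical and do not come from NASA or from any data center.

```python
from datetime import datetime, timedelta


def error_rate_per_hour(error_times, start, end):
    """Return errors per hour observed in the half-open window [start, end)."""
    hours = (end - start).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for t in error_times if start <= t < end)
    return count / hours


def compare_storm_to_baseline(error_times, storm_start, storm_end, baseline_hours=24):
    """Compare the error rate during the storm with the hours right before it."""
    baseline_start = storm_start - timedelta(hours=baseline_hours)
    storm_rate = error_rate_per_hour(error_times, storm_start, storm_end)
    baseline_rate = error_rate_per_hour(error_times, baseline_start, storm_start)
    ratio = storm_rate / baseline_rate if baseline_rate > 0 else None
    return {"storm_rate": storm_rate, "baseline_rate": baseline_rate, "ratio": ratio}


# Hypothetical storm window and made-up error timestamps (not real measurements).
storm_start = datetime(2011, 6, 9, 12, 0)
storm_end = datetime(2011, 6, 9, 18, 0)
baseline_start = storm_start - timedelta(hours=24)

errors = [baseline_start + timedelta(hours=2 * i) for i in range(12)]  # one every 2 hours
errors += [storm_start + timedelta(hours=i) for i in range(6)]         # one every hour

result = compare_storm_to_baseline(errors, storm_start, storm_end)
print(result)
```

A single storm proves little, since error rates move around for many reasons.  The point of repeating the experiment is to see whether the ratio stays above 1 across many storms.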

We moved to the Data Center Social 2.0 event and continued the discussions.  One idea the data center executive came up with was whether we can collect information about the solar storm at the data center itself.  Alex said yes and pointed to Stanford's Sudden Ionospheric Disturbance (SID) monitor.

image

So what is the plan?  The data center executive is going to circulate the idea; we both agreed there are two dozen data geeks who would instantly jump on it.  Then start the data collection and share it with Alex at NASA, so he can start to answer the question of what the effect of solar storms is on a data center.

And we may start a knowledge exchange that will get the data center industry ready for the peak in solar storms in 2013-2014.

image

Inspiration for the Low Carbon Data Center, 7x24 Exchange keynote by Robert F. Kennedy Jr.

I am sitting at 7x24 Exchange, where the first keynote is being delivered by Robert F. Kennedy Jr., and I was lucky to meet him at breakfast.  Sometimes I wonder whether people get the idea of a green (low carbon) data center.  What a great way to start a data center conference: a keynote educating the data center audience on the issues faced by a carbon-based economy.

image

image

CONFERENCE KEYNOTE:
"Green Gold Rush - A Vision for Energy Independence, Jobs, and National Wealth"

The creation of a green economy is an increasingly promising solution to multiple challenges. Sustainable business and energy independence are keys to our economic revitalization, according to Kennedy. America can boost its own infrastructure by powering industry with plentiful and domestic renewable resources. A sophisticated, well-crafted energy policy will help sharpen American competitiveness while reducing energy costs and our national debt. Intelligent energy policy is also the national fulcrum for US foreign policy and national security. From green jobs and technologies to weaning our reliance on carbon energy, Kennedy offers a bold vision to restore US economic might, safeguard our environment, and reestablish America's role as an exemplary nation.

Robert F. Kennedy, Jr., Visionary, Environmental Business Leader and Advocate

Here is an article that captures part of what Robert F. Kennedy Jr. presented.

If ever an issue deserved President Obama's promise of change, this is it. Mining syndicates are detonating 2,500 tons of explosives each day -- the equivalent of a Hiroshima bomb weekly -- to blow up Appalachia's mountains and extract sub-surface coal seams. They have demolished 500 mountains -- encompassing about a million acres -- buried hundreds of valley streams under tons of rubble, poisoned and uprooted countless communities, and caused widespread contamination to the region's air and water. On this continent, only Appalachia's rich woodlands survived the Pleistocene ice ages that turned the rest of North America into a treeless tundra. King Coal is now accomplishing what the glaciers could not -- obliterating the hemisphere's oldest, most biologically dense and diverse forests. Highly mechanized processes allow giant machines to flatten in months mountains older than the Himalayas -- while employing fewer workers for far less time than other types of mining. The coal industry's promise to restore the desolate wastelands is a cruel joke, and the industry's fallback position, that the flattened landscapes will provide space for economic development, is the weak punchline. America adores its Adirondacks and reveres the Rockies, while the Appalachian Mountains -- with their impoverished and alienated population -- are dismantled by coal moguls who dominate state politics and have little to prevent them from blasting the physical landscape to smithereens.

Obama promised science-based policies that would save what remains of Appalachia, but last month senior administration officials finally weighed in with a mixture of strong words and weak action that broke hearts across the region. The modest measures federal bureaucrats promised amount to little more than a tepid pledge of better enforcement of existing laws.

And government claims of doing everything possible to halt the holocaust are simply not true. George Bush gutted Clean Water Act protections. Obama must restore them.

Next Three weeks of Data Center Events

It’s been a great 4 weeks where I haven’t been on a plane.  I went to DCD Seattle and Amazon’s Technology Open House as local events.  But, now I have three straight weeks of travel.  I’ll be at the following events. 

June 12 - 15, 7x24 Exchange, and on Monday we will be having a Data Center 2.0 event for thought leaders we have identified.  We had version 1.0 at Uptime, and we are changing the format a bit for 2.0.

http://www.7x24exchange.org/

image

June 22-23, GigaOm Structure.

June 30, Data Center Dynamics San Francisco.

image

Luckily I am from the Bay Area, so I can see my family while I am down there.  My mom can fly back to Seattle with me, and we can fly back down to SJC after the weekend.  The house project is almost done.  Part of the fun we had a week ago was craning in the 1,000 lb marble countertop.

image

image

image

IP Network Discovery as a way to manage Data Center Power, JouleX

I had a chance to talk with Tom Noonan, President & CEO, and Tim McCormick of JouleX about their data center power management solution.  What caught my eye is Tom's ME background and experience in systems engineering and real-time process control systems.

Tom Noonan
President & CEO

Tom Noonan assumed the role of president and CEO at JouleX in 2010 and also remains a partner at TechOperators, an early-stage investing firm he co-founded in 2008. He is the former chair, president and CEO of Internet Security Systems, which was acquired by IBM for $1.5 billion. Prior to ISS, Noonan held senior positions at Dun and Bradstreet Software, where he was vice president, worldwide marketing.  

After graduating from Georgia Tech with a Mechanical Engineering degree, Noonan joined Rockwell Automation as a systems engineer specializing in real-time process control systems for industrial automation applications. Noonan founded two successful control systems technology companies while residing in Boston: Actuation Electronics, a precision motion-control company and Leapfrog Technologies, a software development environment for real time process control and automation applications.

...

Tim McCormick
Vice President, Sales & Marketing

Tim McCormick brings over 25 years of marketing, sales and business development experience in both enterprise security and application software. Prior to JouleX, he was vice president of the Business Solutions Group at IBM Internet Security Systems. He also served as vice president of marketing for Lancope, a leading network behavior analysis and anomaly detection provider, and at ClickFox, a customer behavior intelligence solution provider.

One of the things that impressed me is that JouleX uses an IP discovery strategy that allows an agentless approach to discovering the inventory of power devices in the data center.  Note the routers and switches in the center of this diagram.

image

image

Working with other systems that have information about IP devices makes the discovery easier, because JouleX can communicate with devices that manage other devices.

image

This approach allows JouleX to create graphs like this on where the power is being used based on the IP addresses inventoried.
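To make the idea concrete, here is a minimal sketch of how power readings keyed by IP address could be rolled up by subnet to show where the power is going.  This is not JouleX's method or API.  The inventory list, the wattage values, and the `power_by_subnet` helper are all hypothetical, and only Python's standard `ipaddress` module is used.

```python
import ipaddress
from collections import defaultdict


def power_by_subnet(inventory, prefix_len=24):
    """Sum watts per subnet for a list of (ip_address, watts) pairs."""
    totals = defaultdict(float)
    for ip, watts in inventory:
        # strict=False lets a host address stand in for its enclosing network.
        subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        totals[str(subnet)] += watts
    return dict(totals)


# Hypothetical inventory: discovered IP addresses with a power reading in watts.
inventory = [
    ("10.0.1.10", 350.0),
    ("10.0.1.11", 410.0),
    ("10.0.2.5", 120.0),
    ("10.0.2.6", 95.0),
]

totals = power_by_subnet(inventory)
for subnet, watts in sorted(totals.items()):
    print(f"{subnet}: {watts:.0f} W")
```

Grouping by subnet is just one choice.  The same rollup could be keyed by rack, switch, or device type if the inventory carries that information.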

image