Insight into Google Data Center Operations, Site Reliability Presentation at PuppetConf

I was at PuppetConf 2013 in SF for the first time and had a great time. The opening keynote by Luke Kanies was followed by a presentation from Google Site Reliability Engineer Gordon Rowell.

Google’s Corporate Engineering SRE team provides infrastructure services used by many of Google’s desktops, laptops and servers. This talk gives an overview of the design philosophy, challenges, technologies and some interesting failures seen while implementing infrastructure at scale.
Speakers

Gordon Rowell

Site Reliability Manager, Google
Gordon Rowell is a site reliability manager at Google, Sydney. His team focuses on delivering services to Googlers around the world. They have migrated major internal services to run on Google technology and are currently focused on removing dependencies on the corporate network. He enjoys the challenges of building robust systems that scale and has a particular passion for configuration management.

The presentation is here.

The key takeaways, as I saw them, are in these slides.

[Four slide images from the presentation]