Architecting for Outages, an architect posts on surviving AWS

Everyone wants to survive a data center outage, but as the AWS outage shows, not everyone does.  Here is a post that summarizes best practices in software architecture for surviving an outage like the one at AWS.

Retrospect on recent AWS outage and Resilient Cloud-Based Architecture

Date: Thursday, June 9, 2011 at 8:19AM

A bit over a month ago Amazon experienced its infamous AWS outage in the US East Region. As a cloud evangelist, I was intrigued by the history of the outage as it occurred. There were great posts during and after the outage from those who went down. But more interesting for me as an architect were the detailed posts of those who managed to survive the outage relatively unharmed, such as SimpleGeo, Netflix, SmugMug, SmugMug’s CTO, Twilio, Bizo and others.

The post lists these best practices:

The main principles, patterns and best practices are:

  • Design for failure
  • Stateless and autonomous services
  • Redundant hot copies spread across zones
  • Spread across several public cloud vendors and/or private cloud
  • Automation and monitoring
  • Avoiding ACID services and leveraging NoSQL solutions
  • Load balancing
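A few of the items above (design for failure, stateless services, redundant hot copies spread across zones, load balancing) can be illustrated together. What follows is only a minimal in-memory sketch, not how any of the companies mentioned actually did it: `Replica`, `ZoneAwareBalancer`, `ReplicaDown`, and the zone labels are hypothetical names made up for this example, and a real deployment would add health checks, timeouts, and retries with backoff.

```python
from dataclasses import dataclass
from typing import Callable, List


class ReplicaDown(Exception):
    """Raised when a replica (or every replica) cannot serve a request."""


@dataclass
class Replica:
    zone: str                      # hypothetical zone label, e.g. "zone-a"
    handler: Callable[[str], str]  # stateless handler: any replica can serve any request
    healthy: bool = True

    def call(self, request: str) -> str:
        if not self.healthy:
            raise ReplicaDown(self.zone)
        return self.handler(request)


@dataclass
class ZoneAwareBalancer:
    replicas: List[Replica]        # hot copies, one or more per zone
    _next: int = 0                 # round-robin cursor

    def send(self, request: str) -> str:
        """Round-robin over replicas; on failure, fall through to the next one."""
        n = len(self.replicas)
        for attempt in range(n):
            replica = self.replicas[(self._next + attempt) % n]
            try:
                result = replica.call(request)
            except ReplicaDown:
                continue  # design for failure: an outage in one zone is expected
            self._next = (self._next + attempt + 1) % n
            return result
        raise ReplicaDown("all zones unavailable")


if __name__ == "__main__":
    balancer = ZoneAwareBalancer([
        Replica("zone-a", lambda req: "zone-a handled " + req),
        Replica("zone-b", lambda req: "zone-b handled " + req),
    ])
    balancer.replicas[0].healthy = False  # simulate losing one zone
    print(balancer.send("GET /"))         # served by the surviving zone
```

Because the handlers keep no state of their own, the balancer is free to send any request to any surviving replica, which is what makes the failover this simple.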

If this seems daunting, new services are emerging that take care of scalability and availability for you.

The emerging solution to this complexity is a new class of application servers that offers to take care of the high availability and scalability concerns of your application, allowing you to focus on your business logic. Forrester calls these "Elastic Application Platforms", and defines them as:

An application platform that automates elasticity of application transactions, services, and data, delivering high availability and performance using elastic resources.

Data Center Analytics supports better decision making, Power Assure ships new capabilities

Power Assure has a press release on their new analytics capabilities. 

Energy Management version 4 (EM/4) software enables actionable-intelligence for maximizing data center efficiency

Santa Clara, Calif. – May 9, 2011 - Power Assure®, Inc., a data center infrastructure management solutions provider, today introduced at Uptime Institute’s Symposium 2011 Data Center Analytics for its Energy Management software platform, version 4 (EM/4). Data Center Analytics gives data center operators for the first time the ability to analyze and synthesize the overwhelming amount of raw data now available on data center equipment performance and turn it into useful business information to improve the efficiency, capacity and performance of their data centers.

The Analytics capability exists side-by-side with the monitoring and automation modules.

image

Here is a sample dashboard from Power Assure to visualize data center systems.

image

Fed CIO targets a key area to improve Fed IT, World Class Program Managers

I was reading ZDNet's post on the Fed's target of closing 800 data centers by 2015.

Fed CIO Kundra: We need to shut 800 data centers down by 2015

By Larry Dignan | April 12, 2011, 1:49pm PDT

Summary

Vivek Kundra, the chief information officer of the Federal government, said Tuesday that the company is actively shutting down 800 data centers by 2015.

Then I read the PDF testimony by Vivek Kundra to understand more, and found the section on strengthening program managers.

image

I spent much of my career at Apple and Microsoft as a program manager working on operating system releases.  Three of the best I worked with are:

Sheila Brady - Project Leader for System 7

Dennis Adler - Group Program Manager for Windows 95


While a director for MSR, Mr. Adler led a team from Research to the Windows Server Division and oversaw the initial development of a new server deployment and management technology that shipped with Windows Server 2003; formed the University Relations Group in MSR and oversaw its rapid growth and initial expansion into Europe and Asia; was instrumental in MSR’s initial external public relations endeavors; and was a liaison between MSR and Microsoft’s product teams, as well as a Group Program Manager in the Personal Systems Division.  Mr. Adler led the team responsible for designing and overseeing the development of the core components for Windows 95.

John Medica - Project Leader for Macintosh II

John K. Medica retired as Senior Vice President and Co-Leader, Product Group from Dell Inc. in April 2007. In 1993, Mr. Medica joined Dell as Vice President, Portable Systems. During 1996, he served as President and Chief Operating Officer of Dell's Japan division. He returned to the U.S. in August 1997 as Vice President, Procurement, and later served as Vice President, Web Products Group, and Vice President and General Manager, Transactional Product Group. Prior to joining Dell, he served as Project Leader for the Macintosh II, Director of the Macintosh CPU Projects Group and Senior Director of PowerBook Engineering with Apple Computer. Mr. Medica received his bachelor's degree in Electrical Engineering from Manhattan College, and his master's degree in Business Administration from Wake Forest University. Mr. Medica is currently a director of Compal Electronics, Inc., a publicly traded company.

These are some of the top program managers I learned a lot from. They had the unique talent to manage complex projects with hundreds if not thousands of people, on products that were market successes.  And I have been lucky to stay connected with these great program managers even after the products shipped.

Hadoop competitor MapR, information starts leaking

Hadoop is a hot topic, as companies like Facebook, eBay, and Yahoo have systems powered by Hadoop.

This page documents an alphabetical list of institutions that are using Hadoop for educational or production uses. Companies that offer services on or based around Hadoop are listed in Distributions and Commercial Support.

I have been waiting for MapR to post more information on their Hadoop competitor, but nothing is on their site.  GigaOm, however, has the scoop on the latest status.

What Is MapR Doing?

They are said to be building a proprietary replacement for the Hadoop Distributed File System that’s allegedly three times faster than the current open-source version. It comes with snapshots and no NameNode single point of failure (SPOF), and is supposed to be API-compatible with HDFS, so it can be a drop-in replacement.

MapR's potential to be three times faster would, in theory, cut compute requirements to a third of what they are today.  Even at a 50% reduction, it is hard not to look at the potential energy savings.
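To make the arithmetic concrete, here is a back-of-the-envelope sketch. It assumes that the number of nodes needed scales inversely with the speedup and that the claimed file system speedup carries over to the whole workload, neither of which the GigaOm report confirms. The function names and the 300-node baseline are hypothetical and chosen only for illustration.

```python
import math


def nodes_needed(baseline_nodes: int, speedup: float) -> int:
    """Nodes required for the same work, assuming capacity scales with speedup."""
    return math.ceil(baseline_nodes / speedup)


def reduction_fraction(speedup: float) -> float:
    """Fraction of compute saved: a 3x speedup saves two thirds, 2x saves half."""
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    baseline = 300  # hypothetical cluster size
    for speedup in (3.0, 2.0):
        print(
            f"{speedup:.0f}x faster: {nodes_needed(baseline, speedup)} nodes, "
            f"{reduction_fraction(speedup):.0%} fewer"
        )
```

Under these assumptions a 3x speedup leaves one third of the nodes (a two-thirds reduction), while the more conservative 50% reduction corresponds to a 2x speedup. Energy savings would track node count only roughly, since cooling and idle power do not scale linearly.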

Ex-Goldman Sachs programmer gets 8 years for stealing high-frequency trading code

Reuters has the news on an ex-Goldman Sachs programmer sentenced to 8 years in prison.

Ex-Goldman programmer gets 8 years for code theft

Sergey Aleynikov and his lawyer, Sabrina Shroff, depart from federal court in New York February 17, 2010. REUTERS/Chip East

By Grant McCool

NEW YORK | Fri Mar 18, 2011 10:55pm EDT

(Reuters) - A former Goldman Sachs Group Inc (GS.N) computer programmer was sentenced to eight years in prison on Friday for stealing secret code used in the Wall Street bank's valuable high-frequency trading system.

Sergey Aleynikov was arrested by the FBI and charged in July 2009 with copying and removing trading code from Goldman before taking a new job at Teza Technologies LLC, a high-frequency trading startup firm in Chicago.

But who would believe his statement?

"I very much regret the foolish thing of downloading information," the Russian-born father of three said at his sentencing on Friday. "Part of this information was proprietary to Goldman. I never meant to cause Goldman any harm or harm anyone at the bank."

He went to Teza Technologies, and his actions were traced.

But the strange thing is that even though Aleynikov was a software expert (his credentials are impressive; see his LinkedIn profile), the mistake he made was downloading so much source code to his home computer, since his programming commands were recorded by Goldman's back-up systems, as reported by The New York Times (see below). The bank also noticed the surge in data moving from its servers.

This is what happened according to New York Times DealBook: "...just before he left (Goldman Sachs), according to the complaint, Mr. Aleynikov used his desktop computer at Goldman's New York offices to upload a stream of code to a Web site hosted by a server based in Germany. Later, he downloaded the files again to his home computer, his laptop computer and to a memory device."

The case sheds light on the secret world of high frequency trading, but also attests to the security precautions taken by investment banks. Meanwhile, ZeroHedge points out that Aleynikov was arrested the day after he joined Teza Technologies, co-founded by Misha Malyshev, a former head of high-frequency trading at hedge fund Citadel Investment Group LLC. (Teza was reportedly paying him $1.4 million.) Aleynikov was suspended without pay and Teza is cooperating with the investigation, according to a Teza spokesman's statement, which also said the firm was not aware of alleged misconduct.