Netflix sets Chaos Monkey free for all to use, next come more monkeys - latency, conformity, doctor, janitor, security, 10-18, and Chaos Gorilla

Netflix has been getting more and more attention, and I think part of the reason is that they talk about things that go wrong and what they have learned from them. Netflix has learned the lesson that people listen much more when you talk about your mistakes than when you self-promote your error-free ways.

Netflix's latest move is to release Chaos Monkey to the open source community.  Here is their blog post.

Chaos Monkey released into the wild

We have found that the best defense against major unexpected failures is to fail often. By frequently causing failures, we force our services to be built in a way that is more resilient. We are excited to make a long-awaited announcement today that will help others who embrace this approach.
We have written about our Simian Army in the past and we are now proud to announce that the source code for the founding member of the Simian Army, Chaos Monkey, is available to the community.
Do you think your applications can handle a troop of mischievous monkeys loose in your infrastructure? Now you can find out.

What is Chaos Monkey?

Chaos Monkey is a service which runs in the Amazon Web Services (AWS) that seeks out Auto Scaling Groups (ASGs) and terminates instances (virtual machines) per group. The software design is flexible enough to work with other cloud providers or instance groupings and can be enhanced to add that support. The service has a configurable schedule that, by default, runs on non-holiday weekdays between 9am and 3pm. In most cases, we have designed our applications to continue working when an instance goes offline, but in those special cases that they don't, we want to make sure there are people around to resolve and learn from any problems. With this in mind, Chaos Monkey only runs within a limited set of hours with the intent that engineers will be alert and able to respond.
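To make the scheduling rule concrete, here is a minimal Python sketch of the two decisions described above: whether the monkey is allowed to run right now, and which instance to pick in each group. This is not Netflix's code. The holiday list, the function names, and the choice of exactly one victim per group are all assumptions made for illustration.

```python
import random
from datetime import date, datetime

# Hypothetical holiday calendar; the real service's calendar is not described in the post.
HOLIDAYS = {date(2012, 12, 25), date(2013, 1, 1)}


def monkey_may_run(now, holidays=HOLIDAYS, start_hour=9, end_hour=15):
    """Return True on non-holiday weekdays from start_hour up to (not including) end_hour."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_window = start_hour <= now.hour < end_hour
    return is_weekday and in_window and now.date() not in holidays


def pick_victims(groups, rng=random):
    """Pick one instance id per non-empty group (one per group is an assumption)."""
    return {name: rng.choice(instances)
            for name, instances in groups.items() if instances}
```

A caller would check `monkey_may_run(datetime.now())` first and only then hand the result of `pick_victims(...)` to whatever actually terminates instances, which is deliberately left out here.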
There are more Monkeys coming from the Simian Army.

Inspired by the success of the Chaos Monkey, we’ve started creating new simians that induce various kinds of failures, or detect abnormal conditions, and test our ability to survive them; a virtual Simian Army to keep our cloud safe, secure, and highly available.

Latency Monkey induces artificial delays in our RESTful client-server communication layer to simulate service degradation and measures if upstream services respond appropriately. In addition, by making very large delays, we can simulate a node or even an entire service downtime (and test our ability to survive it) without physically bringing these instances down. This can be particularly useful when testing the fault-tolerance of a new service by simulating the failure of its dependencies, without making these dependencies unavailable to the rest of the system.
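The idea of injecting delay and then checking how the caller reacts can be sketched in a few lines of Python. This is only an illustration: `with_latency` and `call_with_budget` are hypothetical names, and the budget check measures elapsed time after the call returns rather than enforcing a real timeout.

```python
import time


def with_latency(call, delay_seconds, sleep=time.sleep):
    """Wrap `call` so every invocation is preceded by an artificial delay."""
    def delayed(*args, **kwargs):
        sleep(delay_seconds)
        return call(*args, **kwargs)
    return delayed


def call_with_budget(call, budget_seconds, fallback, clock=time.monotonic):
    """Run `call`; return `fallback` if the call took longer than the budget."""
    started = clock()
    result = call()
    elapsed = clock() - started
    return result if elapsed <= budget_seconds else fallback
```

Wrapping a dependency with a large `delay_seconds` approximates an outage of that dependency for one caller, without taking it away from everyone else.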

Conformity Monkey finds instances that don’t adhere to best-practices and shuts them down. For example, we know that if we find instances that don’t belong to an auto-scaling group, that’s trouble waiting to happen. We shut them down to give the service owner the opportunity to re-launch them properly.
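The auto-scaling-group check in that example boils down to a set difference. A rough Python sketch, assuming the instance ids and group membership have already been fetched from the cloud provider (the function name and data shapes are made up for illustration):

```python
def find_nonconforming(instances, groups):
    """Return the ids of instances that belong to no auto-scaling group."""
    grouped = {instance for members in groups.values() for instance in members}
    return sorted(instance for instance in instances if instance not in grouped)
```

For example, `find_nonconforming(["i-1", "i-2", "i-3"], {"web": ["i-1"], "api": ["i-3"]})` flags only `i-2`.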

Doctor Monkey taps into health checks that run on each instance as well as monitors other external signs of health (e.g. CPU load) to detect unhealthy instances. Once unhealthy instances are detected, they are removed from service and after giving the service owners time to root-cause the problem, are eventually terminated.

Janitor Monkey ensures that our cloud environment is running free of clutter and waste. It searches for unused resources and disposes of them.

Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured AWS security groups, and terminates the offending instances. It also ensures that all our SSL and DRM certificates are valid and are not coming up for renewal.

10-18 Monkey (short for Localization-Internationalization, or l10n-i18n) detects configuration and run time problems in instances serving customers in multiple geographic regions, using different languages and character sets.

Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone. We want to verify that our services automatically re-balance to the functional availability zones without user-visible impact or manual intervention.
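Before unleashing anything like this, it helps to ask which services would have no capacity left if a zone disappeared. Here is a small Python sketch of that paper exercise. The service names and the placement data shape are hypothetical, and this only counts instances; it does not simulate the re-balancing itself.

```python
def capacity_after_zone_outage(placement, failed_zone):
    """placement maps service -> {zone: instance_count}; return instances left per service."""
    return {service: sum(count for zone, count in zones.items() if zone != failed_zone)
            for service, zones in placement.items()}


def services_at_risk(placement, failed_zone):
    """Return services that would have zero instances if failed_zone went away."""
    remaining = capacity_after_zone_outage(placement, failed_zone)
    return sorted(service for service, left in remaining.items() if left == 0)
```

Any service this reports is running in a single zone and would not survive the gorilla.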

Cloud API Fight at GigaOm Structure

One of the more entertaining sessions was on Cloud APIs.

API WARS: DELIVERING THE DE FACTO STANDARD

OpenStack. CloudStack. Amazon now lets Eucalyptus customers link their private clouds to AWS. The cloud industry has grown up, and after six years, Amazon is still on top. Do the open-source efforts have a chance, or is this recent fragmentation the last straw?

Moderated by: Jo Maitland - Research Director, GigaOM Pro
Speakers:
Sameer Dholakia - Group VP and GM, Cloud Platforms Group, Citrix
Chris C. Kemp - CEO, Nebula and Co-Founder, OpenStack
Marten Mickos - CEO, Eucalyptus Systems

The video is here.

If you don't want to watch the video, here is a post on the presentation.

If AWS is the WalMart of cloud, is OpenStack the Soviet Union?

The most dynamic part of the presentation was this discussion.

Kemp took Citrix and Eucalyptus to task for reinforcing Amazon’s dominance rather than embracing the OpenStack project. As you can imagine, Eucalyptus Systems CEO Marten Mickos and Citrix Systems Cloud Platforms Group GM Sameer Dholakia took exception to Kemp’s claims, particularly his characterization of their cloud platforms as closed.

Both pointed out that their platforms are open-source, just like OpenStack, but Kemp refused to accept that definition, saying the companies developed the core of their platforms internally and then released their software to the open-source community. Kemp contrasted that with OpenStack, which is developed top-to-bottom by its broad membership with no large company having any outsized influence.

What happens when your stock doesn't perform? Zynga as an example

Yahoo has had a steady exodus of talent for years.  Why? Part of the reason is the stock performance.

There is a post by a Zynga engineer on why to work for Zynga.

Work For Zynga?

Kostadis Roussos, Zynga Chief Engineer

Why I joined Zynga is interesting, because in many ways it explains why I am still at Zynga.

My last employer was NetApp. NetApp is a leader in the enterprise storage space (in many ways they are the #2 or #1 player in the storage market). Not only are they a leader in storage, they are a great company to work for. The year I left NetApp, NetApp had been named the Forbes #1 place to work. They were and are an extremely well run company, strongly positioned in the market. I loved my team, the place, and the work.

So why leave?

The San Jose Mercury News has a post on employees fuming about the declining stock price. Part of what is fueling the anger is what insiders were able to take advantage of.

Pincus and other executives sold 43 million shares, at $12 each, in the April deal. Limited to senior management and directors, it was explained as an effort to stagger the timing of when stockholders may cash out, preventing a simultaneous sell-off at the expiration of the lock-up.

In contrast, the rest of the employees could not sell until the end of April.

The Farmville publisher is now barely holding at $6, down 40 percent from a December IPO price of $10, as employees at the end of April were freed from their "lock-up" agreements to sell their stock holdings. Employees past and present told Reuters that morale is ebbing along with the stock price.

 

What the Startup pitch really means

Virgin Entrepreneur has a great post on what the startup pitch really means.

Startup says what? Here's an entrepreneur guest blog on common startup myths and how you can learn from them... They are the little white lies entrepreneurs like to tell partners, investors, potential employees and their spouses. It's like a code. Every sentence can be translated from startup-speak to the English equivalent. I know I'm in for a long conversation, meeting or pitch when I hear any of these...

One of my favorite lines is:

Startup Says: "The product is ... a breakthrough, patented, innovative, yada-yada-yada"

Translation: "Lipstick is on the pig get ready to watch us make bacon"

When I hear these adjectives I'm prepared for a crappy demo or an unfinished product. My advice is to dispense with the adjectives. I want to know what the product does, who the target customer is and then I want a demo. In almost every case I'll be able to detect the secret sauce if it's there and I'll be more excited because it was my discovery. After the demo reiterate the opportunity and then just shut up. The questions will come.

 

 

Startup says what? By Chuck Russell - Apr 12, 2012