LinkedIn Open Sources CPU-Saving Big Data Engine, Cubert

Saving power is something everyone wants to do, and thanks to open source advocates, ideas can be spread by sharing.  Gigaom’s Jonathan Vanian posts on LinkedIn’s effort, called Cubert.

LinkedIn said on Tuesday that it open sourced a framework called Cubert that uses specialized algorithms to organize data in a way that makes it easier to run queries without overburdening the system and wasting CPU resources.

Cubert, whose name is derived from the Rubik’s Cube, is supposedly as easy for engineers to work with as a Java application and it contains a “script-like user interface” from which engineers can use algorithms like MeshJoin and Cube on top of the organized data to save system resources when running queries.

Here is the LinkedIn Blog post.

 

About Cubert

Data scientists, analysts and engineers look for a computation platform that is designed for their real-life analytics needs, is fast even as the data scales, and is friendly in understanding and controlling the execution plans. 
 
We built Cubert to meet these requirements.


 

Cubert was built with the primary focus on better algorithms that can maximize map-side aggregations, minimize intermediate data, partition work in balanced chunks based on cost-functions, and ensure that the operators scan data that is resident in memory. Cubert has introduced a new paradigm of computation that:

  • organizes data in a format that is ideally suited for scalable execution of subsequent query processing operators
  • provides a suite of specialized operators (such as MeshJoin, Cube, Pivot) using algorithms that exploit the organization to provide significantly improved CPU and resource utilization

Cubert was shown to outperform other engines by a factor of 5-60X even when the data set sizes extend into 10s of TB and cannot fit into main memory.
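
To make "maximize map-side aggregations, minimize intermediate data" concrete, here is a minimal Python sketch of the general idea. It is not Cubert's API or its actual algorithm; the function names and the sample data are made up for illustration. Each mapper pre-aggregates its own partition, so fewer records have to be shuffled to the reduce step.

```python
from collections import Counter

def map_side_aggregate(partition):
    """Pre-aggregate one partition locally (hypothetical helper).

    Emits one (key, partial_sum) pair per distinct key instead of
    one record per input row.
    """
    partial = Counter()
    for key, value in partition:
        partial[key] += value
    return dict(partial)

def reduce_partials(partials):
    """Merge the per-partition partial sums into final totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)

# Illustrative input: two partitions of (country, page_views) rows.
partitions = [
    [("us", 1), ("us", 2), ("de", 5)],
    [("de", 1), ("us", 4), ("fr", 3)],
]

partials = [map_side_aggregate(p) for p in partitions]
totals = reduce_partials(partials)

raw_records = sum(len(p) for p in partitions)      # rows before aggregation
shuffled_records = sum(len(p) for p in partials)   # rows sent to the reducer

print(totals)
print(raw_records, shuffled_records)
```

On this toy input the saving is small, but when a partition holds many rows per key, the intermediate data shrinks to roughly one record per distinct key per partition.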

An OS that scares the Linux Vendors, CoreOS designed for a modern data center

Being an old-time OS guy, I once made the observation: “I think people would pay money to just have the drivers and kernel of the OS updated and leave the new features as options.”

A buddy told me to check out CoreOS. Why?  Because it has the security, service discovery, clustering and updating stuff that guys like AWS haven’t made a priority.  I was surprised at Gigaom Structure when AWS’s Werner Vogels said that security is something developers need to work on when developing their apps.  Google’s Urs Hoelzle said Google thinks there are things it can do to make building secure services easier.

CoreOS makes security its #1 priority, along with many other things that a modern data center group wants.

CoreOS is a server OS built from the ground up for the modern datacenter. CoreOS provides tools and guidance to ensure your platform is secure, reliable, and stays up to date.

Small Footprint

CoreOS utilizes 40% less RAM than typical Linux server installations. We provide a minimal, stable base for you to build your applications or platform on.

Reliable, Fast, Patching and Updates

CoreOS machines are patched and updated frequently with system patches and new features.

Built for Scale

CoreOS is designed for very large scale deployments. PXE boot and diskless configurations are fully supported.

InfoWorld posts on how CoreOS is a threat to Linux vendors.

Indeed, by changing the very definition of the Linux distribution, CoreOS is an "existential threat" to Red Hat, Canonical, and Suse, according to some suggestions. The question for Red Hat in particular will be whether it can embrace this new way of delivering Linux while keeping its revenue model alive.

...

When I pressed him on what he meant by that last sentence, he elaborated:

CoreOS is the first cloud-native OS to emerge. It is lightweight, disposable, and tries to embed devops practices in its architecture. RHEL has always been about adding value by adding more. CoreOS creates value by giving you less [see the cattle vs. pets analogy]. If the enterprise trend is toward webscale IT, then CoreOS will become more popular with ops too.

Project Atomic is a competitor of CoreOS.  You can probably expect more choices built on the idea of an OS service that simply keeps itself updated.  Updated with what?  Bug fixes, performance improvements, and better security.  That’s worth a lot.

Building the Best Software Services, can you find the secret guild?

I have been in the Bay Area for the past two weeks for business meetings before I head back to Redmond.  Actually, I haven’t been here for two weeks straight; I took two trips.  I’ve lived in Redmond for 22 years, and before that spent 32 years in Silicon Valley.  I go back and forth often enough that I have office space in both locations.  How Silicon Valley works is different from Seattle/Redmond, but there is a common trait: the guys who belong to the secret guild of low-level programmers who can build services that scale and run like the Energizer Bunny.  Working on operating systems at Apple and Microsoft got me used to working with the developers who belong to the secret guild.

What is the secret guild?  Here is a post that tells the story.

the secret guild of silicon valley

The governors of the guild of St. Luke, Jan de Bray

A couple of weeks ago, I was drinking beer in San Francisco with friends when someone quipped:

"You have too many hipsters, you won’t scale like that. Hire some fat guys who know C++." 

It’s funny, but it got me thinking.  Who are the “fat guys who know C++”, or as someone else put it, “the guys with neckbeards, who keep Google’s servers running”? And why is it that if you encounter one, it’s like pulling on a thread, and they all seem to know each other?

The reason is because the top engineers in Silicon Valley, whether they realize it or not, are part of a secret Guild.  They are a confraternity of craftsmen who share a set of traits:

...

Read the post to get the rest of the story.

For those of you too lazy to click on the link, here are the closing paragraphs.

Finally, the implicit compact that the Guild makes with a company is that their efforts will not be in vain.  The most powerfully attractive force for the Guild is the promise of building a product that will get into the happy hands of hundreds, thousands, or millions.  This is the coveted currency that even companies that have struggled to build an engineering reputation, like foursquare, can offer. 

The Guild of Silicon Valley is largely invisible, but their affiliations have determined the rise and fall of technology giants.  The start-ups who recognize the unsung talents of its members today will be tomorrow’s success stories.

 

Waterfall vs. Agile project methodologies

I am about ready to jump on a webinar on Agile and Waterfall methodologies.

Agile Meets Waterfall: How to Manage Multiple Methodologies

January 14, 2014
11:00am — 11:59am PST

FEATURED PANELISTS

Rich Morrow, founder / head geek, quicloud LLC
Dave Ohara
Jesse Dowdle, Director of Engineering, AtTask

Agile methodologies have had tremendous success in task-oriented teams and are increasing their penetration into the enterprise. Still, Agile is just a tool, and not all projects, business processes, and corporate cultures are natural fits. But managing multiple methodologies can be an enormous challenge without the right approach.

Since I am talking on the subject I decided to write a bit first as notes to myself.

So, what is the Waterfall methodology?  Here is a post that compares Waterfall and Agile.  I’ll pull out nuggets that give you the high-level concepts.

What is the waterfall methodology?

Much like construction and manufacturing workflows, waterfall methodology is a sequential design process. This means that as each of the eight stages (conception, initiation, analysis, design, construction, testing, implementation, and maintenance) is completed, the developers move on to the next step.

...

Advantages of the Waterfall Methodology

1. The waterfall methodology stresses meticulous record keeping. Having such records allows for the ability to improve upon the existing program in the future.

...

Disadvantages of the Waterfall Methodology

1. Once a step has been completed, developers can’t go back to a previous stage and make changes.

...

What is Agile?

Agile came about as a “solution” to the disadvantages of the waterfall methodology. Instead of a sequential design process, the Agile methodology follows an incremental approach.

Developers start off with a simplistic project design, and then begin to work on small modules. The work on these modules is done in weekly or monthly sprints, and at the end of each sprint, project priorities are evaluated and tests are run. These sprints allow for bugs to be discovered, and customer feedback to be incorporated into the design before the next sprint is run.

...

Advantages of the Agile Methodology

1. The Agile methodology allows for changes to be made after the initial planning. Re-writes to the program, as the client decides to make changes, are expected.

...

Disadvantages of Agile Methodology

2. As the initial project doesn’t have a definitive plan, the final product can be grossly different than what was initially intended.