LinkedIn Open Sources CPU-Saving Big Data Engine, Cubert

Everyone wants to save power, and open source makes it easy to share ideas for doing so. Gigaom’s Jonathan Vanian writes about one such effort from LinkedIn, called Cubert.

LinkedIn said on Tuesday that it open sourced a framework called Cubert that uses specialized algorithms to organize data in a way that makes it easier to run queries without overburdening the system and wasting CPU resources.

Cubert, whose name is derived from the Rubik’s Cube, is supposedly as easy for engineers to work with as a Java application and it contains a “script-like user interface” from which engineers can use algorithms like MeshJoin and Cube on top of the organized data to save system resources when running queries.

Here is the LinkedIn Blog post.

About Cubert

Data scientists, analysts and engineers look for a computation platform that is designed for their real-life analytics needs, stays fast as the data scales, and makes execution plans easy to understand and control.
 
We built Cubert to meet these requirements.

Cubert was built with the primary focus on better algorithms that can maximize map-side aggregations, minimize intermediate data, partition work in balanced chunks based on cost-functions, and ensure that the operators scan data that is resident in memory. Cubert has introduced a new paradigm of computation that:

  • organizes data in a format that is ideally suited for scalable execution of subsequent query processing operators
  • provides a suite of specialized operators (such as MeshJoin, Cube, Pivot) using algorithms that exploit the organization to provide significantly improved CPU and resource utilization
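
To make "maximize map-side aggregations, minimize intermediate data" concrete, here is a minimal sketch of the general technique in Python. It is not Cubert code and does not use Cubert's API; the function names (map_side_aggregate, merge_partials) and the sample data are hypothetical, chosen only for illustration. Each partition pre-aggregates its own records before anything is sent across the network, so the merge step sees one record per key per partition instead of every raw record.

```python
from collections import defaultdict


def map_side_aggregate(records):
    """Sum values per key inside a single partition (hypothetical helper)."""
    partial = defaultdict(int)
    for key, value in records:
        partial[key] += value
    return dict(partial)


def merge_partials(partials):
    """Combine the per-partition partial sums into final totals."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


# Illustrative input: two partitions of (country, count) records.
partitions = [
    [("us", 1), ("in", 1), ("us", 1), ("us", 1)],
    [("in", 1), ("in", 1), ("us", 1), ("br", 1)],
]

# Aggregate locally first, then merge the much smaller partial results.
partials = [map_side_aggregate(p) for p in partitions]
result = merge_partials(partials)

# Compare how many records would be shuffled with and without pre-aggregation.
raw_records = sum(len(p) for p in partitions)
shuffled_records = sum(len(p) for p in partials)

print(result)
print(f"raw records: {raw_records}, records after map-side aggregation: {shuffled_records}")
```

The saving grows with the number of repeated keys per partition, which is one reason organizing the data up front (so that related keys sit together) can pay off for later operators.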

Cubert was shown to outperform other engines by a factor of 5-60X even when the data set sizes extend into 10s of TB and cannot fit into main memory.