Everyone wants to save power, and open source makes it easy to spread the ideas that help. Gigaom’s Jonathan Vanian writes about LinkedIn’s effort in this area, a project called Cubert.
Here is an excerpt from the LinkedIn blog post.
Data scientists, analysts and engineers look for a computation platform that is designed for their real-life analytics needs, is fast even as the data scales, and is friendly in understanding and controlling the execution plans.
We built Cubert to meet these requirements.
Cubert was built with the primary focus on better algorithms that can maximize map-side aggregations, minimize intermediate data, partition work in balanced chunks based on cost-functions, and ensure that the operators scan data that is resident in memory. Cubert has introduced a new paradigm of computation that:
- organizes data in a format that is ideally suited for scalable execution of subsequent query processing operators
- provides a suite of specialized operators (such as MeshJoin, Cube, Pivot) using algorithms that exploit the organization to provide significantly improved CPU and resource utilization
Cubert was shown to outperform other engines by a factor of 5-60X even when the data set sizes extend into 10s of TB and cannot fit into main memory.
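To make "maximize map-side aggregations, minimize intermediate data" a little more concrete, here is a minimal Python sketch of the general idea. It is not Cubert's API or its actual algorithm; the function names (`map_side_aggregate`, `reduce_partials`) and the sample data are hypothetical and exist only for illustration. Each partition is summed locally first, so only one record per distinct key leaves the "mapper" instead of one record per input row.

```python
from collections import Counter

# Illustrative only: a toy model of map-side aggregation, not Cubert code.

def map_side_aggregate(partition):
    """Pre-aggregate one partition locally, yielding one sum per distinct key."""
    partial = Counter()
    for key, value in partition:
        partial[key] += value
    return dict(partial)

def reduce_partials(partials):
    """Merge the per-partition partial sums into final totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values for matching keys
    return dict(total)

# Hypothetical input: (country, page_view_count) rows split across two partitions.
partitions = [
    [("us", 1), ("in", 1), ("us", 1)],
    [("in", 1), ("us", 1), ("in", 1), ("br", 1)],
]

partials = [map_side_aggregate(p) for p in partitions]

# Records that would be shuffled without and with local pre-aggregation.
raw_records = sum(len(p) for p in partitions)
shuffled_records = sum(len(p) for p in partials)

totals = reduce_partials(partials)
print(raw_records, shuffled_records, totals)
```

In this toy input the saving is small, but the gap between rows read and records shuffled grows with the number of repeated keys per partition, which is why organizing data so that related keys land together matters for this kind of aggregation.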