LinkedIn Open Sources Cubert, a CPU-Saving Big Data Engine

Saving power is something everyone wants to do, and thanks to open source advocates, good ideas spread through sharing. Gigaom’s Jonathan Vanian writes about LinkedIn’s effort, called Cubert.

LinkedIn said on Tuesday that it has open sourced a framework called Cubert that uses specialized algorithms to organize data in a way that makes it easier to run queries without overburdening the system and wasting CPU resources.

Cubert, whose name is derived from the Rubik’s Cube, is supposedly as easy for engineers to work with as a Java application, and it contains a “script-like user interface” from which engineers can use algorithms like MeshJoin and Cube on top of the organized data to save system resources when running queries.

Here is the LinkedIn blog post.

About Cubert

Data scientists, analysts and engineers look for a computation platform that is designed for their real-life analytics needs, stays fast as the data scales, and makes execution plans easy to understand and control.

We built Cubert to meet these requirements.

Cubert was built with the primary focus on better algorithms that can maximize map-side aggregations, minimize intermediate data, partition work in balanced chunks based on cost-functions, and ensure that the operators scan data that is resident in memory. Cubert has introduced a new paradigm of computation that:

  • organizes data in a format that is ideally suited for scalable execution of subsequent query processing operators
  • provides a suite of specialized operators (such as MeshJoin, Cube, Pivot) using algorithms that exploit the organization to provide significantly improved CPU and resource utilization

Cubert was shown to outperform other engines by a factor of 5 to 60 times, even when data set sizes extend into tens of terabytes and cannot fit into main memory.
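To make the Cube operator and the “maximize map-side aggregations, minimize intermediate data” idea a little more concrete, here is a minimal Python sketch of a generic CUBE aggregation with map-side partial aggregation. It is not Cubert’s implementation or API: the function names (grouping_sets, map_side_cube, merge_partials) and the page-view sample data are hypothetical, and it assumes a simple sum over a single measure. Each partition collapses its rows into one partial sum per cube cell before anything is merged, so far less intermediate data leaves each partition.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical helper names; a generic CUBE sketch, not Cubert's API.


def grouping_sets(dims):
    """Every subset of the dimensions, from the full grouping down to the grand total ()."""
    return [combo for r in range(len(dims), -1, -1) for combo in combinations(dims, r)]


def map_side_cube(partition, dims, measure):
    """Map side: sum the measure locally for every cube cell, so each
    partition emits at most one row per cell instead of one per input row."""
    partial = defaultdict(int)
    sets = grouping_sets(dims)
    for row in partition:
        for gs in sets:
            # A cell key such as (("country", "US"),) for the country-only grouping.
            key = tuple((d, row[d]) for d in gs)
            partial[key] += row[measure]
    return partial


def merge_partials(partials):
    """Reduce side: add up the per-partition partial sums for each cube cell."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


# Hypothetical page-view data split across two partitions.
partitions = [
    [{"country": "US", "device": "mobile", "views": 3},
     {"country": "US", "device": "desktop", "views": 2}],
    [{"country": "IN", "device": "mobile", "views": 5}],
]
dims = ("country", "device")
cube = merge_partials(map_side_cube(p, dims, "views") for p in partitions)

print(cube[()])                                          # grand total: 10
print(cube[(("device", "mobile"),)])                     # mobile views: 8
print(cube[(("country", "US"), ("device", "desktop"))])  # 2
```

A naive cube like this touches 2^d grouping sets for d dimensions on every row, which gives a sense of why operators that exploit how the data is already organized could save so much CPU.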

Debating Upgrading GreenM3 from Squarespace 5 to Squarespace 7

It’s been over 3 years since I switched this blog from Typepad to Squarespace 5. Squarespace has since moved from 5 to 6 and now 7. There are some technical features I like in Squarespace 5 that don’t work in Squarespace 7, but sometimes it is better to change than to hold on to old habits.

Squarespace today announced its first major platform update in two years: Squarespace 7, adding new splash pages, templates and integrations with Getty Images and Google Apps.

The release advances the completely rebuilt codebase that arrived with the release of Squarespace 6 in 2012. Squarespace 7 will become available in waves to customers starting tonight. Customers can opt-in to the platform, or choose to hold off on upgrading during the early transition.

Will I change? Most likely, but I’m going to spend a few more days thinking about it.

What's Facebook going to look like in 5 years? YouTube. Video is the future

PCWorld reports on Mark Zuckerberg’s public statement that in 5 years Facebook will mostly be video.

Facebook will be mostly video in 5 years, Zuckerberg says

If you think your Facebook feed has a lot of video now, just wait.

“In five years, most of [Facebook] will be video,” CEO Mark Zuckerberg said Thursday during the company’s first community town hall, in which he took questions from the public on a range of topics.

He was responding to a question about whether the growing number of photos uploaded to Facebook is putting a drag on its infrastructure. But Facebook’s data centers have it covered, he said. The real challenge is improving the infrastructure to allow for more rich media like video in people’s feeds.

Who else’s future will be dominated by video content?

The Cloud Battle, A War to Sell Data Center Bits - Amazon, Google, Microsoft

This time of year is turning into a Cloud Battle, a war between Amazon, Google, and Microsoft. iPhone vs. Android is a battle of mobile bits. OS X vs. Windows 7/8/10 is a battle of desktop bits. The Cloud is a battle to deliver bits as a service from data centers.

Microsoft had its cloud event, and Google just finished its own. Next week is AWS re:Invent. The media covers each battle.

Google's Newest Attack On Amazon

When I read so many of the media articles, though, I think they focus on how big the fleet is or on the latest technology. Huh? As this article points out, measuring naval power by the tonnage of the fleet misses the point.

Measuring Naval Power: Bigger Ain’t Always Better

...

Navies were largely symmetrical in those thrilling days of yesteryear. That simplified matters. Size was a decent proxy for fighting power when battle fleets made up largely of capital ships bearing big guns squared off. That was before the era — an era that persists to this day — when small craft could carry armament comparable to that of capital ships. A destroyer couldn’t tote big guns back then. A lowly missile boat or sub can fire munitions comparable to those of a capital ship today — and to the same deadly effect.

I have had the chance to see up close how executives at Google (Urs Hoelzle), Amazon (Werner Vogels), and Microsoft (Scott Guthrie) perform at Gigaom Structure, both on stage and behind the scenes. It’s kind of like seeing the generals and admirals of the military.

This is not a simple battle where more servers and more MW of data center capacity win the war. How well your team operates the technology, which in the case of the bits (software) was created by other team members, matters enormously.

I think I could write a whole book on the battles between Google, Amazon, and Microsoft. In fact, I am sure someone has already made a book proposal for this. Unfortunately, or fortunately, I am too busy working on other things to document it all in a way entertaining enough to sell a book. What I can do is watch as an observer and see the strategies play out.

The Cloud Battle may be one of the most interesting technology wars, fought with billions of dollars of data centers and IT equipment and tens of thousands of development staff, reaching around the world.

Below are Google’s Points of Presence.

[Figure: Google’s Points of Presence]

Oh, there is one point I forgot to make. As Sun Tzu put it in The Art of War, point 18: “All warfare is based on deception.” The best know how to deceive the enemy, and they can use the media to spread the deception. Don’t believe everything you read.

18. All warfare is based on deception.

Seattle is the Cloud Hub - Amazon, Microsoft, Google, Others, and now Apple

Apple’s recent arrival in Seattle has the media and others talking about Seattle as a Cloud Capital.

The Seattle region has emerged as a major cloud computing hub thanks to Amazon Web Services, Microsoft Azure and a wide range of startups focused on cloud infrastructure and services. Much of Google’s cloud infrastructure work happens out of its Seattle-area offices.

In the short run, the new Apple office could intensify the competition for top engineers, but long-term it promises to add to the region’s status as a cloud center.

Influx of tech giants

Apple is the latest in a long list of tech giants from Silicon Valley and elsewhere who have established engineering outposts in the Seattle region. That list includes Google, Facebook, Oracle, HP, and many others, most recently Alibaba.

I was talking to a friend whose challenge is hiring Cloud infrastructure engineers for a company that isn’t on the above list, and he made the following observation. The typical pattern is that engineers start at Microsoft, then move to Amazon, then Google. His challenge is to catch engineers while they are making a transition and hire them into his company.

Take a 22-year-old software engineer. Have them spend 3 years at Microsoft, 3 years at Amazon, and, if they make it, 3 years at Google. They’ll be 31 years old with 9 years of experience building Clouds at Microsoft, Amazon, and Google. That is a killer resume, and he or she can go anywhere in the world.

Name another area where you could do that without moving.

Oh, and there are a handful of people who will be able to put Apple on their resume too. Now that would be killer: Microsoft, Amazon, Google, Apple.