Techcrunch has a post on Facebook's data analysis.
How Big Is Facebook’s Data? 2.5 Billion Pieces Of Content And 500+ Terabytes Ingested Every Day
Facebook revealed some big, big stats on big data to a few reporters at its HQ today, including that its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It’s pulling in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour. Plus it gave the first details on its new “Project Prism”.
VP of Engineering Jay Parikh explained why this is so important to Facebook: “Big data really is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.” By processing data within minutes, Facebook can rollout out new products, understand user reactions, and modify designs in near real-time.
Another stat Facebook revealed was that over 100 petebytes of data are stored in a single Hadoop disk cluster, and Parikh noted “We think we operate the single largest Hadoop system in the world.” In a hilarious moment, when asked “Is your Hadoop cluster bigger than Yahoo’s?”, Parikh proudly stated “Yes” with a wink.
If you concerned about who looks at the data, consider one way to get fired is to look where you are not supposed to.
Users might be a little bit uneasy about the idea that Facebook employees could look so deep into their activity, but Facebook assured me there are numerous protections against abuse. All data access is logged so Facebook can track which workers are looking at what. Only those working on building products that require data access get it, and there’s an intensive training process around acceptable use. And if an employee pries where they’re not supposed to, they’re fired. Parikh stated strongly “We have a zero-tolerance policy.”