Facebook shares its data analysis; one way to get fired is to look where you aren't supposed to

TechCrunch has a post on Facebook's data analysis.

How Big Is Facebook’s Data? 2.5 Billion Pieces Of Content And 500+ Terabytes Ingested Every Day

JOSH CONSTINE


Facebook revealed some big, big stats on big data to a few reporters at its HQ today, including that its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It’s pulling in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour. Plus it gave the first details on its new “Project Prism”.

VP of Engineering Jay Parikh explained why this is so important to Facebook: “Big data really is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.” By processing data within minutes, Facebook can roll out new products, understand user reactions, and modify designs in near real-time.

Another stat Facebook revealed was that over 100 petabytes of data are stored in a single Hadoop disk cluster, and Parikh noted “We think we operate the single largest Hadoop system in the world.” In a hilarious moment, when asked “Is your Hadoop cluster bigger than Yahoo’s?”, Parikh proudly stated “Yes” with a wink.

If you are concerned about who looks at the data, consider that one way to get fired is to look where you are not supposed to.

Users might be a little bit uneasy about the idea that Facebook employees could look so deep into their activity, but Facebook assured me there are numerous protections against abuse. All data access is logged so Facebook can track which workers are looking at what. Only those working on building products that require data access get it, and there’s an intensive training process around acceptable use. And if an employee pries where they’re not supposed to, they’re fired. Parikh stated strongly “We have a zero-tolerance policy.”

Facebook's low power storage data center

Facebook has shared more details with Wired.com on its 3rd data center in Prineville.

The plan is to use the building to house a brand-new type of low-power, deep-storage device that Facebook engineers will cook up over the next six to nine months. They’re designing a hard-disk storage server that powers off when it’s not in use, says Tom Furlong, vice president of site operations at Facebook. “It’s going to sit in a dedicated building that is optimized to support this device that we don’t need to access very often.”

What will this building be like? Boxy and quiet, with rows of low-powered machines clicking on and off, says Furlong.

Facebook's 3rd data center in Prineville is different from the rest: a backup DC

A standard rule for many is to have offsite backup. But when you have as much data as Facebook, that would mean shipping such a huge quantity of tapes or hard drives that it would be a logistics nightmare. And a WAN connection couldn't be big enough to keep up with the flow of data into Facebook.

GigaOm's Katie Fehrenbacher reports that the new 3rd data center in Prineville is actually a deep storage facility.

The building, which will potentially be 84,000 square feet, will be filled with disc or flash storage and will act as the “backup to the backup to the backup,” storage for the facility’s data, explained Facebook’s Ken Patchett.

This method makes sense, and I actually use it at my home and office as well. Whenever I touch my Parallels environment on my Mac, the whole VM needs to be backed up, which can be 20 to 30 GB. This change gets streamed to a Drobo-FS from my home to my office, which is a separate building connected by gigabit Ethernet. Backing up this much data regularly to the cloud over my 5 megabit uplink would be painful and take all day or more, vs. an hour or two depending on how well the wireless connection works.
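As a rough check on those numbers, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 GB = 10^9 bytes), a link running at its full nominal rate, and no protocol overhead. The function name `transfer_hours` is made up for illustration.

```python
def transfer_hours(size_gb, link_mbps):
    """Idealized time in hours to move size_gb gigabytes over a link_mbps link.

    Assumptions (not measurements): decimal units, the link runs at its
    full nominal rate, and there is no protocol or disk overhead.
    """
    bits = size_gb * 8e9                  # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6)    # megabits per second -> bits per second
    return seconds / 3600


if __name__ == "__main__":
    for size_gb in (20, 30):
        for label, mbps in (("5 Mbit uplink", 5), ("gigabit link", 1000)):
            hours = transfer_hours(size_gb, mbps)
            print(f"{size_gb} GB over {label}: {hours:.2f} hours")
```

Under those idealized assumptions a 30 GB VM image takes roughly 13 hours on a 5 megabit uplink and about four minutes on a gigabit link. Real transfers will be slower than the ideal figure, since the slowest hop in the path sets the pace.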

Will on-site backup become more of a standard? Google, Facebook, Amazon, Apple, and Microsoft most likely do this. It makes a lot of sense for hospitals, given the size of imaging data. Financial firms need to back up offsite for regulatory reasons.

Facebook adds a smaller 3rd data center to Prineville site

ABC via AP Reuters via The Bend Bulletin report that Facebook is adding a smaller 3rd data center to its Prineville site.

Facebook Plans Third, Smaller Oregon Data Center

PRINEVILLE, Ore. August 14, 2012 (AP)

Facebook Inc. has filed plans for a third data center in the central Oregon city Prineville, but it won't bring additional jobs.

The social network company has a 334,000-square-foot facility up and running in Prineville and a twin under construction next door. The third facility would be smaller, about 62,000 square feet.


Pre-IPO video gives hints of Facebook's strategy

I was talking with my neighbors, enjoying the evening breeze, when one of them asked what I thought of the future of Facebook. I told her the number 1 issue is that Google is laser-focused on beating Facebook, and that is Facebook's biggest challenge. Why? Because Facebook is Google's top competitor for ad dollars.

HBR has an interesting article on the right way to run an IPO road show, and of course it chooses to poke at Facebook.

The Right Way to Run an IPO Road Show

Over the past 17 years I've worked with hundreds of executives to raise billions of dollars, from private equity to hedge funds to IPOs. I've seen road shows done right, but I've also seen every mistake in the book.

One place where HBR digs at Facebook is its pre-IPO video.

Procrastinating creates not only a very stressful environment but ultimately a show that is not as well-conceived, customized to the audience, and polished as it must be. If you need proof just look at Facebook's stale video pitch, which was scrapped on the second day of its road show amidst widespread complaints from important institutional investors that it left them little time for their key questions, and was boring to boot.

I watched the video, and it made it obvious why Facebook bought Instagram.

The data center-related topic comes up around the 27-minute mark.


I don't know about you, but for me the easiest person to watch in the video was Chris Cox, VP of Product.

Business Insider called Cox “a triple threat -- an engineer who can build company-defining products, an operator who can recruit and manage good people, and a long-term strategic thinker,” and named Cox number 2 on its list of 10 Rock Star Tech Execs You’ve Never Heard Of.[5] He is also known for his focus on bringing people and technology together. “Technology does not need to estrange us from one another,” Cox told Wired. “The physical reality comes alive with the human stories we have told there.”[6]

Cox envisions a future in which what your friends recommend on social networks plays a bigger role in what you buy, do, or watch on TV. He told The Wall Street Journal that he believes there will be a time “when you turn on the TV, and you see what your mom and friends are watching, and they can record stuff for you. Instead of 999 channels, you will see 999 recommendations from your friends.”[1]