Facebook moves a Data Center Elephant, Dozens of Petabytes migrate to Prineville

Facebook has a post on migrating a huge Hadoop environment.  The post doesn't specifically call out the Prineville facility, but where else would they be moving to?

During the past two years, the number of shared items has grown exponentially, and the corresponding requirements for the analytics data warehouse have increased as well. As the majority of the analytics is performed with Hive, we store the data on HDFS — the Hadoop distributed file system.  In 2010, Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. By March 2011, the cluster had grown to 30 PB — that’s 3,000 times the size of the Library of Congress! At that point, we had run out of power and space to add more nodes, necessitating the move to a larger data center.

For those of you not familiar with what large data set Facebook would be moving.

y Paul Yang on Wednesday, July 27, 2011 at 9:19am

Users share billions of pieces of content daily on Facebook, and it’s the data infrastructure team's job to analyze that data so we can present it to those users and their friends in the quickest and most relevant manner. This requires a lot of infrastructure and supporting data, so much so that we need to move that data periodically to ever larger data centers. Just last month, the data infrastructure team finished our largest data migration ever – moving dozens of petabytes of data from one data center to another.

The post has lots of details and ends with a pitch to join the Facebook infrastructure team.

The next set of challenges for us include providing an ability to support a data warehouse that is distributed across multiple data centers. If you're interested in working on these and other "petascale" problems related to Hadoop, Hive, or just large systems, come join Facebook's data infrastructure team!

The data infrastructure team in the war room during the final switchover.

Curious I went to see what are the current job posts in the tech operations team.

Open Positions
Production Operations: Systems, Network, Storage, Database (14)

    Supply Chain, Program Management and Analysis (6)

    Hardware Design and Data Center Operations (12)