Facebook has a post on migrating a huge Hadoop environment. The post doesn't specifically call out the Prineville facility, but where else would they be moving to?
During the past two years, the number of shared items has grown exponentially, and the corresponding requirements for the analytics data warehouse have increased as well. As the majority of the analytics is performed with Hive, we store the data on HDFS — the Hadoop distributed file system. In 2010, Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. By March 2011, the cluster had grown to 30 PB — that’s 3,000 times the size of the Library of Congress! At that point, we had run out of power and space to add more nodes, necessitating the move to a larger data center.
For those of you not familiar with the size of the data set Facebook would be moving, here is how the post opens.
by Paul Yang on Wednesday, July 27, 2011 at 9:19am
Users share billions of pieces of content daily on Facebook, and it’s the data infrastructure team's job to analyze that data so we can present it to those users and their friends in the quickest and most relevant manner. This requires a lot of infrastructure and supporting data, so much so that we need to move that data periodically to ever larger data centers. Just last month, the data infrastructure team finished our largest data migration ever – moving dozens of petabytes of data from one data center to another.
The post has lots of details and ends with a pitch to join the Facebook infrastructure team.
The next set of challenges for us include providing an ability to support a data warehouse that is distributed across multiple data centers. If you're interested in working on these and other "petascale" problems related to Hadoop, Hive, or just large systems, come join Facebook's data infrastructure team!
The data infrastructure team in the war room during the final switchover.
Curious, I went to see which jobs are currently posted for the technical operations team.
Production Operations: Systems, Network, Storage, Database (14)
- Application Operations Engineer
- Data Warehouse Operations Engineer
- Manager, Datacenter Network Engineering
- Manager, Site Reliability Operations
- Messaging Engineer
- MySQL Database Engineer
- Network Engineer
- Network Engineer, Corporate
- Network Operations Engineer
- Network Operations Engineer (Dublin)
- Operations Engineer
- Operations Engineer (Dublin)
- Site Reliability Operations Lead
- Storage Engineer
Supply Chain, Program Management and Analysis (6)
- Commodity Manager
- Logistics and Inventory Manager
- Operations Analyst
- Site Operations Analyst
- Sourcing Manager, Network Engineering
- Technical Program Manager
Hardware Design and Data Center Operations (12)
- Data Center Capacity Planning Manager
- Data Center Lease & Site Selection Analyst
- Data Center Network Technician (NC)
- Data Center Network Technician (VA)
- Data Center Site Selection Manager
- Hardware Engineer
- Hardware Validation Engineer
- Infrastructure Build & Operations Engineer
- Lead Data Center Technician (NC)
- Power Engineer
- Power Test Engineer
- Storage Hardware Design Engineer