Facebook moves a Data Center Elephant, Dozens of Petabytes migrate to Prineville

Facebook has a post on migrating a huge Hadoop environment.  The post doesn't specifically call out the Prineville facility, but where else would they be moving to?

During the past two years, the number of shared items has grown exponentially, and the corresponding requirements for the analytics data warehouse have increased as well. As the majority of the analytics is performed with Hive, we store the data on HDFS — the Hadoop distributed file system.  In 2010, Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. By March 2011, the cluster had grown to 30 PB — that’s 3,000 times the size of the Library of Congress! At that point, we had run out of power and space to add more nodes, necessitating the move to a larger data center.
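A quick bit of arithmetic reproduces the comparison in the quote. Note that the Library of Congress size below is the commonly cited rough estimate I am assuming, not a number from Facebook's post:

```python
# Rough sanity check of the "3,000 times the Library of Congress" comparison.
# Assumption (not from the post): the oft-cited ~10 TB estimate for the
# Library of Congress print collection.

CLUSTER_PB = 30                  # Facebook's HDFS warehouse size, March 2011
LIBRARY_OF_CONGRESS_TB = 10      # assumed estimate, in terabytes

cluster_tb = CLUSTER_PB * 1000   # 1 PB = 1,000 TB (decimal units)
ratio = cluster_tb / LIBRARY_OF_CONGRESS_TB

print(f"{CLUSTER_PB} PB is about {ratio:,.0f}x the assumed Library of Congress size")
# -> 30 PB is about 3,000x the assumed Library of Congress size
```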

For those of you not familiar with the scale of the data set Facebook is moving, here is the background from the post.

By Paul Yang on Wednesday, July 27, 2011 at 9:19am

Users share billions of pieces of content daily on Facebook, and it’s the data infrastructure team's job to analyze that data so we can present it to those users and their friends in the quickest and most relevant manner. This requires a lot of infrastructure and supporting data, so much so that we need to move that data periodically to ever larger data centers. Just last month, the data infrastructure team finished our largest data migration ever – moving dozens of petabytes of data from one data center to another.

The post has lots of details and ends with a pitch to join the Facebook infrastructure team.

The next set of challenges for us include providing an ability to support a data warehouse that is distributed across multiple data centers. If you're interested in working on these and other "petascale" problems related to Hadoop, Hive, or just large systems, come join Facebook's data infrastructure team!

The data infrastructure team in the war room during the final switchover.

Curious, I went to see what the current job postings are for the tech operations team.

Open Positions

Production Operations: Systems, Network, Storage, Database (14)

Supply Chain, Program Management and Analysis (6)

Hardware Design and Data Center Operations (12)

    Server Secret is getting out, on-chip Networking is more efficient, Facebook publishes Tilera 3X performance per watt vs. Xeon

There is a bunch of news about Facebook publishing results on the Tilera server.

    Facebook study shows Tilera processors are four times more energy efficient

    Facebook sides with Tilera in the server architecture debate

    Facebook: Tilera chips more energy efficient than x86

What I found most useful is the PDF of the paper that Facebook published.

Many-Core Key-Value Store
Mateusz Berezecki, Facebook, mateuszb@fb.com
Eitan Frachtenberg, Facebook, etc@fb.com
Mike Paleczny, Facebook, mpal@fb.com
Kenneth Steele, Tilera, ken@tilera.com

    We show that the throughput, response time, and power
    consumption of a high-core-count processor operating at a low
    clock rate and very low power consumption can perform well
    when compared to a platform using faster but fewer commodity
    cores. Specific measurements are made for a key-value store,
    Memcached, using a variety of systems based on three different
    processors: the 4-core Intel Xeon L5520, 8-core AMD Opteron
    6128 HE, and 64-core Tilera TILEPro64.

Here is the comparison of the Tilera, AMD, and Intel systems.

    image

    image

Here is a good tip and the reasoning for why you should think carefully before putting more than 64 GB of RAM per server for memcached services.

    As a comparison basis, we could populate the x86-based
    servers with many more DIMMs (up to a theoretical 384GB
    in the Opteron’s case, or twice that if using 16GB DIMMs).
    But there are two operational limitations that render this
    choice impractical. First, the throughput requirement of the
    server grows with the amount of data and can easily exceed
    the processor or network interface capacity in a single
    commodity server. Second, placing this much data in a single
    server is risky: all servers fail eventually, and rebuilding the
    KV store for so much data, key by key, is prohibitively
    slow. So in practice, we rarely place much more than 64GB
    of table data in a single failure domain. (In the S2Q case,
    CPUs, RAM, BMC, and NICs are independent at the 32GB
level; motherboards are independent and hot-swappable at the
    64GB level; and only the PSU is shared among 128GB worth
    of data.)
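The "rebuilding the KV store ... key by key, is prohibitively slow" point is easy to put rough numbers on. The rebuild rate and average item size below are assumptions of mine for illustration; the paper does not give them:

```python
# Back-of-the-envelope rebuild time for a failed memcached node.
# The keys/sec rebuild rate and the 1 KB average item size are assumptions
# for illustration; the paper only states that rebuilding is prohibitively slow.

AVG_ITEM_BYTES = 1_000          # assumed average key+value size
REBUILD_KEYS_PER_SEC = 50_000   # assumed rate at which misses refill the cache

def rebuild_hours(table_gb: float) -> float:
    items = table_gb * 1e9 / AVG_ITEM_BYTES
    return items / REBUILD_KEYS_PER_SEC / 3600

for gb in (64, 384):
    print(f"{gb:>4} GB of table data -> ~{rebuild_hours(gb):.1f} hours to repopulate")

# Larger failure domains take proportionally longer to warm back up,
# which is why table data is capped at roughly 64 GB per failure domain.
```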

    But, if you want to go beyond 64 GB, here are some numbers for a 256 GB RAM configuration.

    image

    And Conclusions.

    Our experiments show that a tuned version of
    Memcached on the 64-core Tilera TILEPro64 can yield at
    least 67% higher throughput than low-power x86 servers at
    comparable latency. When taking power and node integration
    into account as well, a TILEPro64-based S2Q server
    with 8 processors handles at least three times as many
    transactions per second per Watt as the x86-based servers
    with the same memory footprint.

And the server secret of on-chip networking is discussed as the main reason for the performance.

    The main reasons for this performance are the elimination
    or parallelization of serializing bottlenecks using the on-chip
    network; and the allocation of different cores to different
    functions such as kernel networking stack and application
    modules. This technique can be very useful across architectures,
    particularly as the number of cores increases. In
    our study, the TILEPro64 exhibits near-linear throughput
    scaling with the number of cores, up to 48 UDP cores.
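The allocation of different cores to different functions is not Tilera-specific. As a rough illustration on commodity Linux (my own sketch using os.sched_setaffinity, not the paper's TILEPro64 implementation), you can pin processes playing different roles to dedicated cores:

```python
# Crude illustration of dedicating specific cores to specific functions,
# in the spirit of the paper's split between networking and application cores.
# Linux-only (os.sched_setaffinity); not the TILEPro64 implementation itself.
import os
from multiprocessing import Process

def worker(role: str, core: int) -> None:
    os.sched_setaffinity(0, {core})   # pin this process to a single core
    print(f"{role} pinned to core {core}, affinity={os.sched_getaffinity(0)}")
    # ... the real work (e.g. a UDP receive loop or hash-table lookups) goes here

if __name__ == "__main__":
    procs = [
        Process(target=worker, args=("network-rx", 0)),
        Process(target=worker, args=("network-rx", 1)),
        Process(target=worker, args=("hash-lookup", 2)),
        Process(target=worker, args=("hash-lookup", 3)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```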

    Facebook invites Goldman Sachs, Rackspace, AMD, and Microsoft to speak at Open Compute Summit, announces Open Compute Foundation non-profit

    In Facebook's summary of the Open Compute Summit, they mention a community of presenters - Rackspace, Goldman Sachs, AMD, and Microsoft.

    As a part of growing the community, the following people shared their perspectives:

    • Joel Wineland and Bret Piatt from Rackspace shared their thoughts on how Open Compute Project servers could fit into their data center business. What was really awesome is that Rackspace benchmarked our Open Compute Project AMD 1.0 servers  against their own off-the-shelf hardware, and our servers did very well. For the first time, independent, external feedback on our designs was shared with the community! Rackspace also expressed what they would like to see this community do: to be ambitious and, most of all, to innovate.
    • Grant Richard and Matthew Liste from Goldman Sachs presented their vision of OCP hardware filling a big role in their large scale compute clusters and, more importantly, how hardware from multiple Open Compute Project vendors could dramatically improve their ability to manage their systems, which are much more heterogeneous than ours.
    • Bob Ogrey from AMD presented interest in Open Compute technology from China and other countries in East Asia, and discussed how AMD intends to open up their motherboard design files to ODMs in the near future.
    • Dileep Bhandarkar from Microsoft shared his experiences building modular data centers, comparing and contrasting with the data center and server designs from the Open Compute Project. Most importantly, Dileep presented a number of technological areas Microsoft is potentially interested in engaging with the Open Compute Project going forward.

To continue the community effort, Frank Frankovsky announced they will launch the Open Compute Foundation.

    To help facilitate collaboration, Frank also announced our intention to create a non-profit foundation with roles ranging from using this hardware to building it to actually contributing to the specifications and leading entire projects. While all of the details aren't yet worked out, each project will be separate, allowing you to choose exactly the areas where you want to contribute and want to avoid. These projects must embody the four tenets of efficiency, economy, environmental friendliness, and openness that have driven the Open Compute project from the start. Projects and hardware sold based on these designs must be aligned with these core tenets before they can call themselves "Open Compute."

    Facebook's Open Compute Project shares 2x Server v2.0 future and Storage Server v1.0

Facebook's Frank Frankovsky was on a panel at Structure 2011, where he was the baloney in the sandwich between the vendors VMware and Dell's Forrest Norrod. Frank looks like a pretty happy piece of baloney here.

    image

    THE ECONOMICS OF OPEN EVERYTHING

    The power of open-source software can’t be denied. At its best, it has democratized innovation and is a stub for other subsequent innovations. Think Apache and the web. But where is the money in it? Does there have to be a profit motive? We talk to two exponents of recent projects -- Open Stack and Cloud Foundry -- both of which are open and have the promise to shake up the cloud industry.

Moderated by: Lew Moorman - Chief Strategy Officer and President of the Rackspace Cloud, Rackspace
Speakers: Derek Collison - CTO, Chief Architect, Cloud Division, VMware

    Frank Frankovsky - Director, Hardware Design and Supply Chain, Facebook

    Forrest Norrod - VP and GM, Server Platforms, Dell

The session was dynamic, and Forrest Norrod was able to crack a few smiles as well.

    image

    I didn't get a picture of Frank and Forrest smiling at the same time, but they are both looking quite serious here.

    image

I caught Frank later at the conference, and he said they had shared information from their Open Compute Summit held on June 17, 2011. I am going to break the information into a few posts - this one covers server and storage hardware.

Facebook's Amir Michael announced v2.0 of the Open Compute Server, which doubles the motherboard density in the 1 1/2 U design.

    Doubling the Compute Density

    Amir Michael, Facebook’s hardware design manager, introduced our new initiatives in server hardware, presenting new AMD and Intel motherboard designs that double the compute density relative to our original designs.

    Instead of placing a single motherboard in each chassis, we’re now building servers with two narrow motherboards sitting next to each other. These motherboards support the next generation of Intel processors and AMD’s Interlagos. To enable these new designs, we’ve also modified the server chassis, power supply (700W output from 450W), server cabinet, and battery backup cabinet.
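The power supply change follows directly from the density change: two boards now share a 700W supply where one board used to have a 450W supply to itself. The per-board budget below is my inference from the quoted numbers, not a figure Facebook stated:

```python
# Per-motherboard power budget implied by the Open Compute v2.0 change:
# the chassis goes from one board on a 450 W supply to two boards on a 700 W supply.
# The per-board figures are inferred from the quoted numbers, not stated by Facebook.

v1_boards, v1_psu_w = 1, 450
v2_boards, v2_psu_w = 2, 700

v1_per_board = v1_psu_w / v1_boards
v2_per_board = v2_psu_w / v2_boards

print(f"v1.0: {v1_per_board:.0f} W available per board")
print(f"v2.0: {v2_per_board:.0f} W available per board "
      f"({(1 - v2_per_board / v1_per_board) * 100:.0f}% less per board, "
      f"{v2_boards // v1_boards}x the boards per chassis)")
```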

What was not clear is what Facebook does for big storage for all the pictures on Facebook pages. The answer is Storage Server v1.0, which can support from one to four server connections with a variety of connection technologies to provide low cost and high performance.

    Storage Server v1.0

    One question that has been asked a number of times since releasing version 1.0 of the Open Compute designs is if Facebook plans to build a storage server. Amir announced a project designed for our storage intensive applications. It’s actually a platform approach in that you can vary the ratio of compute to storage using the same physical building blocks. If you fully load the server, each storage node can support 50 hard drives split across two controllers.
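To get a feel for what 50 drives per node means, here is a rough capacity estimate. Only the drive count, the two-controller split, and the one-to-four server connections come from the announcement; the 3TB drive size is an assumption of mine:

```python
# Rough raw-capacity estimate for an Open Compute Storage Server v1.0 node.
# Only the 50-drive count and two-controller split come from the announcement;
# the 3 TB drive size is an assumed figure for illustration (2011-era SATA).

DRIVES_PER_NODE = 50
CONTROLLERS = 2
ASSUMED_DRIVE_TB = 3

raw_tb = DRIVES_PER_NODE * ASSUMED_DRIVE_TB
print(f"{DRIVES_PER_NODE} drives x {ASSUMED_DRIVE_TB} TB = {raw_tb} TB raw per node")
print(f"{DRIVES_PER_NODE // CONTROLLERS} drives per controller")

# Varying the compute-to-storage ratio: 1 to 4 server connections per storage node.
for servers in range(1, 5):
    print(f"{servers} attached server(s) -> {raw_tb / servers:.0f} TB raw per server")
```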

    Facebook's Latest Data Center Design presentation at Uptime

Facebook gave a keynote presentation on its data center design.

Facebook's Latest Innovations in Data Center Design
Paul Hsu, Senior Electrical Engineer, Facebook
Dan Lee, Data Center Mechanical Engineer, Facebook

Below is a side-by-side slide Paul presented on the difference between a typical data center power conversion chain vs. the Facebook design.

    image
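The slide is only an image here, but the general point of the comparison is that fewer conversion stages mean fewer compounding losses. The stage efficiencies below are hypothetical round numbers of my own for illustration, not values from Paul's slide:

```python
# Illustration of why removing power conversion stages matters.
# The stage efficiencies are hypothetical round numbers for illustration only;
# they are NOT the figures from the Facebook slide.

def chain_efficiency(stages):
    eff = 1.0
    for _name, e in stages:
        eff *= e          # losses compound multiplicatively across stages
    return eff

typical = [("UPS (double conversion)", 0.92),
           ("PDU transformer", 0.97),
           ("server power supply", 0.90)]

simplified = [("high-efficiency server power supply, fed more directly", 0.94)]

for label, stages in (("typical chain", typical), ("simplified chain", simplified)):
    e = chain_efficiency(stages)
    print(f"{label}: {e * 100:.1f}% delivered, {100 - e * 100:.1f}% lost")
```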

Dan has a slide with a side-by-side comparison of a typical mechanical system vs. the Facebook design.

    image

A couple of other slides shared are on the Reactor Power Panel and battery cabinet.

image

image

    The results Facebook shared.

    image

For more details, you can find information on Facebook's Open Compute Project web site.

If you want to see pictures of the inside of the Facebook data center, check out http://scobleizer.com/2011/04/16/photo-tour-of-facebooks-new-datacenter/ and http://www.datacenterknowledge.com/archives/2011/04/19/video-facebooks-penthouse-cooling-system/