Facebook updates Open Compute Project for the community, launches new look

If you go to OpenCompute.org you'll see a new look.


Facebook's Yael Maguire discusses the changes.

WELCOME!

WEDNESDAY, JULY 27, 2011 | Posted by Yael Maguire at 16:08

Welcome to the new opencompute.org! This revamp focuses the site on projects and the community. Please bear with us as we work out the kinks, but we now have a streamlined project browser with links to some projects on GitHub! Our original specifications were created in Word and converted to PDFs, which is not a code-friendly way to do open hardware development. We decided to switch our V2 specifications to MultiMarkdown, a simple text format for the Web that converts easily to HTML and PDF. With this switch we now have a process for making contributions:

  1. Sign up on the site (link through Facebook).
  2. Sign an individual Contributor License Agreement (CLA).
  3. Get the code on GitHub.
  4. Make a patch to a spec and submit it to us at https://github.com/facebook/opencompute/issues
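Assuming the V2 specs live as MultiMarkdown files in the facebook/opencompute GitHub repository, steps 3 and 4 might look something like this (the spec file name and the multimarkdown CLI invocation are illustrative assumptions, not part of the announcement):

```shell
# Step 3: get the code on GitHub
git clone https://github.com/facebook/opencompute.git
cd opencompute

# Edit a spec in your editor of choice (file name is hypothetical)
# vi open-rack-spec.md

# Optionally preview the rendered spec; assumes the MultiMarkdown CLI is installed
# multimarkdown open-rack-spec.md > open-rack-spec.html

# Step 4: generate a patch to attach to a GitHub issue
git diff > open-rack-spec.patch
```

The patch file can then be attached to an issue at the URL above.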

Facebook moves a Data Center Elephant, Dozens of Petabytes migrate to Prineville

Facebook has a post on migrating a huge Hadoop environment. The post doesn't specifically call out the Prineville facility, but where else would they be moving to?

During the past two years, the number of shared items has grown exponentially, and the corresponding requirements for the analytics data warehouse have increased as well. As the majority of the analytics is performed with Hive, we store the data on HDFS — the Hadoop distributed file system.  In 2010, Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. By March 2011, the cluster had grown to 30 PB — that’s 3,000 times the size of the Library of Congress! At that point, we had run out of power and space to add more nodes, necessitating the move to a larger data center.
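As a rough sanity check on that comparison (assuming the commonly cited estimate of about 10 TB for the Library of Congress's print collection):

```shell
# 30 PB expressed in TB, divided by ~10 TB for the Library of Congress (an assumed estimate)
echo $(( 30 * 1000 / 10 ))   # prints 3000, matching the "3,000 times" claim
```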

For those of you not familiar with the scale of the data set Facebook was moving:

By Paul Yang on Wednesday, July 27, 2011 at 9:19am

Users share billions of pieces of content daily on Facebook, and it’s the data infrastructure team's job to analyze that data so we can present it to those users and their friends in the quickest and most relevant manner. This requires a lot of infrastructure and supporting data, so much so that we need to move that data periodically to ever larger data centers. Just last month, the data infrastructure team finished our largest data migration ever – moving dozens of petabytes of data from one data center to another.

The post has lots of details and ends with a pitch to join the Facebook infrastructure team.

The next set of challenges for us include providing an ability to support a data warehouse that is distributed across multiple data centers. If you're interested in working on these and other "petascale" problems related to Hadoop, Hive, or just large systems, come join Facebook's data infrastructure team!

The data infrastructure team in the war room during the final switchover.

Curious, I went to see what the current job postings are on the tech operations team.

Open Positions
Production Operations: Systems, Network, Storage, Database (14)

Supply Chain, Program Management and Analysis (6)

Hardware Design and Data Center Operations (12)

RAW vs. JPG, 25% of images are now RAW

10 years ago at Microsoft, four of us had the idea that RAW imaging would be big. I wrote a blog post with some of the history.

Story of Adobe & Apple High-Value Digital Image Applications, Adobe’s angst developing for the iPad, and how Microsoft missed this battle

MONDAY, MAY 17, 2010 AT 3:25AM

This is not a data center post, but one about competition and innovation.

If you are a high-end photographer, you use the RAW image format, a higher-quality format than JPEG.

A camera raw image file contains minimally processed data from the image sensor of a digital camera, image scanner, or motion picture film scanner. Raw files are so named because they are not yet processed and therefore are not ready to be printed or edited with a bitmap graphics editor.

Microsoft just released a RAW image codec for Windows 7 and Vista, and what is interesting is that the accompanying analysis says 25% of images are RAW.

Photo Gallery now supports raw format

by Brad Weed

We all take a lot of photos. In fact, according to data provided by InfoTrends, more than 73 billion still images were shot in the US alone in 2010. If you’re lucky enough to own a DSLR (digital single lens reflex) camera, you’re likely to take two and a half times as many photos in a given month as your friends with point-and-shoot cameras. That’s a lot of photos. What’s more, nearly a quarter of those photos are taken in a raw image format.

None of the four of us who had the original RAW image idea 10 years ago are still at Microsoft. One is a Google executive, one is an Adobe executive, one is an imaging consultant, and then there is me.

At least now we finally have the data to show how big the RAW imaging format is: 25% of the market. The market is now big enough that a product could be developed, but it is a little too late to come to market.

Thinking of air-side economizer use? Consider that Seattle has had only 351 minutes over 80 degrees this summer

I know this will not make you locate your data center in Seattle, but it is a fun piece of weather trivia. As of July 24 there have been only 351 minutes of over-80-degree weather this summer.

Seattle soaks up some summer -- 273 minutes of it to be exact

By Scott Sistek

Story Created: Jul 24, 2011 at 9:39 PM PDT

For a while, it seemed Seattle was about to cement its legacy as home of the 78-minute summer.

But no more.

With a nice warm, sunny Sunday, Seattle now has had its first extended "summer experience," which I had defined as 80 degrees or warmer at the University of Washington.

The total tally was 273 minutes Sunday (4 hours, 33 minutes), bringing our entire Seattle summer experience up to a whopping 351 minutes.

I have joked that if global warming happens we are going to see a migration to the Pacific Northwest.

And Texas is at the other end of the spectrum.

Meanwhile, in Waco, the streak continues for now. Thursday's high of 103 degrees marks the 29th straight triple-digit day and the 46th such day of 2011.

Nebula launches Hardware Appliance to run the cloud, but will users want the HW or SW?

The cloud is about virtualized environments, so it is a bit ironic that Nebula's first product is a physical hardware appliance when the solution could be delivered as downloadable bits.

Nebula Cloud Appliance

What they’re all working on is fairly fascinating: A hardware appliance pre-loaded with customized OpenStack software and Arista networking tools, designed to manage racks of commodity servers as a private cloud.

...

Kemp wasn’t planning to do an appliance, he admits, but initial investor Bechtolsheim convinced him it was the right approach. It lets Nebula provide a turnkey product for deploying OpenStack, Kemp explained, by optimizing and locking down some of the variables that might make deploying a private cloud more difficult.

Nebula's team didn't like the Eucalyptus product and chose OpenStack.

However, even with all the specialization, Nebula is very committed to building the core OpenStack code base. “OpenStack exists because Eucalyptus didn’t work at NASA,” Kemp acknowledged, so he understands the importance of solid, customizable, open-source code.

Ultimately, he said, a better OpenStack means a better Nebula, because Nebula can focus on filling in the gaps and not on reinventing the wheel. Much like Bechtolsheim was successful at Sun Microsystems by building atop Unix and at Arista by using standard hardware components.

Here is a question, given how Nebula describes the appliance:

Elastic Infrastructure

The Nebula appliance dynamically provisions and destroys virtual infrastructure and storage as workloads fluctuate.

Why wouldn't you run the Nebula software on multiple Open Compute servers in your cloud environment? It seems the Nebula appliances are single points of failure unless you have multiple instances running in your cloud environment, which should be easy if you buy a few more Open Compute servers.

Nebula was announced at OSCON, but who would let their cloud environment stay down waiting for a FedEx, and who would ship their cloud data outside the company inside their Nebula appliance?

Nebula will supply the appliance. "If it fails, FedEx it back to us, and we'll send you another one," Kemp said. "Our little box has a 10 gigabit ethernet switch built into it. You can plug cheap commodity servers into the rack. You don't have to turn them on. It will do that. The interface is like Amazon Services." These servers are monitored by the appliance, including log files and flow data. "What we do is create interface points to all of the common CMDB tools, managing tools, security tools, like ArcSight or Splunk," said Kemp. "We will create integration points for those particular products."

I am sure Nebula has a high-availability architecture, but why buy multiple Nebula appliances when the same hardware, the Open Compute servers, is already in your environment? Because an investor convinced the Nebula founders it was a better revenue model?

Kemp wasn’t planning to do an appliance, he admits, but initial investor Bechtolsheim convinced him it was the right approach.

Would you want the appliance, or software you can run on an Open Compute server?

BTW, given that the software runs on the Open Compute server, the Nebula software should run on any hardware, unless Nebula modified it to be hardware-specific.