A view of OSCON by Barton George

I wasn't able to make it to OSCON, and one of the people I would have spent a lot of time with there is Dell's Barton George.  I met Barton at the Gartner Data Center Conference, and we frequently run into each other at other technology conferences.

Here are a few of Barton's OSCON posts.

OSCON: The Data Locker project and Singly

Who owns your data?  Hopefully the answer is you, but even so, it is often very difficult to get your data out of the sites you have uploaded it to and move it elsewhere.  Additionally, your data is scattered across a bunch of sites and locations across the web; wouldn't it be amazing to have it all in one place and be able to mash it up and do things with it?

OSCON: ex-NASA cloud lead on his OpenStack startup, Piston

Last week at OSCON in Portland, I dragged Josh McKenty away from the OpenStack one-year anniversary (that's what Josh is referring to at the very end of the interview) to do a quick video.  Josh, who headed up NASA's Nebula tech team and has been very involved with OpenStack from the very beginning, has recently announced Piston, a startup that will productize OpenStack for enterprises.

OSCON: How foursquare uses MongoDB to manage its data

I saw a great talk today here at OSCON Data up in Portland, Oregon.  The talk was Practical Data Storage: MongoDB @ foursquare and was given by foursquare's head of server engineering, Harry Heymann.  The talk was particularly impressive since, due to AV issues, Harry had to wing it and go slideless.  (He did post his slides to Twitter so folks with access could follow along.)
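For a flavor of what managing data in MongoDB looks like, here is a minimal sketch of my own, not taken from Harry's talk: it stores a venue document and runs a geospatial query with pymongo. The collection and field names are hypothetical.

```python
# Minimal sketch (not from the talk): store a venue in MongoDB and
# query for venues near a point using pymongo. The collection and
# field names are hypothetical illustrations.
from pymongo import MongoClient, GEOSPHERE

client = MongoClient("localhost", 27017)
db = client.checkins_demo

# A 2dsphere index lets MongoDB answer "what's near me?" queries.
db.venues.create_index([("location", GEOSPHERE)])

db.venues.insert_one({
    "name": "Powell's Books",
    "location": {"type": "Point", "coordinates": [-122.681, 45.523]},
})

# Find venues within ~500 m of a point (longitude, latitude order).
nearby = db.venues.find({
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-122.68, 45.52]},
            "$maxDistance": 500,
        }
    }
})
for venue in nearby:
    print(venue["name"])
```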

Facebook moves a Data Center Elephant, Dozens of Petabytes migrate to Prineville

Facebook has a post on migrating a huge Hadoop environment.  The post doesn't specifically call out the Prineville facility, but where else would they be moving to?

During the past two years, the number of shared items has grown exponentially, and the corresponding requirements for the analytics data warehouse have increased as well. As the majority of the analytics is performed with Hive, we store the data on HDFS — the Hadoop distributed file system.  In 2010, Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. By March 2011, the cluster had grown to 30 PB — that’s 3,000 times the size of the Library of Congress! At that point, we had run out of power and space to add more nodes, necessitating the move to a larger data center.
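As a quick sanity check on that comparison: the "3,000 times the Library of Congress" figure only works if you peg the Library at roughly 10 TB, a commonly cited estimate for its print collection.

```python
# Sanity check on the comparison in Facebook's post: 30 PB is 3,000x
# the Library of Congress if the Library is estimated at ~10 TB.
PB = 10**15  # petabyte, decimal
TB = 10**12  # terabyte, decimal

cluster = 30 * PB
library_of_congress = 10 * TB

print(cluster / library_of_congress)  # 3000.0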

For those of you not familiar with the large data set Facebook was moving, here is the background from the post.

by Paul Yang on Wednesday, July 27, 2011 at 9:19am

Users share billions of pieces of content daily on Facebook, and it’s the data infrastructure team's job to analyze that data so we can present it to those users and their friends in the quickest and most relevant manner. This requires a lot of infrastructure and supporting data, so much so that we need to move that data periodically to ever larger data centers. Just last month, the data infrastructure team finished our largest data migration ever – moving dozens of petabytes of data from one data center to another.
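Facebook's post describes purpose-built replication for the move, but to make the shape of such a migration concrete, here is a rough sketch using Hadoop's stock DistCp bulk-copy tool. Only the `hadoop distcp` command is real; the cluster URIs and paths below are made up for illustration.

```python
# Rough illustration only: bulk-copying HDFS directories between
# clusters with Hadoop's stock DistCp tool. Facebook's post describes
# a custom replication system; the URIs and paths are hypothetical.
import subprocess

SRC = "hdfs://old-cluster:8020"
DST = "hdfs://new-cluster:8020"

warehouse_dirs = ["/user/hive/warehouse", "/tmp/hive-staging"]

for d in warehouse_dirs:
    # -update skips files already copied, so the job can be re-run
    # as the source keeps changing during the migration window.
    subprocess.run(
        ["hadoop", "distcp", "-update", SRC + d, DST + d],
        check=True,
    )
```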

The post has lots of details and ends with a pitch to join the Facebook infrastructure team.

The next set of challenges for us include providing an ability to support a data warehouse that is distributed across multiple data centers. If you're interested in working on these and other "petascale" problems related to Hadoop, Hive, or just large systems, come join Facebook's data infrastructure team!

[Photo: The data infrastructure team in the war room during the final switchover.]

Curious, I went to see what the current job postings in the tech operations team are.

Open Positions

Production Operations: Systems, Network, Storage, Database (14)
Supply Chain, Program Management and Analysis (6)
Hardware Design and Data Center Operations (12)


RAW vs. JPG, 25% of images are now RAW

10 years ago at Microsoft, four of us had this idea that RAW imaging would be big.  I wrote a blog post with some of the history.

Story of Adobe & Apple High-Value Digital Image Applications, Adobe's angst developing for the iPad, and how Microsoft missed this battle

MONDAY, MAY 17, 2010 AT 3:25AM

This is not a data center post, but one about competition and innovation.

If you are a high-end photographer, you use the RAW imaging format, a higher-quality image format than JPEG.

A camera raw image file contains minimally processed data from the image sensor of either a digital camera, image scanner, or motion picture film scanner. Raw files are so named because they are not yet processed and therefore are not ready to be printed or edited with a bitmap graphics editor.
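To make "not yet processed" concrete: before an ordinary editor can touch a raw file, the sensor data has to be demosaiced and color-converted into RGB. A minimal sketch using the rawpy library (my illustration, not from the post; the filename is a placeholder):

```python
# Minimal sketch: a raw file must be demosaiced and processed into RGB
# before a bitmap editor can use it. Uses the rawpy library; the
# filename is a placeholder.
import rawpy
import imageio

with rawpy.imread("photo.nef") as raw:
    # postprocess() runs demosaicing, white balance, and color
    # conversion, yielding an ordinary 8-bit RGB array.
    rgb = raw.postprocess()

imageio.imwrite("photo.jpg", rgb)
```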

Microsoft just released a RAW image codec for Windows 7 and Vista, and what is interesting is that the accompanying analysis says 25% of images are RAW.

Photo Gallery now supports raw format

by Brad Weed

We all take a lot of photos. In fact, according to data provided by InfoTrends, more than 73 billion still images were shot in the US alone in 2010. If you're lucky enough to own a DSLR (digital single lens reflex) camera, you're likely to take two and a half times as many photos in a given month as your friends with point-and-shoot cameras. That's a lot of photos. What's more, nearly a quarter of those photos are taken in a raw image format.

None of the group of four who had the original RAW imaging idea 10 years ago is still at Microsoft.  One is a Google executive, one is an Adobe executive, one is an imaging consultant, and then there is me.

At least now we finally have the data to say how big the RAW imaging format is: 25% of the market.  The market is now big enough that a product can be developed, but it is a little too late to try to come to market now.

MapR, half the hardware and faster performance for Apache Hadoop

MapR Technologies came out of stealth mode in June, and their solution is available for download.


MapR is the Next Generation for Apache Hadoop

Here is a presentation that you can watch to learn more.

The Design, Scale and Performance of MapR's Distribution for Apache Hadoop

Posted on JULY 27, 2011 by JACK

Check out M.C. Srivas' Hadoop Summit presentation. Srivas, the CTO and co-founder of MapR, outlines the architectural details behind MapR's performance advantages. This technical discussion also describes the scale advantages of the MapR distributed NameNode and provides comparisons to HDFS.
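MapR's implementation is proprietary, so take this as a toy illustration of the general idea rather than their actual scheme: a distributed NameNode partitions file metadata across several servers, so metadata capacity scales with the number of servers instead of being capped by a single machine's RAM.

```python
# Toy illustration of the distributed-NameNode idea (not MapR's
# actual scheme): partition file metadata across several metadata
# servers by hashing the path, so no single server holds it all.
import hashlib

METADATA_SERVERS = ["meta1", "meta2", "meta3", "meta4"]

def metadata_server_for(path: str) -> str:
    """Pick the server responsible for a path's metadata."""
    digest = hashlib.md5(path.encode()).hexdigest()
    return METADATA_SERVERS[int(digest, 16) % len(METADATA_SERVERS)]

# Each path's metadata lives on exactly one server; adding servers
# adds metadata capacity, unlike a single HDFS NameNode whose RAM
# caps the number of files in the cluster.
for p in ["/logs/2011/07/27/part-0001", "/warehouse/hive/clicks"]:
    print(p, "->", metadata_server_for(p))
```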


Pretty cool that you can run your Apache Hadoop system on half the hardware and save roughly half the power.  Software can save a lot of power to support a green data center.

IP Network Discovery as a way to manage Data Center Power, JouleX

I had a chance to talk with Tom Noonan, President & CEO, and Tim McCormick of JouleX to discuss their data center power management solution.  What caught my eye is Tom's mechanical engineering background and his experience in systems engineering and real-time process control systems.

Tom Noonan
President & CEO

Tom Noonan assumed the role of president and CEO at JouleX in 2010 and also remains a partner at TechOperators, an early-stage investing firm he co-founded in 2008. He is the former chair, president and CEO of Internet Security Systems, which was acquired by IBM for $1.5 billion. Prior to ISS, Noonan held senior positions at Dun and Bradstreet Software, where he was vice president, worldwide marketing.

After graduating from Georgia Tech with a Mechanical Engineering degree, Noonan joined Rockwell Automation as a systems engineer specializing in real-time process control systems for industrial automation applications. Noonan founded two successful control systems technology companies while residing in Boston: Actuation Electronics, a precision motion-control company, and Leapfrog Technologies, a software development environment for real-time process control and automation applications.

...

Tim McCormick
Vice President, Sales & Marketing

Tim McCormick brings over 25 years of marketing, sales and business development experience in both enterprise security and application software. Prior to JouleX, he was vice president of the Business Solutions Group at IBM Internet Security Systems. He also served as vice president of marketing for Lancope, a leading network behavior analysis and anomaly detection provider, and at ClickFox, a customer behavior intelligence solution provider.

One of the things that impressed me is that JouleX uses an IP discovery strategy that allows an agentless approach to inventorying the power-drawing devices in the data center.  Note the routers and switches, which are in the center of this diagram.

[Diagram: JouleX agentless IP discovery, with routers and switches at the center]
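As a toy illustration of the agentless idea (my sketch, not JouleX's implementation): enumerate a subnet and probe each address over the network, with nothing installed on the targets. A real tool would then query SNMP or vendor management APIs for power readings; this sketch only checks reachability.

```python
# Illustration of agentless discovery (my sketch, not JouleX's
# implementation): sweep a subnet and record which addresses answer
# on a management port, with nothing installed on the targets.
import socket
from ipaddress import ip_network
from concurrent.futures import ThreadPoolExecutor

SUBNET = "10.0.0.0/24"   # hypothetical data center subnet
PORT = 22                # SSH reachability; a real tool would query
                         # SNMP (UDP 161) for power readings

def probe(addr: str) -> bool:
    try:
        with socket.create_connection((addr, PORT), timeout=0.5):
            return True
    except OSError:
        return False

hosts = [str(h) for h in ip_network(SUBNET).hosts()]
with ThreadPoolExecutor(max_workers=64) as pool:
    alive = [h for h, up in zip(hosts, pool.map(probe, hosts)) if up]

print(f"{len(alive)} devices discovered:", alive[:10])
```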

Working with other systems that have information about IP devices makes the discovery easier, by communicating with devices that manage other devices.

[Diagram: discovery through systems that manage other IP devices]

This approach allows JouleX to create graphs like this one, showing where the power is being used based on the IP addresses inventoried.

[Graph: power use broken down by inventoried IP addresses]
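To give a feel for the rollup behind a graph like that, here is a toy example with made-up readings that sums per-device power by subnet:

```python
# Toy example (made-up readings): roll per-device power draw up by
# subnet, the kind of summary behind a power-by-location chart.
from ipaddress import ip_address, ip_network
from collections import defaultdict

# Hypothetical inventory: IP -> watts reported by the device.
readings = {
    "10.0.1.5": 210.0,   # server
    "10.0.1.9": 185.5,   # server
    "10.0.2.3": 95.0,    # switch
    "10.0.2.7": 410.0,   # storage array
}

subnets = [ip_network("10.0.1.0/24"), ip_network("10.0.2.0/24")]

watts_by_subnet = defaultdict(float)
for ip, watts in readings.items():
    for net in subnets:
        if ip_address(ip) in net:
            watts_by_subnet[str(net)] += watts

for net, watts in sorted(watts_by_subnet.items()):
    print(f"{net}: {watts:.1f} W")
```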