Do you have an Elephant and Pig in your data center? Hadoop momentum continues

I am sure most of you have heard of Hadoop.

I've started studying Hadoop and its adoption in data centers.  Google started the effort with its MapReduce and Google File System.

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license.[1] It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
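For readers who haven't run into the MapReduce model that Hadoop borrows from Google's papers, here is a minimal single-process sketch in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would spread the map and reduce steps across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as a framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["elephant pig elephant", "pig hadoop"]
    print(word_count(docs))
```

The point of the split is that each map call and each reduce call is independent of the others, which is what lets the work be spread over thousands of nodes.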

Why should you care about Hadoop? Look at who the users are - Amazon Web Services, Adobe, AOL, Baidu, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Quantcast, Rackspace, Twitter, and Yahoo.

Yahoo! is proud of being the largest Hadoop user.  Here are their 2009 numbers: 25,000 nodes.

image

And in 2010: 38,000 servers for 170 PB of storage.

image

Apache Pig is a platform for analyzing large data sets.

Pig

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.

At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties:

Read more

Google has the most Internet Traffic and Data Centers and Servers

Arbor Networks reports on Google’s network traffic.

Google Sets New Internet Traffic Record

by Craig Labovitz

In their earnings call last week, Google announced a record 2010 third-quarter revenue of $7.29 billion (up 23% from last year). The market rejoiced and Google shares shot past $615 giving the company a market cap of more than $195 billion.

This month, Google broke an equally impressive Internet traffic record — gaining more than 1% of all Internet traffic share since January. If Google were an ISP, as of this month it would rank as the second largest carrier on the planet.

Only one global tier1 provider still carries more traffic than Google (and this ISP also provides a large portion of Google’s transit).

In the graph below, I show a weighted average percentage of Internet traffic contributed by the search / mobile OS / video / cloud giant. As in earlier posts, the Google data comes from 110+ ISPs around the world participating in ATLAS. The multiple shaded colors represent different Google ASN and reflect ongoing global traffic engineering strategies.

googletraffic

If you count caching, Google is even bigger.

Google now represents an average 6.4% of all Internet traffic around the world. This number grows even larger (to as much as 8-12%) if I include estimates of traffic offloaded by the increasingly common Google Global Cache (GGC) deployments and error in our data due to the extremely high degree of Google edge peering with consumer networks.

Google has more traffic, more data centers, and more servers than anyone else.

How high can Google go?

Read more

MacRumors speculates on Apple’s Data Center

MacRumors speculates on what Apple’s future data center plans are.


Apple's NC Data Center Plot Larger Than Originally Thought

Wednesday October 27, 2010 10:19 AM EST
Written by Eric Slivka

Ongoing investigations over at All Things Digital have revealed that Apple's new data center that is set to open "any day now" in Maiden, North Carolina may be the site of even grander plans than the potential doubling in size discovered late last week. According to that earlier research, Apple's initial proposal to representatives of Catawba County where the project is located included a schematic showing two adjacent data centers that would appear to total on the order of one million square feet, with only one of those buildings having been constructed so far.


Apple's 70-acre parcel across Startown Road from existing data center

New research from All Things Digital indicates, however, that Apple's plans may even extend beyond that planned one million square-foot facility on 183 acres, as the company also owns 70 acres across the street from that site.

The scuttlebutt around Maiden is that the company intends to use it for office space. But that seems unlikely.
A more plausible explanation is that this parcel, too, will be used for data center space.

Read more

Facebook posts on its Data Center Efficiency Project

Jay Park of Facebook’s data center engineering team posts on what Facebook presented at the SVLG Data Center Efficiency Summit.

Optimizing Data Center Energy Usage

by Jay Park on Wednesday, October 20, 2010 at 8:32am

When it comes to optimizing data centers for energy usage, the minutest changes can have significant impact. Facebook’s growth over the years has expanded our data center footprint greatly, and we've learned many lessons and applied some of the industry’s best practices to make our data centers much more efficient, saving us money and using less energy. At the Silicon Valley Leadership Group's Data Center Efficiency Summit last week, we shared these lessons and the new strategies we've implemented with the data center community at large so they too can utilize these techniques, multiplying the energy savings and environmental protection across the infrastructure of many other companies.  

Based on the graph in the post, a 9% improvement in IT load for a 276 kW savings means the IT critical capacity was about 3 megawatts.  Assuming low-power servers at around 6,000 servers per megawatt, there are roughly 18,000 servers in the environment.
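The back-of-the-envelope math behind that estimate can be sketched in a few lines of Python. The function names are mine, and the 6,000 servers per megawatt figure is my assumption for low-power servers, not a number from Facebook.

```python
def implied_capacity_kw(savings_kw, savings_fraction):
    """If savings_kw is savings_fraction of the load, return the whole load in kW."""
    return savings_kw / savings_fraction


def estimated_servers(capacity_kw, servers_per_mw):
    """Estimate server count from IT capacity and an assumed density per MW."""
    return capacity_kw / 1000 * servers_per_mw


if __name__ == "__main__":
    capacity_kw = implied_capacity_kw(276, 0.09)
    print(f"Implied IT capacity: {capacity_kw:.0f} kW")

    # Round to the nearest whole megawatt before estimating servers,
    # as the estimate in the text does. 6,000 per MW is an assumption.
    rounded_kw = round(capacity_kw / 1000) * 1000
    print(f"Estimated servers: {estimated_servers(rounded_kw, 6000):.0f}")
```

The exact quotient is a bit over 3 MW, so treat the server count as an order-of-magnitude figure.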

Jay discussed saving 3 watts per server.

We discovered that the server fans were spinning faster than necessary, so we worked with the server manufacturers to optimize their fan speed control algorithm while keeping temperatures within the recommended range. For each server, this saves up to 3 watts and requires less air (up to 8 cubic feet per minute), which quickly adds up in a 56,000 square foot facility.

At 3 watts per server, 18,000 servers save 54,000 watts.  With 56,000 sq ft and 3 MW of power, the power density is only about 50 watts per sq ft, which fits with the low-density image below.  Note the amount of open space.
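Here is the same arithmetic as a small Python sketch. The function names are mine, and the inputs are the estimates used in this post (18,000 servers, a 3 MW IT load) plus the 3 watts and 56,000 sq ft quoted from Jay's post.

```python
def total_fan_savings_w(servers, watts_saved_per_server):
    """Total watts saved if every server saves the same amount."""
    return servers * watts_saved_per_server


def power_density_w_per_sqft(it_load_w, floor_area_sqft):
    """Average IT power per square foot of floor space."""
    return it_load_w / floor_area_sqft


if __name__ == "__main__":
    # 18,000 servers and 3 MW are this post's estimates, not Facebook's numbers.
    savings = total_fan_savings_w(18_000, 3)
    density = power_density_w_per_sqft(3_000_000, 56_000)
    print(f"Fan savings: {savings:,} W")
    print(f"Power density: {density:.1f} W per sq ft")
```

Since "up to 3 watts" is a ceiling, the total is best read as an upper bound on the fan savings.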

The inlet temperature is not mentioned in the post, but I recall that Jay said 68 to 72 degrees, which fits with the rise in return temperature.

In the end, we raised the temperature for each CRAH unit's return air to 81 degrees Fahrenheit from 72 degrees Fahrenheit.

The group I was sitting with during Facebook’s presentation wasn’t overly impressed.  But with 50 watts per sq ft, a 3 megawatt IT load, and a leased (not owned) facility, the Facebook engineering group most likely had a very short ROI payback and wanted to keep its capital investment to a minimum.

Read more

Ray Ozzie posts on Dawn of a New Day, Continuous Services and Connected Devices

Ray Ozzie has started a new blog and posts on Dawn of a New Day.

Dawn of a New Day

To:           Executive Staff and direct reports
Date:         October 28, 2010
From:         Ray Ozzie
Subject:      Dawn of a New Day

Five years ago, having only recently arrived at the company, I wrote The Internet Services Disruption in order to kick off a major change management process across the company.  In the opening section of that memo, I noted that about every five years our industry experiences what appears to be an inflection point that results in great turbulence and change.


Ray looks back 25 years to Nov 20, 1985.

Imagining A “Post-PC” World

One particular day next month, November 20th 2010, represents a significant milestone.  Those of us in the PC industry who placed an early bet on a then-nascent PC graphical UI will toast that day as being the 25th anniversary of the launch of Windows 1.0.


25 years ago I was working at Apple.  Wow, look at where Apple is after 25 years, and where Microsoft is.  In 1992 I moved from Apple to Microsoft.

image

From 1985 to 1992 here is Apple vs. Microsoft stock.

image

But what have the last 5 years been like, given that Ray is infamous for his e-mail waking up Microsoft?

image

Ray argues for simplicity.

Complexity kills. Complexity sucks the life out of users, developers and IT.  Complexity makes products difficult to plan, build, test and use.  Complexity introduces security challenges.  Complexity causes administrator frustration.

And he calls data center services Continuous Services.

Continuous services are websites and cloud-based agents that we can rely on for more and more of what we do.  On the back end, they possess attributes enabled by our newfound world of cloud computing: They’re always-available and are capable of unbounded scale.  They’re constantly assimilating & analyzing data from both our real and online worlds.  They’re constantly being refined & improved based on what works, and what doesn’t.  By bringing us all together in new ways, they constantly reshape the social fabric underlying our society, organizations and lives.  From news & entertainment, to transportation, to commerce, to customer service, we and our businesses and governments are being transformed by this new world of services that we rely on to operate flawlessly, 7×24, behind the scenes.

And the future is appliance-like devices.

But there’s one key difference in tomorrow’s devices: they’re relatively simple and fundamentally appliance-like by design, from birth.  They’re instantly usable, interchangeable, and trivially replaceable without loss.  But being appliance-like doesn’t mean that they’re not also quite capable in terms of storage; rather, it just means that storage has shifted to being more cloud-centric than device-centric.  A world of content – both personal and published – is streamed, cached or synchronized with a world of cloud-based continuous services.

Ray’s vision is centered around always-on data center services, with a range of simple appliances connecting to those services.

Who wants to go back to a time of editing win.ini or using Mac ResEdit?

Ray paints an interesting future where Google, Microsoft, and Apple will compete for Continuous Services and Connected Devices.

Read more