Google shares production data center for compute clusters

Google Research has a post reaching out to the academic community.

Google Cluster Data

Thursday, January 07, 2010 at 08:11:00 AM

Posted by Joseph L. Hellerstein, Manager of Google Performance Analytics
Google faces a large number of technical challenges in the evolution of its applications and infrastructure. In particular, as we increase the size of our compute clusters and scale the work that they process, many issues arise in how to schedule the diversity of work that runs on Google systems.

The areas of interest for Google are:

We have distilled these challenges into the following research topics that we feel are interesting to the academic community and important to Google:

  • Workload characterizations: How can we characterize Google workloads in a way that readily generates synthetic work that is representative of production workloads so that we can run stand alone benchmarks?
  • Predictive models of workload characteristics: What is normal and what is abnormal workload? Are there "signals" that can indicate problems in a time-frame that is possible for automated and/or manual responses?
  • New algorithms for machine assignment: How can we assign tasks to machines so that we make best use of machine resources, avoid excess resource contention on machines, and manage power efficiently?
  • Scalable management of cell work: How should we design the future cell management system to efficiently visualize work in cells, to aid in problem determination, and to provide automation of management tasks?

The Google Cluster data is here.

This project is intended for the distribution of data of production workloads running on Google clusters.

The first dataset (data-1), provides traces over a 7 hour period. The workload consists of a set of tasks, where each task runs on a single machine. Tasks consume memory and one or more cores (in fractional units). Each task belongs to a single job; a job may have multiple tasks (e.g., mappers and reducers).

The data have been anonymized in several ways: there are no task or job names, just numeric identifiers; timestamps are relative to the start of data collection; the consumption of CPU and memory is obscured using a linear transformation. However, even with these transformations of the data, researchers will be able to do workload characterizations (up to a linear transformation of the true workload) and workload generation.
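To illustrate what "up to a linear transformation" leaves intact, here is a minimal sketch. It assumes the obscuring step has the form scale * value + offset with a positive scale; the post does not give the actual transformation, and the function names, constants, and sample values below are made up for illustration. Under that assumption, standardized values (z-scores) of the released data match those of the true data, so the shape of the workload distribution survives.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def obscure(values, scale, offset):
    """Hypothetical obscuring step: a linear transformation of each value."""
    return [scale * v + offset for v in values]


# Made-up "true" core usage for five task samples (not real Google data).
true_cores = [0.5, 1.0, 1.5, 4.0, 0.25]

# Made-up constants; the real ones are not published.
released_cores = obscure(true_cores, scale=0.37, offset=0.02)

# The standardized shapes agree up to floating-point rounding.
same_shape = all(
    abs(a - b) < 1e-9
    for a, b in zip(zscores(true_cores), zscores(released_cores))
)
print(same_shape)
```

Absolute quantities, such as how many cores a task really used, cannot be recovered without knowing the scale and offset.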

The data are structured as blank separated columns. Each row reports on the execution of a single task during a five minute period.

Time (int) - time in seconds since the start of data collection

JobID (int) - Unique identifier of the job to which this task belongs

TaskID (int) - Unique identifier of the executing task

Job Type (0, 1, 2, 3) - class of job (a categorization of work)

Normalized Task Cores (float) - normalized value of the average number of cores used by the task

Normalized Task Memory (float) - normalized value of the average memory consumed by the task
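A minimal sketch of reading the trace in Python, assuming the six whitespace-separated columns appear in the order listed above. The names TaskSample, parse_rows, read_trace, and cores_by_job are made up for this example, and the sample rows are invented, not taken from the real file.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple


class TaskSample(NamedTuple):
    time: int        # seconds since the start of data collection
    job_id: int
    task_id: int
    job_type: int    # 0, 1, 2 or 3
    cores: float     # normalized average cores used
    memory: float    # normalized average memory consumed


def parse_rows(lines: Iterable[str]) -> Iterator[TaskSample]:
    """Yield one TaskSample per well-formed row; skip anything else."""
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # blank or malformed line
        try:
            yield TaskSample(
                int(fields[0]), int(fields[1]), int(fields[2]),
                int(fields[3]), float(fields[4]), float(fields[5]),
            )
        except ValueError:
            continue  # e.g. a header row, if the file has one


def read_trace(path: str) -> Iterator[TaskSample]:
    """Stream samples from a gzip-compressed trace file."""
    with gzip.open(path, "rt") as handle:
        yield from parse_rows(handle)


def cores_by_job(samples: Iterable[TaskSample]) -> Dict[int, float]:
    """Sum normalized core usage over all samples of each job."""
    totals: Dict[int, float] = defaultdict(float)
    for sample in samples:
        totals[sample.job_id] += sample.cores
    return dict(totals)


# Invented rows in the assumed column order, for demonstration only.
sample_lines = [
    "90000 1 10 0 0.25 0.5",
    "90000 1 11 0 0.5 0.25",
    "90300 2 20 3 0.125 0.125",
]
demo_totals = cores_by_job(parse_rows(sample_lines))
```

For the real file, cores_by_job(read_trace("google-cluster-data-1.csv.gz")) would stream the rows without loading the whole trace into memory.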

Please let us know about issues you have with the data.

So far there have been 230 downloads.

Filename: google-cluster-data-1.csv.gz
Summary: 7+ hours of workload traces from a Google production cluster
Uploaded: Dec 18
Size: 29.8 MB
Download count: 230


Bill Gates 2010 Annual Letter’s Environment Position is Similar to Google.org’s RE<C

Bill Gates posted his 2010 annual letter, which discusses his current primary focus on health and education. On the last page, the conclusion, Bill discusses his own personal interest in the environment.

Visiting an Eko financial services shop in the suburb of Uttam Nagar with my dad (Delhi, India, 2008).


There are a lot of important topics I didn’t get around to in this letter. One area that I have been spending a lot of personal time on is energy and its effect on climate. The most important innovation required to avoid climate change will be a way of producing electricity that is cheaper than coal and that emits no greenhouse gases. There will be a huge market for this, and governments should supply large amounts of funding for basic R&D. Because the foundation invests in areas where there is not a big market, I have not yet seen a way that we can play a unique role here, but I am investing in several ideas outside the foundation. I am surprised that the climate debate hasn’t focused more on encouraging R&D since it is critical to getting to zero emissions. Still, I think it is likely that out of the many possible approaches, at least one scalable innovation will emerge in the next 20 years and be installed widely in the 20 years after that.

Note the “electricity that is cheaper than coal and that emits no greenhouse gases.”

Doesn’t this sound like Google’s RE<C project?

RE<C will work to develop electricity from renewable energy sources that is cheaper than electricity produced from coal with a goal of producing one gigawatt of renewable energy capacity – enough to power a city the size of San Francisco – in years, not decades. As part of this effort, Google.org is making strategic investments and grants, advancing key public policies, and using Google products to unlock critical information.
Renewable energy is clean, abundant, and inexhaustible. However, electricity from renewables today is generally more expensive than electricity from coal. RE<C is focused on making renewable energy cheaper than coal-fired power which today is the predominant source of electricity worldwide and a large contributor to global warming pollution.  

Google founders are behind the idea.

The decision to pursue this initiative reflects the strong commitment of Google founders Larry Page and Sergey Brin to help realize the promise of renewable energy.

Can you imagine the marketing people trying to get a Renewable Energy event where Bill Gates, Larry Page, and Sergey Brin are all in the same room?


10 Energy Fact tips from DoD’s blog on The Road to a Greener Navy

The US Dept. of Defense has a blog called DoDLive. On the site is an Energy Awareness section.

Category Archives: Energy Awareness

  1. The Road to a Greener Navy: 10 Facts on the Navy’s Quest for Alternative Fuels, October 16, 2009

    Posted in DoD News, Energy Awareness.

    No comments

  2. Armed with Science: Energy Research for Force Mobility, October 13, 2009

    Posted in Armed with Science, Energy Awareness.

    No comments

  3. Armed with Science: Algae Jet Fuel?, October 8, 2009

    Posted in Armed with Science, Energy Awareness.

    2 comments

  4. Air Force Achieving Green Goals, October 7, 2009

    Posted in Energy Awareness.

    2 comments

  5. Armed with Science: The Nellis Air Force Base Solar Array, October 6, 2009

    Posted in Armed with Science, Energy Awareness.

    No comments

  6. President Obama Declares October National Energy Awareness Month, October 6, 2009

    Posted in Energy Awareness.

One of the posts is on a greener Navy and its quest to use alternative fuels.

The Road to a Greener Navy: 10 Facts on the Navy’s Quest for Alternative Fuels

1.    The Department of Navy consumes 1.3 billion gallons of fuel per year and is the second largest consumer of fuel in the Department of the Defense (US Air Force is 1st, Army is 3rd).

2.    Every $10 increase in the price of a barrel of oil increases Navy fuel costs by almost $300 million.

3.    The Navy has set aggressive goals to reduce its reliance on oil, including a 10% annual increase in alternative fuels use by base support vehicles and equipment.

4.    Over 3,000 Electric and Natural Gas vehicles are currently in use on Navy bases. Electric and Natural Gas vehicles might be the most efficient land-based alternative energy solution since they require no conversion from the form in which they are produced or mined and are naturally transportable.

5.    Alternatives to petroleum-based fuel are endless. Pond scum (algae), non-food crops, biomass, wastes and CO2 are among the many energy sources currently under study.

6.    Algae fields can produce 6,000 gallons of oil per acre. A land area of 500 square miles (21.5 x 21.5 miles or 2 times the size of Washington, D.C.) could yield enough oil to meet all of the Navy’s annual fuel needs. In comparison, US oilfields currently occupy 40,000 square miles.

7.    Biofuels derived from algae and the oilseeds of the Camelina sativa plant will be used in the Navy’s “Green” Hornet and “Green” Ship initiatives.

8.    More than 200,000 gallons of algae- and camelina-based fuel will be delivered to the Navy for test and evaluation. These sources will be the first liquid alternatives to petroleum to be certified for future use.

9.    The first Navy aircraft engine to run on bio-fuel was successfully tested this month (October 2009) at the Naval Air Warfare Center Patuxent River, Md.

10.    First flight of the Navy’s F/A-18 “Green” Hornet will take flight in the spring of 2010. The camelina-based biofuel will be blended in a 50-50 mix with standard, petroleum-based JP-5 jet fuel.

Courtesy of Amy Behrman, NAVAIR Corporate Communication
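Facts 1, 2, and 6 above can be cross-checked with a little arithmetic. The sketch below uses 42 US gallons per oil barrel and 640 acres per square mile. It also makes two simplifying assumptions the post does not state: that a gallon of crude corresponds to a gallon of Navy fuel, and that the algae yield of 6,000 gallons per acre is an annual figure. The function names are made up for this example.

```python
GALLONS_PER_BARREL = 42      # US oil barrel
ACRES_PER_SQ_MILE = 640

NAVY_FUEL_GALLONS = 1.3e9    # fact 1: annual Navy fuel consumption
ALGAE_GALLONS_PER_ACRE = 6000  # fact 6: assumed here to be per year


def cost_of_price_increase(gallons, dollars_per_barrel):
    """Extra annual fuel cost if a barrel of oil rises by the given amount."""
    return gallons / GALLONS_PER_BARREL * dollars_per_barrel


def algae_area_sq_miles(gallons, gallons_per_acre):
    """Land area of algae fields needed to produce the given volume."""
    return gallons / gallons_per_acre / ACRES_PER_SQ_MILE


extra_cost = cost_of_price_increase(NAVY_FUEL_GALLONS, 10)
needed_area = algae_area_sq_miles(NAVY_FUEL_GALLONS, ALGAE_GALLONS_PER_ACRE)

print(f"Extra cost per $10/barrel increase: ${extra_cost / 1e6:.0f} million")
print(f"Algae area needed: {needed_area:.0f} square miles")
```

Under these assumptions the $10 increase works out to roughly $310 million, in the same range as the quoted figure, and the required algae area comes to about 340 square miles, so the quoted 500 square miles leaves headroom.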


4 category approach to Green the Data Center

WSJ has a guest article by Robert Plant.

— Dr. Plant is an associate professor in the department of computer information systems at the University of Miami's School of Business Administration

How Green Should My Tech Be?

To decide whether an eco-friendly IT idea makes sense, first place it in one of four categories

By ROBERT PLANT

In these tough economic times, green initiatives can be a hard sell. Companies don't want to take a gamble on pricey projects that lie outside their core mission. Yet lots of eco-friendly ideas promise to pay for themselves—and then some—by slashing costs and boosting efficiency.


How should companies approach the problem? To find out, we looked at green initiatives in one critical section of businesses, the corporate data center, and placed potential projects into four categories. At one end of the spectrum are obviously useful ideas that are simple and inexpensive. At the other end are expensive distractions that should be avoided at all costs. By figuring out which category an idea fits into, companies can better weigh the risk and potential return.

The article opens with a caveat: the system depends on the judgment of the CIO.

One caveat. This system—based on an earlier model developed in collaboration with Prof. Leslie Willcocks from the London School of Economics—relies heavily on the judgment of a company's chief information officer. We assume the CIO is closely monitoring promising technologies and can evaluate their possible impact on the business.

The four categories are:

Here are the four categories.

No-Brainers. In these cases, the green technology is a commodity. It not only cuts power use and emissions—thereby fulfilling its green mission—it's easy and cheap to obtain and implement. The bottom line: Companies should pursue these projects as soon as possible.

Promising but Pricey. Here, the green technology is clearly useful but isn't yet popular enough to be a commodity.

Business Opportunities. In some cases, green tech initiatives have the potential to win new business.

Distractions. When evaluating green projects, the vast majority of companies shouldn't try to keep up with industry titans.


Private Clouds Dead or Alive, views from James Hamilton and Mike Manos, logic vs. emotional

I’ve been thinking about what to write in response to James Hamilton’s blog post, Private Clouds Are Not the Future. It is well written and logical in its focus on efficiency.

Last week Alistair Croll wrote an excellent InformationWeek article arguing that “the true cloud operators will have an unavoidable cost advantage because it's all they worry about. They'll also be closer to consumers (because they have POPs everywhere and partnerships with content delivery systems), and connecting with consumers and partners will become an increasingly essential part of any enterprise IT strategy.” Have a look at Private Clouds are a Fix, Not the Future.

Private clouds are better than nothing but an investment in a private cloud is an investment in a temporary fix that will only slow the path to the final destination: shared clouds. A decision to go with a private cloud is a decision to run lower utilization levels, consume more power, be less efficient environmentally, and to run higher costs.

But I am glad I waited, because Mike Manos has posted his response to James’s post and makes the case for private clouds.

Private Clouds – Not just a Cost and Technology issue, Its all about trust, the family jewels, corporate value, and identity

January 24, 2010 by mmanos

I recently read a post by my good friend James Hamilton at Amazon regarding Private Clouds.   James and I worked closely together at Microsoft and he was always a good source for out of the box thinking and challenging the status quo.    While James post found here, speaks to the Private Cloud initiative being what amounts to be an evolutionary dead end, I would have to respectfully disagree.

I agree that there is more to this than the technical and economic benefits of shared clouds. Trust in others, which is a matter of human nature, and risk management are big factors in cloud computing adoption.

But this brings up one of the key criticisms that this is not just about cost and technology.   I believe what is really at stake here is much more than that.

Mike has a perspective many don’t.

In my role at Digital I have visibility into tens of data centers, across hundreds of customers that span just about every industry.  There is not, nor has there been a massive move (or any move for that matter) to become more efficient in the utilization of their resources.   We have had years of people bantering about how wonderful, cool, and how revolutionary a lot of this stuff is, but world wide Data center utilization levels have remained abysmally low.   Some providers bank on this.  Over subscription of their facilities is part of their business plan.  They know companies will lease and take down what they think they need, and never take it down in REALITY.  

And Mike repeats a standard view that many top executives most likely hold when looking at adopting a technology like cloud computing.

The cloud is an interesting place, today.  It is dominated by technologists.  Extremely smart engineering people who like to optimize and solve for technological challenges.  The actual business adoption of this technology set has yet to be fully explored.   Just wait until the “Business” side of the companies get their hooks into this technology set and start placing other artificial constraints, or optimizations around other factors.  There are thousands of different motivators out in the world.  Once they starts to happen earnest.  I think what you will find is a solution that looks more like a hybrid solution than the pure plays we dream about today.

Is the Private Cloud Dead or Alive?

I vote alive.
