1000 Genomes: 200+ TB of data available in AWS to run compute jobs

Normally when you think of running a compute project in AWS, you need to move your data in and then compute.  AWS has hosted the 1000 Genomes Project, with over 200 TB of data available to run compute jobs against, without moving the data into your environment.

The 1000 Genomes Project

We're very pleased to welcome the 1000 Genomes Project data to Amazon S3.

The original human genome project was a huge undertaking. It aimed to identify every letter of our genetic code, 3 billion DNA bases in total, to help guide our understanding of human biology. The project ran for over a decade, cost billions of dollars, and became the cornerstone of modern genomics. The techniques and tools developed for the human genome were also put into practice in sequencing other species, from the mouse to the gorilla, from the hedgehog to the platypus. By comparing the genetic code between species, researchers can identify biologically interesting genetic regions for all species, including us.

This is a lot of data.

The data is vast (the current set weighs in at over 200 TB), so hosting it on S3, located close to the computational resources of EC2, means that anyone with an AWS account can start using it in their research, from anywhere with internet access, at any scale, whilst only paying for the compute power they need, as and when they use it. This enables researchers from laboratories of all sizes to start exploring and working with the data straight away. The Cloud BioLinux AMIs are ready to roll with the necessary tools and packages, and are a great place to get going.

Making the data available via a bucket in S3 also means that customers can crunch the information using Hadoop via Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
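To make "compute next to the data" concrete, here is a minimal sketch of how a job might address objects in the public bucket, both over plain HTTPS and via the s3:// URIs that tools on EC2 or Elastic MapReduce consume. The bucket name `1000genomes` and the example keys are assumptions based on the announcement, not verified paths; check the project's own index files for the real layout.

```python
# Sketch: addressing objects in the public 1000 Genomes S3 bucket.
# Bucket name and keys are illustrative assumptions, not verified paths.
from urllib.parse import quote

BUCKET = "1000genomes"

def public_url(key: str) -> str:
    """Build the HTTPS URL for a public S3 object (no AWS credentials needed)."""
    return f"https://{BUCKET}.s3.amazonaws.com/{quote(key)}"

def s3_uri(key: str) -> str:
    """Build the s3:// URI you would hand to tools running on EC2 or EMR."""
    return f"s3://{BUCKET}/{key}"
```

A Hadoop job flow on Elastic MapReduce would take the `s3://` form as its input path, so the data is read in place rather than copied into the cluster first.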

It is interesting to think that AWS is hosting data that is too expensive for people to move around.

More information can be found at http://aws.amazon.com/1000genomes/

If you want to get the data yourself, here it is:

Other Sources

The 1000 Genomes project data are also freely accessible through the 1000 Genomes website, and from each of the two institutions that work together as the project Data Coordination Centre (DCC).

Is the end of Coal Power coming to the USA? EPA proposes new rules

MSNBC reports on the EPA's new rules for coal power plants.

End of coal power plants? EPA proposes new rules


By msnbc.com staff and news services

The Obama administration on Tuesday proposed the first-ever standards to cut carbon dioxide emissions from new power plants -- a move welcomed by environmentalists but criticized by some utilities as well as Republicans, who are expected to use it as election campaign fodder.

The difficulty for coal power plants is that they would need to meet the same emissions standard as natural gas plants.

While the proposed rules do not dictate which fuels a plant can burn, they would require any new coal plants essentially to halve carbon dioxide emissions to match those of plants fired by natural gas.

The pessimistic view comes from the coal industry.

Steve Miller, CEO and President of the American Coalition for Clean Coal Electricity, a group of coal-burning electricity producers, took a more dismal view, saying it "will make it impossible to build any new coal-fueled power plants and could cause the premature closure of many more coal-fueled power plants operating today."

Other opponents of the long-delayed EPA proposal say it will limit sources for electricity by making coal prohibitively expensive.

The NRDC and American Lung Association cheered the new rules.

Frances Beinecke, president of the Natural Resources Defense Council, called it a "historic step ... toward protecting the most vulnerable among us — including the elderly and our children — from smog worsened by carbon-fueled climate change."

The American Lung Association agreed. "Scientists warn that the buildup of carbon pollution will create warmer temperatures which will increase the risk of unhealthful smog levels," said board chairman Albert Rizzo. "More smog means more childhood asthma attacks and complications for those with lung disease."

Do you get your electricity from coal?  What happens to your electricity prices in the future?


Using the Situation Awareness Principle to Green the Data Center: Google continues the march from 1.16 to 1.14 PUE

Google posts its latest PUE achievement of 1.14.

Measuring to improve: comprehensive, real-world data center efficiency numbers

March 26, 2012 at 9:00 AM
To paraphrase Lord Kelvin, if you don’t measure you can’t improve. Our data center operations team lives by this credo, and we take every opportunity to measure the performance of our facilities. In the same way that you might examine your electricity bill and then tweak the thermostat, we constantly track our energy consumption and use that data to make improvements to our infrastructure. As a result, our data centers use 50 percent less energy than the typical data center.
...

Google's Joe Kava uses the Lord Kelvin principle of "if you don't measure you can't improve."  But I think a more apt lens for the complexity of greening a data center is situation awareness.

Situation awareness

From Wikipedia, the free encyclopedia

Situation awareness is the perception of environmental elements with respect to time and/or space, the comprehension of their meaning, and the projection of their status after some variable has changed, such as time. It is also a field of study concerned with perception of the environment critical to decision-makers in complex, dynamic areas from aviation, air traffic control, power plant operations, military command and control, and emergency services such as fire fighting and policing; to more ordinary but nevertheless complex tasks such as driving an automobile or bicycle.

Situation awareness involves being aware of what is happening in the vicinity to understand how information, events, and one's own actions will impact goals and objectives, both immediately and in the near future. Lacking or inadequate situation awareness has been identified as one of the primary factors in accidents attributed to human error.[1] Thus, situation awareness is especially important in work domains where the information flow can be quite high and poor decisions may lead to serious consequences (e.g., piloting an airplane, functioning as a soldier, or treating critically ill or injured patients).

Having complete, accurate and up-to-the-minute SA is essential where technological and situational complexity on the human decision-maker are a concern. Situation awareness has been recognized as a critical, yet often elusive, foundation for successful decision-making across a broad range of complex and dynamic systems, including aviation and air traffic control,[2] emergency response and military command and control operations,[3] and offshore oil and nuclear power plant management.[4]

Situation awareness goes beyond Lord Kelvin's principle by having you think about the bigger picture: about knowledge.  Am I doing the right thing?  How did I get here, and can I repeat it?

Situation assessment

Endsley (1995b, p. 36) argues that "it is important to distinguish the term situation awareness, as a state of knowledge, from the processes used to achieve that state. These processes, which may vary widely among individuals and contexts, will be referred to as situation assessment or the process of achieving, acquiring, or maintaining SA." Thus, in brief, situation awareness is viewed as "a state of knowledge," and situation assessment as "the processes" used to achieve that knowledge. Note that SA is not only produced by the processes of situation assessment, it also drives those same processes in a recurrent fashion. For example, one's current awareness can determine what one pays attention to next and how one interprets the information perceived (Endsley, 2000).

Google has shared the high level concepts of achieving a lower PUE.

1. Measure PUE

You can't manage what you don't measure, so characterize your data center's efficiency performance by measuring energy use. We use a ratio called PUE - Power Usage Effectiveness - to help us reduce energy used for non-computing functions, like cooling and power distribution. To effectively use PUE it's important to measure often - we sample at least once per second. It's even more important to capture energy data over the entire year - seasonal weather variations have a notable effect on PUE.

2. Manage airflow

Good airflow management is fundamental to efficient data center operation. Start with minimizing hot and cold air mixing by using well-designed containment. Eliminate hot spots and be sure to use blanking plates for any unpopulated slots in your rack. We've found a little analysis can pay big dividends. For example, thermal modeling using computational fluid dynamics (CFD) can help you quickly characterize and optimize airflow for your facility without many disruptive reorganizations of your computing room. Also be sure to size your cooling load to your expected IT equipment, and if you are building extra capacity, be sure your cooling approach is energy proportional.

...
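The PUE ratio in step 1 above is simple to compute: total facility energy divided by IT equipment energy, with 1.0 as the ideal. The sketch below also shows why an annual figure should be weighted by energy across seasonal samples rather than averaging the quarterly ratios; the sample numbers are illustrative, not Google's actual telemetry.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total energy / IT energy (1.0 is ideal)."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kwh / it_equipment_kwh

# Illustrative seasonal samples: (total facility kWh, IT kWh) per quarter.
samples = [(114.0, 100.0), (117.0, 100.0), (112.0, 100.0), (113.0, 100.0)]

# An annual PUE weights by energy, not by averaging the quarterly ratios.
annual_pue = sum(t for t, _ in samples) / sum(i for _, i in samples)
# 456 / 400 = 1.14, matching Google's latest reported figure.
```

The energy-weighted form matters because a quarter with higher load contributes more to the year's true ratio than a light one.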


How does Google determine where it should spend its resources?  At some point there is a diminishing or even negative return: an improvement will cost more than it can save.  On the other hand, at Google's scale, what may be a small saving for most can be huge for them.
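To see why a saving that is small for most can be huge at scale, here is a back-of-the-envelope sketch of what a 0.01 PUE improvement is worth. Every number here is an assumption for illustration (fleet IT load, electricity price), not Google data.

```python
# Back-of-the-envelope: annual savings from a 0.01 PUE improvement.
# All inputs are assumed illustrative values, not real Google figures.
it_load_mw = 100.0            # assumed fleet-wide IT load, megawatts
pue_before, pue_after = 1.15, 1.14
price_per_kwh = 0.06          # assumed industrial electricity price, USD

HOURS_PER_YEAR = 8760
saved_kwh = it_load_mw * 1000 * (pue_before - pue_after) * HOURS_PER_YEAR
annual_savings = saved_kwh * price_per_kwh   # roughly $525,600 per year
```

At these assumed figures a hundredth of a point of PUE is worth about half a million dollars a year, while for a single small computer room the same improvement might not pay for the engineering time.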

Our 2011 numbers and more are available for closer examination on our data center site. We’ve learned a lot through building and operating our data centers, so we’ve also shared our best practices. These include steps like raising the temperature on the server floor and using the natural environment to cool the data center, whether it’s outside air or recycled water.

The really interesting thing to know is what Google has tried and found not to work.  As any good engineer knows, many times you learn more from failure than from success.

Cover Image: November 2009 Scientific American Magazine

How You Learn More from Success Than Failure

The brain may not learn from its mistakes after all

Have you ever bowled a string of strikes that seems like it came out of nowhere? There might be more to such streaks than pure luck, according to a study that offers new clues as to how the brain learns from positive and negative experiences.

I think good engineers, unlike others, have learned to rewire their brains to learn from failure.

“Success has a much greater influence on the brain than failure,” says Massachusetts Institute of Technology neuroscientist Earl Miller, who led the research. He believes the findings apply to many aspects of daily life in which failures are left unpunished but achievements are rewarded in one way or another—such as when your teammates cheer your strikes at the bowling lane. The pleasurable feeling that comes with the successes is brought about by a surge in the neurotransmitter dopamine. By telling brain cells when they have struck gold, the chemical apparently signals them to keep doing whatever they did that led to success. As for failures, Miller says, we might do well to pay more attention to them, consciously encouraging our brain to learn a little more from failure than it would by default.


Water shortages coming: wars and financial impacts

MSNBC had an AP article on the coming risk of water shortages causing wars.

US intel: Water a cause for war in coming decades

'Water as a weapon or to further terrorist objectives ... more likely beyond 10 years,' says report released on World Water Day

Image: Israeli soldier stands next to a manmade pool containing water from a spring located near Ramallah
Baz Ratner /  Reuters
An Israeli soldier stands next to a manmade pool containing water from a spring located near the West Bank village of Nabi Saleh on March 19. Jewish settlers have seized dozens of natural springs in the occupied West Bank, barring Palestinians or limiting their access to scarce water sources, a United Nations report said this week. In 2009 the spring was taken over by settlers from Halamish, forcing villagers to obtain their irrigation water from other sources, the report and residents said.
By MATTHEW LEE
updated 3/22/2012 2:18:08 PM ET

Drought, floods and a lack of fresh water may cause significant global instability and conflict in the coming decades, as developing countries scramble to meet demand from exploding populations while dealing with the effects of climate change, U.S. intelligence agencies said in a report released on World Water Day.

An assessment reflecting the joint judgment of federal intelligence agencies says the risk of water issues causing wars in the next 10 years is minimal even as they create tensions within and between states and threaten to disrupt national and global food markets. But beyond 2022, it says the use of water as a weapon of war or a tool of terrorism will become more likely, particularly in South Asia, the Middle East and North Africa.

Fidelity Investments has a video on the world's water that drew 16,845 views in one month.

If you don't think about water issues in your data center design and operations, you are not alone; but the people who think about sustainability and green data centers know water will become scarcer and more expensive.

 

James Hamilton's post on solar panels at data centers gets referenced by Forbes.

Forbes goes into depth, drawing on James Hamilton's post.

Can Solar Reduce The Impact Of Two High-Profile Data Centers? Amazon Engineer Weighs In [Updated]

The sun sets on the horizon across 42nd street...

Solar arrays may not be able to provide the power density needed by data centers, one expert argues. (Image credit: AFP/Getty Images via @daylife)

Solar power may not be the best way to reduce the environmental impact of sprawling data centers built by companies such as Apple and Facebook, James Hamilton, an Amazon vice president and distinguished engineer, argued on his personal blog last Saturday.