
    The Story of Cap & Trade videos – Annie Leonard and Warren Buffett

    Annie Leonard has a new video on the story of cap and trade. The Story of Cap & Trade is a fast-paced, fact-filled look at the leading climate solution being discussed at Copenhagen and on Capitol Hill. Host Annie Leonard introduces the energy traders and Wall Street financiers at the heart of this scheme and reveals the "devils in the details" in current cap and trade proposals: free permits to big polluters, fake offsets, and distraction from what's really required to tackle the climate crisis. If you've heard about Cap & Trade but aren't sure how it works (or who benefits), this film is for you.

    And here is another from Warren Buffett, who discusses cap and trade as a regressive tax.



    ex-Intel engineers at Microsoft share processor secrets, optimize performance per watt

    Microsoft’s Dileep Bhandarkar and Kushagra Vaid published Rightsizing Servers to Achieve Cost and Power Savings in the Datacenter (December 2009), a paper on a topic central to a green data center strategy.  To put things in context, both Dileep and Kushagra are ex-Intel processor engineers.  Let’s start with the summary from their paper:

    In conclusion, the first point to emphasize is that there is more to performance than just speed. When your definition of performance includes cost effectiveness, you also need to consider power. The next point is that in many cases processor speed has outpaced our ability to consume it. It’s difficult to exploit CPU performance across the board. This platform imbalance presents an opportunity to rightsize your configurations. The results will offer a reduction in both power and costs, with power becoming an increasingly important factor in the focus on total cost of ownership.

    It is also important to remember that industry benchmarks may not reflect your environment. We strongly recommend that IT departments do their own workload characterization, understand the behavior of the applications in their own world, and then optimize for that.
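The workload-characterization advice above can be sketched in a few lines. This is my own illustration, not code from the paper: given CPU utilization samples from your production monitoring, look at a high percentile rather than the peak or the mean before choosing a processor tier. All sample values and the 50% threshold are invented for illustration.

```python
def percentile(samples, pct):
    """Return the pct-th percentile of a list of numbers (nearest-rank)."""
    ordered = sorted(samples)
    rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[rank]

# Hypothetical hourly CPU utilization (%) captured from a production server.
cpu_samples = [4, 5, 6, 5, 7, 30, 8, 6, 5, 9, 6, 5, 7, 6, 25, 5]

p95 = percentile(cpu_samples, 95)
print("95th percentile CPU utilization: %d%%" % p95)

# If even the 95th percentile is far below capacity, a cheaper,
# lower-frequency part is a rightsizing candidate.
if p95 < 50:
    print("Candidate for a lower-cost processor tier")
```

The point is simply that the decision is driven by your own measured workload, not by a published benchmark number.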

    Dileep and Kushagra are going out on a limb, sharing details most wouldn’t.  Intel’s and the server manufacturers’ goal is to maximize revenue per unit (chips or servers).  If you buy high-performance chips in the belief that you are buying high-performance-per-watt systems, then they’ll make more money.  But the truth is, many times you don’t need the high-performance processors.  Many server manufacturers are selling big data center companies high-performance-per-watt systems built with low-cost processors.

    Dileep has a blog post that goes along with the paper.

    Before I came to Microsoft to manage server definition and purchases I worked on the other side of the fence. For 17 years I focused on processor architecture and performance at Digital Equipment Corporation, and then worked for 12 years at Intel, focusing on performance, architecture, and strategic planning. It’s interesting how now that I’m a hardware customer, the word “performance” encompasses cost effectiveness almost as much as it does throughput and response time. As my colleague Kushagra Vaid and I point out in our paper, when you look up performance in the dictionary it is defined as “how well something performs the functions for which it’s intended”.

    Why should you read this paper? Because, as Dileep points out, the vast majority of people are purchasing based on processor benchmarks run on unrealistic configurations.

    Figure: Three-year total cost of ownership of a basic 1U server

    It also surprises me that so many IT groups base their purchasing decisions on published benchmark data about processors, even though that data is often generated using system configurations that are completely unrealistic when compared to real-world environments. Most folks sit up and take note when I display the facts about these topics, because the subject is important.

    Rightsizing can clearly reduce the purchase price and the power consumption of a server. But the benefits go beyond the savings in capital expenditure. The lower power consumption has a big impact on the Total Cost of Ownership as shown in the Figure.
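A back-of-the-envelope sketch makes the TCO point concrete. The numbers below (server price, wattage, PUE, electricity rate) are my own illustrative assumptions, not figures from the paper or the figure above:

```python
def three_year_tco(server_price, avg_watts, pue=1.7, price_per_kwh=0.10):
    """Capital cost plus three years of electricity, with datacenter
    overhead (cooling, distribution losses) captured by the PUE multiplier."""
    hours = 3 * 365 * 24
    energy_kwh = avg_watts / 1000.0 * pue * hours
    power_cost = energy_kwh * price_per_kwh
    return server_price + power_cost, power_cost

total, power_cost = three_year_tco(server_price=2000, avg_watts=300)
print("3-year TCO: $%.0f (power: $%.0f)" % (total, power_cost))
```

Even with these modest assumptions, three years of power is a large fraction of the server's purchase price, which is why shaving watts through rightsizing shows up so strongly in TCO.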

    So, let’s start diving into the secrets in Dileep and Kushagra’s paper.  Here is the background.

    How do you make sure that the servers you purchase and deploy are most efficient in terms of cost and energy? In the Microsoft Global Foundation Services organization (GFS)—which builds and manages the company’s datacenters that house tens of thousands of servers—we do this by first performing detailed analysis of our internal workloads. Then, by implementing a formal analysis process to rightsize the servers we deploy, immediate and long-term cost savings can be realized. GFS finds that testing on actual internal workloads leads to much more useful comparison data versus published benchmark data. In rightsizing our servers we balance systems to achieve substantial savings. Our analysis and experience show that it usually makes more sense to use fewer and less expensive processors because the bottleneck in performance is almost invariably the disk I/O portion of the platform, not the CPU.

    What benchmarks?  SPEC CPU2006.  Understand the conditions of the test.

    One of the most commonly used benchmarks is SPEC CPU2006. It provides valuable insight into performance characteristics of different microprocessor central processing units (CPUs) running a standardized set of single-threaded integer and floating-point benchmarks. A multi-threaded version of the benchmark is CPU2006_rate, which provides insight into throughput characteristics using multiple running instances of the CPU2006 benchmark.

    But important caveats need to be considered when interpreting the data provided by the CPU2006 benchmark suite. Published benchmark results are almost always obtained using very highly tuned compilers that are rarely if ever used in code development for production systems. They often include settings for code optimization switches uncommon in most production systems. Also, while the individual benchmarks that make up the CPU2006 suite represent a very useful and diverse set of applications, these are not necessarily representative of the applications running in customer production environments. Additionally, it is very important to consider the specifics of the system setup used for obtaining the benchmarking data (e.g., CPU frequency and cache size, memory capacity, etc.) while interpreting the benchmark results since the setup has an impact on results and needs to be understood before making comparisons for product selection.

    and TPC.

    Additionally, the system configuration is often highly tuned to ensure there are no performance bottlenecks. This typically means using an extremely high-performing storage subsystem to keep up with the CPU subsystem. In fact, it is not uncommon to observe system configurations with 1,000 or more disk drives in the storage subsystem for breakthrough TPC-C or TPC-E results. To illustrate this point, a recent real-world example involves a TPC-C result for a dual-processor server platform that has an entry-level price a little over $3,000. The result from the published benchmark is impressive: more than 600,000 transactions per minute. But the total system cost is over $675,000. That’s not a very realistic configuration for most companies. Most of the expense comes from employing 144 GB of memory and over a thousand disk drives.

    Both of these tests are generally set up to show the performance of CPUs, but as Dileep and Kushagra say, few systems are used in these configurations.  So what do you do?  Rightsize the system, which usually means not buying the highest-performing CPU, since the CPU is not the bottleneck.  Keep in mind these are ex-Intel processor engineers.

    CPU is typically not your bottleneck: Balance your systems accordingly
    So how should you look at performance in the real world? First you need to consider what the typical user configuration is in your organization. Normally this will be dictated either by the capability or by cost constraints. Typically your memory sizes are smaller than what you see in published benchmarks, and you have a limited amount of disk I/O. This is why CPU utilization throughout the industry is very low: server systems are not well balanced. What can you do about it? One option is to use more memory so there are fewer disk accesses. This adds a bit of cost, but can help you improve performance. The other option—the one GFS likes to use—is to deploy balanced servers so that major platform resources (CPU, memory, disk, and network) are sized correctly.

    So, what happens if you don’t rightsize?

    If memory or disk bandwidth is under-provisioned for a given application, the CPU will remain idle for a significant amount of time, wasting system power. The problem gets worse with multicore CPUs on the technology roadmap, offering further increases in CPU pipeline processing capabilities. A common technique to mitigate this mismatch is to increase the amount of system memory to reduce the frequency of disk accesses.

    The old rule was to buy the highest-performing processors you could afford.  Why not follow it?  Because it wastes money and increases your power costs.

    Another aspect to consider is shown in Figure 2 below. If you look at performance as measured by frequency for any given processor, typically there is a non-linear effect. At the higher frequency range, the price goes up faster than the frequency. To make matters worse, performance does not typically scale linearly with frequency. If you’re aiming for the highest possible performance, you’re going to end up paying a premium that’s out of proportion with the performance you’re going to get. Do you really need that performance, and is the rest of your system really going to be able to use it? It’s very important from a cost perspective to find the sweet spot you’re after.
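The sweet-spot argument can be sketched numerically. The frequency bins, prices, and 70% scaling factor below are invented for illustration (they are not from the paper's Figure 2), but they reproduce the shape of the argument: price rises faster than frequency at the top, and performance scales sublinearly with frequency, so performance per dollar peaks below the top bin.

```python
# (GHz, price in $) for a hypothetical processor family.
bins = [(2.0, 260), (2.4, 280), (2.8, 450), (3.2, 900)]

def relative_perf(ghz, scaling=0.7):
    """Assume only ~70% of a frequency increase shows up as performance."""
    base = bins[0][0]
    return 1.0 + scaling * (ghz / base - 1.0)

# Pick the bin with the best performance per dollar.
best = max(bins, key=lambda b: relative_perf(b[0]) / b[1])
for ghz, price in bins:
    print("%.1f GHz: perf/$ = %.5f" % (ghz, relative_perf(ghz) / price))
print("Sweet spot: %.1f GHz" % best[0])
```

With these made-up numbers the 2.4 GHz part wins; the top-bin part costs more than 3x as much for well under 1.5x the delivered performance.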


    What is the relationship of system performance, CPU utilization and disks?

    Figure 5 shows CPU utilization increasing with disk count as the result of the system being disk-limited. As you increase the number of disk drives, the number of transactions per second goes up because you’re getting more I/O and consequently more throughput. With only eight drives CPU utilization is just 5 percent. At 24 drives CPU utilization goes up to 20 percent. If you double the drives again, utilization goes up to about 25 percent. What that says is that you’re disk I/O limited, so you don’t need to buy the most expensive, fastest processor. This kind of data allows us to rightsize the configuration, reducing both power and cost.
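A toy model captures the disk-limited behavior described above. The per-drive IOPS and CPU ceiling below are invented, not taken from Figure 5, but the mechanism is the same: system throughput is the minimum of the disk-subsystem limit and the CPU limit, so adding drives raises CPU utilization until the CPU finally binds.

```python
def throughput_and_cpu(drives, iops_per_drive=150, cpu_max_tps=12000):
    """Return (transactions/sec, CPU utilization fraction) for a system
    whose throughput is capped by whichever resource saturates first."""
    disk_tps = drives * iops_per_drive   # disk-subsystem ceiling
    tps = min(disk_tps, cpu_max_tps)     # whichever limit binds
    cpu_util = tps / cpu_max_tps         # fraction of CPU consumed
    return tps, cpu_util

for drives in (8, 24, 48):
    tps, util = throughput_and_cpu(drives)
    print("%2d drives: %5d tps, CPU %3.0f%%" % (drives, tps, util * 100))
```

At small drive counts the CPU idles; only with an unrealistically large (benchmark-style) disk subsystem does the expensive CPU ever earn its price.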


    The paper goes on to discuss web servers, where a faster processor does help if content is cached.


    To share the blame, two RAID controllers are examined: one with 256 MB of cache and another with 512 MB.

    But when we looked at the results from our ETW workload analysis, we found that most of the time our queue depth never goes beyond 8 I/Os. So in our operational area, there is no difference in performance between the two RAID controllers. If we didn’t have the workload analysis and just looked at those curves, we might have been impressed by the 10-15 percent performance improvement at the high end of the scale, and paid a premium for performance we would never have used.
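The decision logic behind that quote is simple to sketch. The queue-depth samples and the threshold of 16 below are invented (the real data came from ETW traces, and the paper's threshold was a queue depth of 8), but the shape of the analysis is the same: if your observed queue depths never reach the region where the bigger cache wins, the cheaper controller is the right buy.

```python
from collections import Counter

# Hypothetical I/O queue-depth samples gathered from workload tracing.
observed_queue_depths = [1, 2, 1, 3, 4, 2, 6, 5, 2, 1, 7, 3, 2, 8, 4, 2]

histogram = Counter(observed_queue_depths)
deepest = max(observed_queue_depths)

# Hypothetical: the 512 MB controller only pulls ahead of the 256 MB
# controller at queue depths above this value.
ADVANTAGE_THRESHOLD = 16

print("Queue-depth histogram:", dict(sorted(histogram.items())))
print("Deepest observed queue:", deepest)
if deepest <= ADVANTAGE_THRESHOLD:
    print("No benefit from the larger cache in this workload")
```

Without the trace, the 10-15 percent improvement at the high end of the benchmark curve looks worth paying for; with it, you can see the workload never operates there.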




    Economist article on IT systems effect on the financial crisis

    The Economist has an article on the relationship between IT systems and the financial crisis.

    The article starts by pointing out that financial services spends $500 billion globally annually on IT, according to Gartner.

    Banks and information technology

    Silo but deadly

    Dec 3rd 2009
    From The Economist print edition

    Messy IT systems are a neglected aspect of the financial crisis

    NO INDUSTRY spends more on information technology (IT) than financial services: about $500 billion globally, more than a fifth of the total (see chart). Many of the world’s computers, networking and storage systems live in the huge data centres run by banks. “Banks are essentially technology firms,” says Hugo Banziger, chief risk officer at Deutsche Bank. Yet the role of IT in the crisis is barely discussed.

    The point of the article is the silos of IT made it difficult to see the overall risk.

    This fragmented IT landscape made it exceedingly difficult to track a bank’s overall risk exposure before and during the crisis. Mainly as a result of the Basel 2 capital accords, many banks had put in new systems to calculate their aggregate exposure. Royal Bank of Scotland (RBS) spent more than $100m to comply with Basel 2. But in most cases the aggregate risk was only calculated once a day and some figures were not worth the pixels they were made of.

    During the turmoil many banks had to carry out big fact-finding missions to see where they stood. “Answering such questions as ‘What is my exposure to this counterparty?’ should take minutes. But it often took hours, if not days,” says Peyman Mestchian, managing partner at Chartis Research, an advisory firm. Insiders at Lehman Brothers say its European arm lacked an integrated picture of its risk position in the days running up to its demise.
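The aggregation problem the article describes is conceptually trivial; the hard part is that every silo holds its own records in its own format. A minimal sketch, with invented silo names and exposure figures, of the question "what is my exposure to this counterparty?":

```python
# Three hypothetical siloed systems, each with its own exposure book
# (counterparty -> exposure in $ millions). Names and numbers invented.
silo_a = {"Lehman": 120.0, "AIG": 45.0}
silo_b = {"Lehman": 80.0, "Bear": 30.0}
silo_c = {"AIG": 15.0, "Lehman": 5.0}

def total_exposure(counterparty, *silos):
    """Sum one counterparty's exposure across every siloed system."""
    return sum(s.get(counterparty, 0.0) for s in silos)

print("Lehman exposure: %.1f" % total_exposure("Lehman", silo_a, silo_b, silo_c))
```

In real banks each `silo` is a separate application with its own data model, identifiers, and refresh schedule, which is why a one-line sum turned into hours or days of fact-finding.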

    But is IT really the cause, or is it the people who refuse to work with other groups?  IT has grown so large because users want to own the data systems, as information is power.   As The Economist points out, the problem was discovering issues across systems.


    Driven by the IT industry, people focus on going faster.

    But many other banks are still in firefighting mode, says Mr Mestchian. Much of the money invested in IT still goes into making things faster rather than more transparent.

    The change needed in IT is to think more about the transparency of systems and how they work with other systems.  This will happen as social software permeates more of IT.  The old term was collaboration; now it is social software/networking.

    Imagine if Twitter and Facebook worked inside a financial company’s IT systems.  Could you discover issues faster?



    Amazon delivers elastic cloud computing pricing driving creative destruction of IT business models

    I was at first hesitant to write another Amazon Web Services post, as I have written so many Amazon posts lately, but AWS’s latest announcement of spot pricing will drive changes at multiple levels.

    What AWS spot pricing has done is simple.  You can now bid for EC2 Spot Instances in a spot market for AWS capacity.

    Amazon EC2 Spot Instances

    Spot Instances are a new way to purchase and consume Amazon EC2 Instances. They allow customers to bid on unused Amazon EC2 capacity and run those instances for as long as their bid exceeds the current Spot Price. The Spot Price changes periodically based on supply and demand, and customers whose bids meet or exceed it gain access to the available Spot Instances. Spot Instances are complementary to On-Demand Instances and Reserved Instances, providing another option for obtaining compute capacity.
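The rule in that description is easy to simulate. This is a toy illustration, not AWS code, and the price series and bid are invented: an instance keeps running only while the customer's bid meets or exceeds the fluctuating Spot Price.

```python
# Hypothetical hourly Spot Prices ($/hour) and a customer's maximum bid.
spot_prices = [0.030, 0.032, 0.041, 0.038, 0.052, 0.047, 0.035]
bid = 0.040

hours_run = 0
for hour, price in enumerate(spot_prices):
    if bid >= price:
        hours_run += 1   # instance runs this hour, billed at the spot price
    else:
        print("hour %d: interrupted (spot %.3f > bid %.3f)" % (hour, price, bid))

print("Ran %d of %d hours" % (hours_run, len(spot_prices)))
```

The interruption hours are exactly why the best practices further down stress checkpointing and fault tolerance: the workload must be able to survive losing capacity mid-run.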

    Amazon CTO Werner Vogels summarizes the significance.

    Spot instances are a great innovation that, as far as I know, has no equivalent in the IT industry. It brings our customers a powerful new way of managing the cost for those workloads that are flexible in their execution and completion times. This new customer-managed pricing approach holds the power to make new areas of computing feasible for which the economics were previously unfavorable.

    Why is this significant?  Nicholas Carr explains.

    AWS: the new Chicago Edison

    DECEMBER 14, 2009

    The key to running a successful large-scale utility is to match capacity (ie, capital) to demand, and the key to matching capacity to demand is to manipulate demand through pricing. The worst thing for a utility, particularly in the early stages of its growth, is to have unused capacity. At the end of the nineteenth century, Samuel Insull, president of the then-tiny Chicago Edison, started the electric utility revolution when he had the counterintuitive realization that to make more money his company had to cut its prices drastically, at least for those customers whose patterns of electricity use would help the utility maximize its capacity utilization.

    Amazon Web Services is emerging as the Chicago Edison of utility computing. Perhaps because its background in retailing gives it a different perspective than that of traditional IT vendors, it has left those vendors in the dust when it comes to pioneering the new network-based model of supplying computing and storage capacity.

    Besides the economic benefits, what this means is that there is now a financial incentive to re-architect applications so they can be efficiently turned on and off.  These questions are normally not asked at the enterprise architect level, but the AWS user base will now ask them.

    Architecting Applications to Use Spot Instances

    There are a number of best practices to keep in mind when making use of Spot Instances:

    Save Your Work Frequently: Because Spot Instances can be terminated without warning, it is important to build your applications in a way that allows you to make progress even if your application is interrupted. There are many ways to accomplish this, two of which include adding checkpoints to your application and splitting your work into small increments. Using Amazon EBS volumes to store your data is one easy way to protect your data.

    Test Your Application: When using Spot Instances, it is important to make sure that your application is fault tolerant and will correctly handle interruptions. While we attempt to cleanly terminate your instances, your application should be prepared to deal with an immediate shutdown. You can test your application by running an On-Demand Instance and then terminating it suddenly. This can help you to determine whether or not your application is sufficiently fault tolerant and is able to handle unexpected interruptions.

    Track when Spot Instances Start and Terminate: The simplest way to know the current status of your Spot Instances is to monitor your Spot requests and running instances via the AWS Management Console or Amazon EC2 API.

    Choose a Maximum Price for Your Request: Remember that the maximum price that you submit as part of your request is not necessarily what you will pay per hour, but is rather the maximum you would be willing to pay to keep it running. You should set a maximum price for your request that is high enough to provide whatever probability you would like that your instances run for the amount of time that you desire within a given timeframe. Use the Spot Price history via the AWS Management Console or the Amazon EC2 API to help you set a maximum price.
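The "save your work frequently" advice above can be sketched concretely. This is my own illustration of incremental checkpointing, not AWS sample code; the checkpoint filename, the work items, and the squaring stand-in for real work are all invented. A sudden spot termination loses at most one increment, and a restarted instance skips everything already recorded.

```python
import json
import os

CHECKPOINT = "progress.json"
work_items = list(range(10))

def load_done():
    """Recover the set of finished items from a previous (interrupted) run."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def process(item):
    return item * item   # stand-in for the real computation

done = load_done()
for item in work_items:
    if item in done:
        continue         # finished before an earlier interruption; skip it
    process(item)
    done.add(item)
    with open(CHECKPOINT, "w") as f:
        json.dump(sorted(done), f)   # checkpoint after every increment

print("Completed %d items" % len(done))
os.remove(CHECKPOINT)    # clean up for the example
```

On AWS the checkpoint file would live on durable storage such as an EBS volume or S3 rather than the instance's local disk, matching the EBS suggestion in the first bullet.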

    An example of using Spot Instances is Pfizer’s Protein Engineering group architecting their AWS app around “must do” and “nice to do” workloads.

    The Protein Engineering group at Pfizer has been using AWS to model Antibody-Antigen interactions using a protein docking system. Their protocol utilizes a full stack of services including EC2, S3, SQS, SimpleDB and EC2 Spot instances (more info can be found in a recent article by BioTeam's Adam Kraut, a primary contributor to the implementation). BioTeam described this system as follows:

    The most computationally intensive aspect of the protocol is an all-atom refinement of the docked complex resulting in more accurate models. This exploration of the solution space can require thousands of EC2 instances for several hours.

    Here's what they do:

    We have modified our pipeline to submit "must do" refinement jobs on standard EC2 instances and "nice to do" workloads to the Spot Instances. With large numbers of standard instances we want to optimize the time to complete the job. With the addition of Spot Instances to our infrastructure we can optimize for the price to complete jobs and cluster the results that we get back from spot. Not unlike volunteer computing efforts such as Rosetta@Home, we load the queue with tasks and then make decisions after we get back enough work units from the spot instances. If we're too low on the Spot bids we just explore less solution space. The more Spot Instances we acquire the more of the energy landscape we can explore.

    Here is their architecture:

    Going back to Werner Vogels’ blog post, cloud computing has three different purchasing models.

    Different Purchasing Models

    The three different purchasing models Amazon EC2 offers give customers maximum flexibility in managing their IT costs; On-Demand Instances are charged by the hour at a fixed rate with no commitment; with Reserved Instances you pay a low, one-time fee and in turn receive a significant discount on the hourly usage charge for that instance; and Spot Instances provide the ability to assign the maximum price you want for capacity with flexible start and end times.

    • On-Demand Instances - On-Demand Instances let you pay for compute capacity by the hour with no long-term commitments or upfront payments. You can increase or decrease your compute capacity depending on the demands of your application and only pay the specified hourly rate for the instances you use. These instances are used mostly for short term workloads and for workloads with unpredictable resource demand characteristics.
    • Reserved Instances - Reserved Instances let you make a low, one-time, upfront payment for an instance, reserve it for a one or three year term, and pay a significantly lower rate for each hour you run that instance. You are assured that your Reserved Instance will always be available in the Availability Zone in which you purchased it. These instances are used for longer running workloads with predictable resource demands.
    • Spot Instances - Spot Instances allow you to specify the maximum hourly price that you are willing to pay to run a particular instance type. We set a Spot Price for each instance type in each region, which is the price all customers will pay to run a Spot Instance for that given hour. The Spot Price fluctuates based on supply and demand for instances, but customers will never pay more than the maximum price they have specified. These instances are used for workloads with flexible completion times.
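A quick comparison shows how the three models above trade commitment for price. The rates and upfront fee below are invented, not actual EC2 prices, and the spot line optimistically assumes the bid stays above the Spot Price for every hour needed.

```python
def on_demand_cost(hours, rate=0.10):
    """No commitment: pay a fixed hourly rate."""
    return hours * rate

def reserved_cost(hours, upfront=500.0, rate=0.04):
    """One-time upfront fee buys a much lower hourly rate."""
    return upfront + hours * rate

def spot_cost(hours, avg_spot=0.035):
    """Cheapest, but capacity can be interrupted; assumes bid always wins."""
    return hours * avg_spot

hours = 3 * 365 * 24   # running flat-out for three years
for name, cost in [("on-demand", on_demand_cost(hours)),
                   ("reserved", reserved_cost(hours)),
                   ("spot", spot_cost(hours))]:
    print("%-9s $%8.2f" % (name, cost))
```

For a steady three-year workload the reserved model beats on-demand well before the term ends, and spot undercuts both, at the price of flexible completion times.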

    What’s next for AWS?  Users are asking for sub-hour increments.  It makes sense if you continue down the path of spot market pricing and the ability to maximize utilization.

    This is awesome. Market pricing for computer power. People have dreamed of this and now Amazon is making it happen!

    Now the real question is when will AWS start charging for half hours or quarter hours?

    I have projects I need to run every hour for only 15 to 20 minutes ... but they need to run every hour.
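The arithmetic behind that comment is worth making explicit. With invented rates, here is what whole-hour billing costs a job that only needs 20 minutes, versus a hypothetical quarter-hour increment:

```python
import math

def billed_cost(run_minutes, rate_per_hour, increment_minutes):
    """Cost when usage is rounded up to whole billing increments."""
    increments = math.ceil(run_minutes / increment_minutes)
    return increments * rate_per_hour * (increment_minutes / 60.0)

hourly = billed_cost(20, rate_per_hour=0.10, increment_minutes=60)
quarter = billed_cost(20, rate_per_hour=0.10, increment_minutes=15)
print("hourly billing:       $%.4f" % hourly)   # pays for a full 60 minutes
print("quarter-hour billing: $%.4f" % quarter)  # pays for 30 minutes
```

A job that runs 20 minutes every hour pays for three times its actual usage under hourly billing; with 15-minute increments the waste drops to 10 minutes per run.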



    Early indicator of Google Data Center growth? $400m SE Asia Japan cable project

    The Guardian (UK) reports on the announcement.

    Google backs world's fastest internet cable

    • Undersea line set to run 5,000 miles across southeast Asia
    • £245m cable marks latest investment in net infrastructure

    In little more than a decade, Google has conquered the technology industry and become one of the world's most powerful companies. Its latest undertaking, however, may be one of its most ambitious: a giant undersea cable that will significantly speed up internet access around the globe.

    The Californian search engine is part of a consortium that confirmed its plans to install the new Southeast Asia Japan Cable (SJC) yesterday, the centrepiece of a $400m (£245m) project that will create the highest capacity system ever built.

    Gigaom references the 2008 SJC proposal.

    Google’s Underwater Ambitions Expand

    By Stacey Higginbotham, December 11, 2009

    The original SJC proposal
