NYTimes AWS Cloud Computing Mistake Cost $240

Nicholas Carr discusses what is possible with Cloud Computing.

The new economics of computing

November 05, 2008

Are we missing the point about cloud computing?

That question has been rattling around in my mind for the last few days, as the chatter about the role of the cloud in business IT has intensified. The discussion to date has largely had a retrospective cast, focusing on the costs and benefits of shifting existing IT functions and operations from in-house data centers into the cloud. How can the cloud absorb what we're already doing? is the question that's being asked, and answering it means grappling with such fraught issues as security, reliability, interoperability, and so forth. To be sure, this is an important discussion, but I fear it obscures a bigger and ultimately more interesting question: What does the cloud allow us to do that we couldn't do before?

The history of computing has been a history of falling prices (and consequently expanding uses). But the arrival of cloud computing - which transforms computer processing, data storage, and software applications into utilities served up by central plants - marks a fundamental change in the economics of computing. It pushes down the price and expands the availability of computing in a way that effectively removes, or at least radically diminishes, capacity constraints on users. A PC suddenly becomes a terminal through which you can access and manipulate a mammoth computer that literally expands to meet your needs. What used to be hard or even impossible suddenly becomes easy.

His example is the NYTimes.

My favorite example, which is about a year old now, is both simple and revealing. In late 2007, the New York Times faced a challenge. It wanted to make available over the web its entire archive of articles, 11 million in all, dating back to 1851. It had already scanned all the articles, producing a huge, four-terabyte pile of images in TIFF format. But because TIFFs are poorly suited to online distribution, and because a single article often comprised many TIFFs, the Times needed to translate that four-terabyte pile of TIFFs into more web-friendly PDF files. That's not a particularly complicated computing chore, but it's a large computing chore, requiring a whole lot of computer processing time.

Fortunately, a software programmer at the Times, Derek Gottfrid, had been playing around with Amazon Web Services for a number of months, and he realized that Amazon's new computing utility, Elastic Compute Cloud (EC2), might offer a solution. Working alone, he uploaded the four terabytes of TIFF data into Amazon's Simple Storage Service (S3) utility, and he hacked together some code for EC2 that would, as he later described in a blog post, "pull all the parts that make up an article out of S3, generate a PDF from them and store the PDF back in S3." He then rented 100 virtual computers through EC2 and ran the data through them. In less than 24 hours, he had his 11 million PDFs, all stored neatly in S3 and ready to be served up to visitors to the Times site.

The total cost for the computing job? Gottfrid told me that the entire EC2 bill came to $240. (That's 10 cents per computer-hour times 100 computers times 24 hours; there were no bandwidth charges since all the data transfers took place within Amazon's system - from S3 to EC2 and back.)
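The arithmetic in that parenthesis is easy to check. Here is a minimal sketch in Python; the function name and the choice to work in whole cents are my own, and the only inputs are the figures quoted above (10 cents per computer-hour, 100 computers, 24 hours).

```python
def ec2_cost_cents(instances, hours, cents_per_instance_hour=10):
    """Cost in cents of renting `instances` machines for `hours` hours each.

    The default rate of 10 cents per instance-hour is the figure quoted
    in the article, not a current price.
    """
    return instances * hours * cents_per_instance_hour


# Figures from the article: 100 instances for 24 hours.
cost_cents = ec2_cost_cents(instances=100, hours=24)
print(f"EC2 bill: ${cost_cents / 100:.2f}")
```

Working in integer cents keeps the result exact. It also treats the run as a full 24 hours, as the quoted calculation does, even though the job finished in slightly less.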

An interesting part of the story, told on the original blog and confirmed by an Amazon Web Services evangelist, is that the NYTimes had to run the cloud computing job twice, because the first run produced PDFs with an error.

I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3. (In fact, it worked so well that we ran it twice, since after we were done we noticed an error in the PDFs.)

So the mistake cost the NYTimes roughly an extra $240, assuming the second run was billed the same as the first.
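To put rough numbers on both points (the four-machine estimate and the cost of the rerun), here is a back-of-the-envelope sketch in Python. It rests on assumptions that are mine, not Gottfrid's: the job scales perfectly across machines, each run used the full 100 machines for 24 hours, and the second run was billed at the same 10 cents per computer-hour. The names are made up for illustration.

```python
CENTS_PER_INSTANCE_HOUR = 10          # rate quoted in the article
INSTANCE_HOURS_PER_RUN = 100 * 24     # 100 machines for about 24 hours


def wall_clock_hours(total_instance_hours, instances):
    """Elapsed hours if the work splits evenly across `instances` machines.

    Assumes perfect parallelism, which a real job only approximates.
    """
    return total_instance_hours / instances


def total_cost_cents(runs):
    """Total bill in cents for `runs` identical runs of the job."""
    return runs * INSTANCE_HOURS_PER_RUN * CENTS_PER_INSTANCE_HOUR


hours_on_four = wall_clock_hours(INSTANCE_HOURS_PER_RUN, instances=4)
print(f"Four machines: about {hours_on_four / 24:.0f} days per run")
print(f"Two runs on 100 machines: ${total_cost_cents(2) / 100:.2f}")
```

Under these assumptions the bill depends only on total computer-hours, so four machines would cost the same per run as 100. The difference is the wait: weeks instead of a day, which is what makes a rerun painless on EC2.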