Oops, you mean Big Data does not perform magic

I don’t know about you, but I have learned to be cautious about grand claims of what can be done with big data. Expecting a team of data scientists to act like magicians turning data into gold sounds good, but notice that the people telling big data stories often have something to sell.

Here is an Ars Technica story on big data hubris.

Put another way, it's not uncommon to hear the argument that "computer algorithms have reached the point where we can now do X." Which is fine in and of itself, except, as the authors put it, it's often accompanied by an implicit assumption: "therefore, we no longer have to do Y." And Y, in these cases, was the scientific grunt work involved with showing a given correlation is relevant, general, driven by a mechanism we can define, and so forth.

And the reality is that the grunt work is so hard that a lot of it is never going to get done. It's relatively easy to use a computer to pick out thousands of potentially significant differences between the human and mouse genomes. Testing the actual relevance of any one of those could occupy a grad student for a couple of years and cost tens of thousands of dollars. Because of this dynamic, a lot of the insights generated using big data will remain stuck in the realm of uncertainty indefinitely.

Recognizing this is probably the surest antidote to the problem of big data hubris. And it might help us think more clearly about the sorts of big data work that are most likely to make a lasting scientific impact.
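To make the quoted point about "thousands of potentially significant differences" concrete, here is a minimal Python sketch, not taken from the Ars Technica piece, that simulates many comparisons where there is no real difference at all. The function names, the 30-sample groups, and the simple z-test with a known unit standard deviation are illustrative assumptions. At a p < 0.05 cutoff, roughly 5% of pure-noise comparisons still look significant, and each of those is a candidate someone would have to do the grunt work on.

```python
import math
import random
import statistics


def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means.

    Assumes both groups come from distributions with a known standard
    deviation of 1, so a plain z-test is appropriate.
    """
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(1 / len(a) + 1 / len(b))
    z = diff / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def count_null_hits(n_features=10_000, n_per_group=30, alpha=0.05, seed=0):
    """Count 'significant' features when no feature has a real effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_features):
        # Both groups are drawn from the same distribution, so any hit is noise.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if two_sample_z_pvalue(a, b) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_null_hits(), "of 10,000 noise-only features pass p < 0.05")
```

With 10,000 features you should expect something on the order of 500 false hits, each of which looks like a lead until someone actually tests it.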

Wow, AWS is 8 years old

GigaOm’s Barb Darrow posted on AWS turning 8.

Amazon’s ginormous public cloud turns 8 today

When Amazon launched S3 in March 2006, no one, with the possible exception of Jeff Bezos et al., thought that Amazon Web Services would become an IT juggernaut. Well, guess what?

Eight years ago Amazon, the online book seller, announced a storage service for the internet. That S3 service was the first of a slew of cloud-based products that Amazon launched and which, it can be safely said, shook the IT world to its roots.

I remember this time well: I left Microsoft in April 2006, and a friend left Microsoft to join the AWS team, where he still is.

Web Services Evangelism

Amazon Web Services

May 2006 – February 2010 (3 years 10 months)

Amazon Web Services provides developers with direct access to Amazon's robust technology platform on a true on-demand basis. For example, what used to require an upfront investment in servers is now an on-demand utility, accessible for only $0.085/hour, with no upfront costs, no monthly minimums, and no catches.

There is nothing more exciting than telling the world about the amazing things that they can do with Amazon Web Services. So it was easy to travel the world, telling anyone who would listen, about this new thing known as "the cloud". Isn't it incredible? In 2006 S3 and EC2 were born -- and we were amazed that 185 million "objects" were stored in Amazon S3.

Facebook Open Sources its PUE/WUE Dashboards

Facebook has open sourced its PUE and WUE dashboard tools. We’ll see if others contribute open source data center software. Right now, Netflix and Facebook seem to be the most active.

Lyrica McTiernan, Engineering

Open sourcing PUE/WUE dashboards

Last April, we launched public dashboards that visualize real-time energy and water efficiency in our Oregon and North Carolina data centers. We’re proud of our data center efficiency, and we wanted to demystify data centers and share more about what our operations really look like.

In the spirit of transparency, we encourage others to share these data as well. To make this easier, today we’re open sourcing the code for these dashboards so anyone can use it. Since not all operational systems aggregate data in the same way, we’ve separated the code into two pieces: a front-end UI component and a back-end data aggregator that may be helpful for those with systems similar to ours. The two components work together – or they can be used separately.

Rackspace participated in beta testing this code and provided feedback prior to this open source release. Rackspace is currently considering integrating the open source code into their facilities.

In the coming months, we will be adding a dashboard for our Sweden data center and – when it’s operational – for our Iowa data center as well.

To access the dashboard code, go to GitHub for the front-end component and the back-end data aggregator.
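Facebook's post doesn't spell out the math, but the standard definitions from The Green Grid are PUE = total facility energy / IT equipment energy, and WUE = site water usage (liters) / IT equipment energy (kWh). Below is a minimal, hypothetical Python sketch of the kind of work a back-end data aggregator for such a dashboard might do: roll interval meter readings up into daily PUE and WUE. The `Reading` type, its field names, and `daily_efficiency` are made up for illustration; this is not Facebook's open-sourced code or its data model. The Green Grid's WUE definition uses annual water usage, so a daily value is only a shorter-window approximation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reading:
    """One metering interval (hypothetical schema, not Facebook's)."""
    timestamp: datetime
    facility_kwh: float  # total energy drawn by the whole facility
    it_kwh: float        # energy delivered to servers, storage, and network gear
    water_liters: float  # site water consumed (e.g. for evaporative cooling)


def daily_efficiency(readings):
    """Roll interval readings up into per-day PUE and WUE (L/kWh)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for r in readings:
        bucket = totals[r.timestamp.date()]
        bucket[0] += r.facility_kwh
        bucket[1] += r.it_kwh
        bucket[2] += r.water_liters

    result = {}
    for day, (facility, it, water) in sorted(totals.items()):
        if it <= 0:
            continue  # both ratios are undefined without IT load
        result[day] = {"pue": facility / it, "wue": water / it}
    return result


if __name__ == "__main__":
    # Illustrative numbers only, not real data center measurements.
    sample = [
        Reading(datetime(2014, 1, 1, 0), 110.0, 100.0, 30.0),
        Reading(datetime(2014, 1, 1, 12), 110.0, 100.0, 30.0),
    ]
    print(daily_efficiency(sample))
```

Keeping the aggregation separate from any display code mirrors the front-end/back-end split the post describes, so a different UI could consume the same daily ratios.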

Upgraded from Residential to Business Internet Access on Comcast: 5 days in, so much better

I thought my Internet connection was OK. I had decent speeds: 50 Mbps down and 10 Mbps up. What irritated me, though, was the connection quality for VoIP, video conferencing, and my Microcell. Netflix was OK.

I have a semi-retired friend from Microsoft who works from home like I do, and he swears by his Business Comcast Internet connection. Only recently did I get the option of adding Business Comcast Internet at home. I finally decided to give it a try; worst case, if there was no difference, I would switch back.

The switch took less than half an hour. I decided to get a new cable modem while I was at it. I already had a DOCSIS 3.0 modem, but since it was 4 years old, I’d rather upgrade now and have one less thing to worry about.

Speedtest shows the same 50 Mbps down and 10 Mbps up, and the cost is about $40 more a month. Is it worth it?

1) Skype video calls improved to HD quality. I also started using Fuze Box, and its HD quality was consistent.

2) VOIP and Microcell call quality appears more consistent.

3) Netflix now plays in HD quality on all devices: Apple TV, Samsung TV, iPad, and Mac.

4) Streaming from iTunes starts faster, including on Apple TV. Before, I would wait 5-10 seconds for a trailer to start. Now the wait is 1-2 seconds.

5) YouTube videos start sooner.

So is it worth another $40 a month? I use the Internet every day and work from home, and so does my wife. I would say I probably get $40 of value almost every day from not waiting and from higher-quality streaming, though of course I wouldn’t actually pay that kind of money.
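For the cost side of that comparison, a quick back-of-the-envelope calculation helps: spread over an average month, the roughly $40 monthly premium works out to about $1.32 a day, or $480 a year. A tiny Python sketch, where the only assumption is an average month length of 365/12 days:

```python
DAYS_PER_MONTH = 365 / 12  # average month length, about 30.4 days


def daily_cost(monthly_usd):
    """Spread a monthly charge evenly over an average day."""
    return monthly_usd / DAYS_PER_MONTH


def yearly_cost(monthly_usd):
    """Total of twelve monthly charges."""
    return monthly_usd * 12


if __name__ == "__main__":
    extra = 40.0  # approximate monthly premium for business-class service
    print(f"${daily_cost(extra):.2f} per day, ${yearly_cost(extra):.0f} per year")
    # prints "$1.32 per day, $480 per year"
```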

Another intangible I haven’t needed yet, but that is nice to have: Business Comcast Internet support is staffed 24 hours a day, 7 days a week. So when I have problems with Internet access, I don’t wait like my neighbors do. If a problem requires a technician, I get a truck rolling to my house within 24 hours.

Here is an Ars Technica article on a writer who switched to Business Comcast Internet.

Why I pay extra for “business-class” broadband at home

No data caps, no blocked ports, and better support are pretty darn compelling.

5 details about Google's Cloud

Here is a GigaOm post on 5 things you probably didn’t know about Google’s cloud. Here is one.

1. Google Compute Engine Zones are probably in Ireland and Oklahoma

In 2012 Google released impressive internal photos of their data center facilities and mapped them out. However, the Compute Engine Zones are very non-specific, e.g. “europe-west1-a”. Indeed, they have only two geographical regions (Europe West and US Central) compared to Amazon’s nine. In addition to its 13 locations, SoftLayer has announced 15 new data centers just for this year.

Google’s networking is very opaque. If you traceroute an Amazon or SoftLayer instance, you can see where traffic is going, the network providers, and usually the locations of the routers. In contrast, Google goes into its network at the closest POP, and everything else is very hidden.

It’s possible to guess where Google is locating its cloud. A test of a Google Compute Engine instance showed round trip responses within 20ms from London, UK. If we compare that to pings from London to the three European countries where Google has facilities — Ireland, Belgium and Finland —  we can rule out Belgium and Finland because the ping round trip time is too high. Only the Ireland facility is close enough.

Ping round-trip times from London:
Google europe-west1-a: 20ms
Amazon eu-west (Ireland): 22ms
Belgium (0.be.pool.ntp.org): 38ms
Finland (0.fi.pool.ntp.org): 49ms
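The elimination argument above can be written as a simple rule: a candidate site is ruled out when London's round-trip time to it is already clearly higher than the round-trip time to the Google instance. Here is a minimal Python sketch using the quoted numbers; the function name and the 5 ms tolerance are hypothetical choices, and, as in the article, the NTP pool hosts only stand in for Belgium and Finland rather than for Google's actual facilities. Real latency also depends on routing, so this is a heuristic, not proof of location.

```python
def plausible_sites(target_rtt_ms, reference_rtts_ms, tolerance_ms=5.0):
    """Keep candidate sites whose RTT is not clearly higher than the target's.

    A site whose round-trip time from the vantage point is already well above
    the target's RTT is ruled out, following the article's reasoning.
    """
    return sorted(
        site
        for site, rtt in reference_rtts_ms.items()
        if rtt <= target_rtt_ms + tolerance_ms
    )


# Round-trip times from London, as quoted in the GigaOm post.
google_instance_ms = 20.0
candidates_ms = {
    "Ireland (Amazon eu-west)": 22.0,
    "Belgium (0.be.pool.ntp.org)": 38.0,
    "Finland (0.fi.pool.ntp.org)": 49.0,
}

if __name__ == "__main__":
    print(plausible_sites(google_instance_ms, candidates_ms))
    # prints ['Ireland (Amazon eu-west)']
```

Widening the tolerance is a quick way to see how sensitive the conclusion is: at 20 ms, Belgium would no longer be ruled out, while Finland still would be.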

Disclosure: I work part-time for GigaOm Research.