Oops, you mean Big Data does not perform magic

I don’t know about you, but I have learned to be cautious about grand claims of what can be done with big data. Expecting a team of data scientists to work like magicians, turning data into gold, sounds appealing, but notice that the people telling the big data stories often have something to sell.

Here is an Ars Technica story on big data hubris.

Put another way, it's not uncommon to hear the argument that "computer algorithms have reached the point where we can now do X." Which is fine in and of itself, except, as the authors put it, it's often accompanied by an implicit assumption: "therefore, we no longer have to do Y." And Y, in these cases, was the scientific grunt work involved with showing a given correlation is relevant, general, driven by a mechanism we can define, and so forth.

And the reality is that the grunt work is so hard that a lot of it is never going to get done. It's relatively easy to use a computer to pick out thousands of potentially significant differences between the human and mouse genomes. Testing the actual relevance of any one of those could occupy a grad student for a couple of years and cost tens of thousands of dollars. Because of this dynamic, a lot of the insights generated using big data will remain stuck in the realm of uncertainty indefinitely.
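
The "pick out thousands of candidates" step really is that cheap. Here is a toy simulation of mine, not anything from the article, with made-up feature counts and effect sizes, showing how a single vectorized screen over 20,000 hypothetical genomic features flags on the order of a thousand nominal hits, most of them noise, each of which would still need the expensive follow-up work the article describes.

```python
# Toy sketch (not the article's analysis): a naive genome-wide screen.
# All numbers here are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_features = 20_000    # candidate genomic differences to screen
n_per_group = 10       # samples per group
n_true = 200           # features with a genuine (modest) effect

# Null data for both groups...
group_a = rng.normal(0.0, 1.0, size=(n_features, n_per_group))
group_b = rng.normal(0.0, 1.0, size=(n_features, n_per_group))
# ...plus a real shift for a small minority of features.
group_b[:n_true] += 0.8

# The cheap part: one vectorized pass flags every nominally significant hit.
_, p_values = stats.ttest_ind(group_a, group_b, axis=1)
hits = np.flatnonzero(p_values < 0.05)

print(f"nominal hits at p < 0.05: {len(hits)}")
print(f"hits that are actually real: {np.sum(hits < n_true)}")
# Typical result: roughly a thousand hits, most of them noise; and each one,
# per the article, could cost a grad student years to actually verify.
```

The screen runs in seconds on a laptop; the validation it implies would take careers. That asymmetry is the whole point.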

Recognizing this is probably the surest antidote to the problem of big data hubris. And it might help us think more clearly about the sorts of big data work that are most likely to make a lasting scientific impact.