Oops, you mean Big Data does not perform magic?

I don’t know about you, but I have learned to be cautious about grand claims of what can be done with big data. Expecting a team of data scientists to act like magicians who turn data into gold sounds good, but notice that the people telling the big data stories often have something to sell.

Here is an Ars Technica story on big data hubris.

Put another way, it's not uncommon to hear the argument that "computer algorithms have reached the point where we can now do X." Which is fine in and of itself, except, as the authors put it, it's often accompanied by an implicit assumption: "therefore, we no longer have to do Y." And Y, in these cases, was the scientific grunt work involved with showing a given correlation is relevant, general, driven by a mechanism we can define, and so forth.

And the reality is that the grunt work is so hard that a lot of it is never going to get done. It's relatively easy to use a computer to pick out thousands of potentially significant differences between the human and mouse genomes. Testing the actual relevance of any one of those could occupy a grad student for a couple of years and cost tens of thousands of dollars. Because of this dynamic, a lot of the insights generated using big data will remain stuck in the realm of uncertainty indefinitely.

Recognizing this is probably the surest antidote to the problem of big data hubris. And it might help us think more clearly about the sorts of big data work that are most likely to make a lasting scientific impact.

Disney's Data Centers Power Changes in Prices and How It Serves Guests

Disney has a large data center in North Carolina, as do Facebook, Apple, and Google. We can all understand what the latter companies do with their data centers. What does Disney do with a big data center? One thing it does is crank out calculations on its guests and park operations.

Businessweek has a couple of articles on this topic. One covers Disney's RFID tracking system.

The answer was on the electronic bands the couple wore on their wrists. That’s the magic of the MyMagic+, Walt Disney’s (DIS) $1 billion experiment in crowd control, data collection, and wearable technology that could change the way people play—and spend—at the Most Magical Place on Earth.


MyMagic+ promises far more radical change. It’s a sweeping reservation and ride planning system that allows for bookings months in advance on a website or smartphone app. Bracelets called MagicBands, which link electronically to an encrypted database of visitor information, serve as admission tickets, hotel keys, and credit or debit cards; a tap against a sensor pays for food or trinkets. The bands have radio frequency identification (RFID) chips—which critics derisively call spychips because of their ability to monitor people and things.

Another covers Disney raising ticket prices to nearly $100 for a single-day pass.

Walt Disney (DIS) is prying parental wallets open a little wider for that vacation visit to the theme park. The Empire of the Mouse is now charging $99 for a one-day park pass at its Magic Kingdom Park near Orlando, an increase of $4 that comes just eight months after the last price hike.

Behind the steadily rising ticket prices is the small world of supply and demand. People keep flooding Disney’s U.S. theme parks, notwithstanding steeper costs. The company reported a 16 percent increase in operating income, to $671 million, for the most recent quarter at its theme park division as sales rose 6 percent, to $3.6 billion. In Disney’s last fiscal year, theme park income rose 17 percent, to $2.2 billion. The company does not disclose attendance data.

A family enjoys the ease of using MagicBands to get on Jungle Cruise attraction.

What Will MyMagic+ Do for Passholders?

From FastPass+ service to the enhanced planning tools of My Disney Experience, MyMagic+ will make it easier than ever to plan, share and enjoy your next visit.

Problem analyzing data: know your source. Example: Consumer Reports hospital ranking based on Medicare billing

We all want to go to the best hospitals. I read the Consumer Reports hospital ranking. What I didn't know is that the ranking is based on billing information from Medicare. What does Medicare billing information have to do with the quality of your hospital? A hospital's ability to format its data according to Medicare standards could be what gives it a higher ranking. Some hospitals weren't ranked at all because their data did not meet Medicare standards.

Hope this gets you thinking about your big data projects.

NBC News reports on this situation.

Dr. Peter Pronovost, senior vice president for patient safety and quality at Johns Hopkins and one of the leaders in the fight to improve hospital quality, applauds the idea but says the data the report is based on is flawed. “I really applaud the Consumer Report effort to get information to consumers about complications,” he said.

“The overall concept is spot-on,” Pronovost told NBC News. “One of the concerns is they measured these complications using administrative data, which is completely understandable, but we know it’s not completely accurate.”


Many of the biggest and most famous hospitals aren’t listed. Consumer Reports used Medicare reporting data for its report and could only include hospitals that reported data in a certain way.


The article points out that one infection measure is right only 25 percent of the time when it is calculated from the billing information filed.

Unfortunately, he said, there’s not much better data out there yet. One of the measures – infections among patients fitted with a catheter – is only right 25 percent of the time when calculated using billing information filed to Medicare, Pronovost says.

Three books on data visualization

The Economist has a review of three books on data visualization.

Data Points: Visualisation That Means Something. By Nathan Yau. Wiley; 300 pages; $32 and £26.99.

Facts are Sacred. By Simon Rogers. Faber and Faber; 311 pages; £20.

The Infographic History of the World. By James Ball and Valentina D’Efilippo. Collins; 224 pages; £20.

Here is a video that shows you how the books look.

The author of this article hit upon a point that came to my mind as well: should these books have been printed at all?

But should these books have been published on paper at all? Today’s most impressive works, like “Wind Map”, were created to be online. Future infographics will be digital, data will stream in real-time and viewers’ interactions will determine what is presented. When this happens, what constitutes a good infographic will change. The revolution has just begun.

It's been a dream of companies like Adobe and others to allow the creation of online books. But the challenge is the distribution channel, not just creation.

The one company that could take the above books online and allow them to make money would be Amazon.com. Wouldn't it be cool if there were an AWS service that allowed you to create book-like content and make it interactive, with embedded video and so on? Google could do this too, or maybe even Apple.

Oops, just because you can access data doesn't mean it is OK to use it: Bloomberg reporters cross ethical boundaries

It may seem like common sense that if you set up a camera to watch what someone surfs on the web, it is illegal and unethical to report on those activities. But when you are a media reporter who is driven to get more traffic, you may think it is OK to do what others have been doing: crawling through user activity logs of Bloomberg services to ascertain what people are thinking about doing.

The news is spreading over the weekend.



Bloomberg bars reporters from client log-in data - USA Today

LOS ANGELES (AP) — Financial data and news company Bloomberg LP said Friday that it had corrected a "mistake" in its newsgathering ...

Even if you are not a reporter, you need to watch out for the same mistakes. I have repeatedly said that one of the dangers of big data environments is putting a bunch of siloed data in one area. The problem is that it can get you arrested, because you may be violating privacy laws by putting all the data in one environment where it is open for users to analyze.

Goldman Sachs is the one complaining.

A source at Goldman tells us that the firm was dumbfounded and outraged to discover what Bloomberg reporters were doing. The source says that, until recently, Bloomberg News reporters were able to see not just when individual Bloomberg subscribers logged in (and via what device), but what they did while they were logged in.

Specifically, the source says, Bloomberg News reporters were able to see: 

  • When individual subscribers logged in and logged out (and from where).
  • What type of information these individual subscribers looked at and how often they looked at it.

This is not a unique situation, and we'll probably hear more about it as people start to look for signs that others are violating privacy laws.