What not Why of Big Data

November 28, 2014 | Colette Grail

Bringing Big Data to the People (Part 5 of 6)

What Not Why – Not Your Mother’s Scientific Method

What Not Why is a mental shift that accompanies the 3 Vs of Big Data. Big Data consumes great volumes of a wide variety of data and reports "what" is in it. It tells you what is happening in the data, but not why it is happening. The "answer" Big Data gives is not "why" but "what".

Walmart, Hurricanes & Pop Tarts

For example, Walmart was a leader in accumulating data well before Big Data truly emerged. Product placement is critical for profit margins, so when Walmart began mining that data, one correlation it found was that before a hurricane people stock up not only on batteries but also on Pop Tarts.

In the traditional Scientific Method, by contrast, the work starts with a hypothesis, such as: when a hurricane is coming, people buy "________". A representative data sample would be selected, the test would be run for one product, and then the test would be repeated with other products until a positive result (the hypothesis is accepted) showed what was bought before a hurricane.
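The hypothesis-first loop described above can be sketched in a few lines of Python. This is only an illustration: the sales figures are invented toy numbers (not Walmart data), the names `NORMAL_DAYS`, `PRE_HURRICANE_DAYS`, and `hypothesis_holds` are hypothetical, and a simple ratio check stands in for a proper statistical test.

```python
# Toy, invented daily unit sales. These are NOT real Walmart figures.
NORMAL_DAYS = {
    "batteries": [20, 22, 19, 21],
    "pop_tarts": [30, 28, 31, 31],
    "umbrellas": [10, 12, 11, 11],
}
PRE_HURRICANE_DAYS = {
    "batteries": [60, 64],
    "pop_tarts": [90, 90],
    "umbrellas": [11, 11],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def hypothesis_holds(product, min_ratio=1.5):
    """Test one guess: does `product` sell at least `min_ratio` times
    more per day before a hurricane than on normal days?

    The ratio threshold is an assumption standing in for a real
    significance test.
    """
    ratio = mean(PRE_HURRICANE_DAYS[product]) / mean(NORMAL_DAYS[product])
    return ratio >= min_ratio


# Trial and error: each candidate has to be thought of in advance.
guesses = ["umbrellas", "batteries"]
confirmed = [product for product in guesses if hypothesis_holds(product)]
print(confirmed)  # ['batteries']
```

Notice that Pop Tarts never show up in the result. Nobody guessed them, so nobody tested them.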

This iterative process is Trial and Error. Whereas a data analyst finds answers to specific questions, data scientists manipulate the data to see what it tells them. The Scientific Method and hypothesis testing of data sets have always required math, namely probability and statistics.


Big Data does not need a carefully chosen sample to prove or disprove an idea. As in the Walmart example, studying the entire data set provides a result without a preconceived notion of what the "answer" should be. Big Data scientists look for what the data tells them, not whether their hypothesis holds up.
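The "what, not why" approach can be sketched the same way. Instead of testing one guess at a time, the code below scans every product in the data and ranks them by how much their sales rise before a hurricane. Again, the numbers are invented toy data, and the names `normal_days`, `pre_hurricane_days`, and `sales_lift` are hypothetical; it is a minimal sketch, not a description of how Walmart actually did it.

```python
from statistics import mean

# Toy, invented daily unit sales. These are NOT real Walmart figures.
normal_days = {
    "batteries": [20, 22, 19, 21],
    "pop_tarts": [30, 28, 31, 31],
    "umbrellas": [10, 12, 11, 11],
}
pre_hurricane_days = {
    "batteries": [60, 64],
    "pop_tarts": [90, 90],
    "umbrellas": [11, 11],
}


def sales_lift(normal, pre_storm):
    """For every product, divide average pre-storm daily sales by
    average normal daily sales. No product is singled out in advance."""
    return {
        product: mean(pre_storm[product]) / mean(normal[product])
        for product in normal
    }


lift = sales_lift(normal_days, pre_hurricane_days)

# Rank all products by lift and let the data show what stands out.
ranked = sorted(lift, key=lift.get, reverse=True)
for product in ranked:
    print(f"{product}: {lift[product]:.2f}x")
```

Here Pop Tarts surface next to batteries without anyone having to guess them first. The output says what sells, and is silent on why.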

Does Walmart know “why” people buy Pop Tarts before a hurricane?  Maybe or maybe not, but they do make sure to stock them near the front.

The Scientific Method and hypothesis testing of data sets have required math, so can we forget about probability and statistics now?
