Bringing Big Data to the People (Part 6 of 6)
So can we forget probability and statistics?
No, but using Big Data does mean changing how we think about several things.
No Sample Set
The scientific method taught in school involves choosing what to test, then framing a question and a hypothesis. In probability and statistics, a sample, large or small, stands in for the data as a whole. Big Data doesn’t need that. It processes everything, not sample sets.
In the scientific method, a representative sample is needed to produce dependable results (hence confidence intervals). Probability and statistics rely on samples because, historically, that was all the available processing capacity could handle.
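To make the contrast concrete, here is a minimal Python sketch. It is an illustration only: the data is synthetic, and the names (`sample_mean_with_ci`, `population`) and the numbers (100,000 values, a sample of 500, a 95% interval using z = 1.96) are assumptions made for this example, not anything from a real dataset. It estimates an average from a sample, with a confidence interval, and then computes the same average over every record.

```python
import math
import random
import statistics


def sample_mean_with_ci(values, sample_size, z=1.96, seed=0):
    """Estimate the mean from a random sample and return (mean, low, high).

    Uses a normal-approximation confidence interval; z=1.96 gives roughly 95%.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, mean - z * std_err, mean + z * std_err


# Synthetic "whole data set" (hypothetical: 100,000 measurements).
data_rng = random.Random(42)
population = [data_rng.gauss(50, 10) for _ in range(100_000)]

# Traditional approach: look at 500 records and hedge with an interval.
estimate, low, high = sample_mean_with_ci(population, sample_size=500)

# "Process everything" approach: no sampling step, so no sampling interval.
full_mean = statistics.fmean(population)

print(f"sample estimate: {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"full data mean:  {full_mean:.2f}")
```

The sample gives an answer plus a margin of error, while the full pass gives the exact figure for the data on hand. The interval exists only because the sample left most of the records out.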
That processing capability has become faster and cheaper. As a result, start-ups, small businesses, and other organizations can now use Big Data, not just Google and Walmart.
The conventional hypothesis method also introduces bias through the experimenter’s questions. Restricted to the questions chosen up front, the experimenter loses the other possible answers available from the data. How many times have you been frustrated by not asking the correct question when looking for an answer?
Big Data and visualization don’t just provide a cool new way to look at data; they present data in a way that would not have been seen with the traditional scientific method. This video is a somewhat brief explanation of how Big Data can remove personal bias through visualization, and that’s not just pretty pictures.