Bringing Big Data to the People (Part 4 of 6)
Beyond Natural Selection – Variety
Data once had to be carefully selected before processing, in both quantity and quality, and it was strictly formatted. At first, its gatekeepers were men in lab coats and pocket protectors, a role that eventually passed to the IT guys.
As data became more plentiful, it also became more personal through the spreadsheets and databases that Lotus and Microsoft made possible on home computers. Anyone with a PC and cheap software could learn the basic capabilities with a little effort. With a lot of effort, a PC user could actually accomplish quite a bit with these tools (most users use less than 10% of any MS product's capabilities). Anyone who has worked with a pivot table, or even just gotten the “!” while trying to use a spreadsheet, understands that data has to be in the right format before it can be manipulated.
Big Data is a lot more than a big MS tool. BD consumes all data – heterogeneously – words, images, audio, telemetry, transactions, scanned analog material, legacy databases and social media. The data must still be scrubbed, but BD ingests everything – an information jabberwocky of sorts.
This scrubbing process turns source data into application data, which can then be manipulated. The increase in variety, and the scrubbing it requires, has given rise to the Fourth V – veracity – the uncertainty of data.
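To make the idea of scrubbing concrete, here is a minimal Python sketch. It is only an illustration: the sample records, the field names and the scrub function are all made up for this example, and real Big Data pipelines are far more involved. It shows three differently shaped source records being mapped onto one common application shape, with a trusted flag standing in for veracity.

```python
from datetime import datetime

# Hypothetical source records: three different shapes, invented for illustration.
SOURCES = [
    {"kind": "transaction", "amount": "19.99", "ts": "2013-05-01T10:15:00"},
    {"kind": "social", "text": "  Loving the new phone!  ", "posted": "05/01/2013"},
    {"kind": "telemetry", "reading": "n/a", "ts": "2013-05-01T10:16:30"},
]

# Timestamp formats this sketch knows how to read (an assumption, not a standard).
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%m/%d/%Y")


def parse_when(raw):
    """Return a datetime for the first format that fits, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except (TypeError, ValueError):
            continue
    return None


def scrub(record):
    """Map one source record onto a common application shape.

    'trusted' is False whenever a field could not be parsed, which is
    one simple way to carry veracity along with the data.
    """
    kind = record.get("kind", "unknown")
    when = parse_when(record.get("ts") or record.get("posted"))
    trusted = when is not None

    if kind in ("transaction", "telemetry"):
        # Numeric sources: the value must parse as a number.
        raw_value = record.get("amount") or record.get("reading")
        try:
            value = float(raw_value)
        except (TypeError, ValueError):
            value, trusted = None, False
    else:
        # Text sources: trim stray whitespace.
        raw_value = record.get("text")
        if isinstance(raw_value, str):
            value = raw_value.strip()
        else:
            value, trusted = None, False

    return {"kind": kind, "when": when, "value": value, "trusted": trusted}


application_data = [scrub(r) for r in SOURCES]
for row in application_data:
    print(row)
```

In this sketch the telemetry record comes out with trusted set to False because its reading could not be parsed. It is still ingested, but its uncertainty travels with it.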