Big Data Future and Past Stats



Source – floydworx

In honor of Apple’s achievements noted earlier this week, here’s more on just how much more data capability we have now, looking back on where we were.  (Or The Way We Were if you really want to go back in time.)

How did we make it to the moon and back without digital capability?


Source: floydworx


Through the lens of Big Data, where Data Lakes are less expensive than data warehousing, here are some great data points emphasizing how far prices have dropped.

These amazing infographics are taken from floydworx. Check out the entire fantastic work.


Apple Tops Smartphone Shipments In China For The First Time, Says Canalys

Colette Grail:

If Levi’s and Coca-Cola helped bring down the Berlin Wall and break up the USSR, what happens when Apple takes root in China?

Originally posted on TechCrunch:

Apple has topped smartphone shipment numbers in China in Q4 2014, according to analyst firm Canalys. The popularity of the iPhone 6 and 6 Plus helped push the California gadget-maker ahead of its homegrown competitors, seeing it ship more than Xiaomi and Huawei, and putting it out in front of Samsung, which placed third overall. Canalys credits the large screen of the newest smartphone models, along with proper support for China’s LTE networks, good launch timing, and a crackdown on gray market exports from the nearby Hong Kong market for the iPhone’s significant win in the key Chinese market.

This marks the first time Apple has led the rankings of smartphones as measured by devices shipped according to Canalys. It’s a tremendous feat considering that most of its handsets come in at almost twice the retail cost of flagship hardware from competitors like Xiaomi, which has itself proved a formidable…


Data Lake Return on Investment



For all its capability, it might be hard to believe that Data Lakes actually cost less than traditional data warehousing – significantly less. Quotes range from 10 to 100 times less, or 75% less.

Processing and storage costs using traditional data warehouse and RDBMS technologies are typically in the tens of thousands of dollars per terabyte. By comparison, costs on Hadoop are in the hundreds of dollars per terabyte. That’s a 100x cost reduction.  Source: Sourcethought
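The arithmetic behind that 100x claim is easy to check. A quick sketch, using illustrative per-terabyte figures picked from the ranges quoted above (the exact dollar amounts and data volume are assumptions, not real pricing):

```python
# Illustrative per-terabyte costs drawn from the quoted ranges above.
DW_COST_PER_TB = 30_000   # traditional data warehouse / RDBMS: tens of thousands $/TB
HADOOP_COST_PER_TB = 300  # Hadoop: hundreds of $/TB

data_volume_tb = 50  # hypothetical organization with 50 TB of data

dw_total = DW_COST_PER_TB * data_volume_tb
hadoop_total = HADOOP_COST_PER_TB * data_volume_tb

print(f"Warehouse: ${dw_total:,}")        # $1,500,000
print(f"Hadoop:    ${hadoop_total:,}")    # $15,000
print(f"Reduction: {dw_total // hadoop_total}x")  # 100x
```

At those assumed prices, even a modest data estate shows a seven-figure difference.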

Big Data is a tremendous capability differentiator AND it reduces cost.  That’s a Big (Data) Return on Investment.  So:

Creating a Data Lake is often the first and best step for an organization to begin utilizing Big Data as a strategy.

Following last week’s discussion of Data Lakes, it’s worth noting just how valuable the capability is. Here is one example of how a data lake is utilized in healthcare.

UC Irvine Medical Center maintains millions of records for more than a million patients, including radiology images and other semi-structured reports, unstructured physicians’ notes, plus volumes of spreadsheet data. To solve the challenge the hospital faced with data storage, integration, and accessibility, the hospital created a data lake based on a Hadoop architecture, which enables distributed big data processing by using broadly accepted open software standards and massively parallel commodity hardware.

Hadoop allows the hospital’s disparate records to be stored in their native formats for later parsing, rather than forcing all-or-nothing integration up front as in a data warehousing scenario. Preserving the native format also helps maintain data provenance and fidelity, so different analyses can be performed using different contexts. The data lake has made possible several data analysis projects, including the ability to predict the likelihood of readmissions and take preventive measures to reduce the number of readmissions.  Source: PwC (PricewaterhouseCoopers)
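This “store native, parse later” idea is often called schema-on-read. A minimal sketch in plain Python (the file names, record contents, and fields below are all made up for illustration):

```python
import csv
import io
import json

# Raw records kept in their native formats, as a data lake would store them.
raw_records = [
    ("note_001.txt",   "Patient reports improvement; follow-up in 2 weeks."),
    ("labs_001.csv",   "test,value\nglucose,98\nhdl,55"),
    ("visit_001.json", '{"patient_id": 42, "readmitted": false}'),
]

def parse(name, payload):
    """Apply a schema only at read time, chosen by the record's native format."""
    if name.endswith(".json"):
        return json.loads(payload)
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(payload)))
    return {"text": payload}  # unstructured notes pass through untouched

parsed = {name: parse(name, payload) for name, payload in raw_records}
print(parsed["visit_001.json"]["patient_id"])  # 42
```

Nothing is transformed until a question is asked, so the originals stay intact for provenance.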

PwC also has a nice graphic on how an organization utilizes the Data Lake.


What is a Data Lake and What Does it Do?



Your organization already has a lot of information:

  • existing database systems (and their older versions)

  • website information and its clickstream

  • product or service production and/or performance measurements

  • social media that’s become an intricate portrait of interaction, because one site doesn’t do you justice … or ensure your existence.

All these information venues are vital to your organization, but they don’t necessarily talk to each other. Each has a different format and a different purpose, and each was created at a different moment on the graph of Moore’s Law. So how can all that data be used (and not wasted)?

In Big Data, this is the Data Lake.

The Data Lake holds a lot of awkward fish


Since Data Lake is a common term in Big Data speak, exactly what is it? What can a data lake do? What does the data lake NOT do?

“A data lake is a large object-based storage repository that holds data in its native format until it is needed.” – Margaret Rouse

So with Big Data, the Data Lake holds the data in its original format, until time to use it.

How does the Data Lake work?

“While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.” – Margaret Rouse
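That flat, tag-indexed model can be sketched in a few lines of Python. This is a toy in-memory stand-in, not a real data lake; the element names and tags are invented for illustration:

```python
import uuid

# A toy flat store: every element gets a unique ID plus extended metadata tags.
lake = {}

def put(data, **tags):
    """Store a data element with metadata tags; return its unique identifier."""
    element_id = str(uuid.uuid4())
    lake[element_id] = {"data": data, "tags": tags}
    return element_id

def query(**wanted):
    """Return elements whose tags match every requested key/value pair."""
    return [
        entry["data"]
        for entry in lake.values()
        if all(entry["tags"].get(k) == v for k, v in wanted.items())
    ]

put("clickstream log", source="web", year=2014)
put("sales spreadsheet", source="finance", year=2014)
put("tweet archive", source="social", year=2013)

print(query(year=2014))  # ['clickstream log', 'sales spreadsheet']
```

There are no folders or hierarchy, just identifiers and tags; the “business question” is a tag query that narrows the lake to a small, analyzable set.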

The term data lake was first associated with Hadoop and its object storage; it has since entered the common vocabulary of business analytics and data mining.

So … a data lake is where all, or a significant portion, of an organization’s data can be stored by everyone and used by anyone – kinda like Woodstock meets the UN. All the information silos of the past are dissolved into a common mass with single-point access. This provides a holistic picture of an organization’s operations, as well as traceable accountability. The Data Lake is an opportunity to keep all the organization’s information as an archive and to manipulate all the pieces.

Previously this wasn’t possible because of the disparate structured databases that couldn’t “talk to each other.” Ironically, although placed into the data lake, they still don’t have to talk to each other. (UN right?)


That scenario is the ideal; reality is a bit more challenging.

Next post: data lake best practices. Just as inviting UN delegates from around the world into an empty room without any program, format, or translators isn’t going to be very productive – even with the best heavy hors d’oeuvre selection and a generous open bar – the Data Lake is going to need some rules and best practices.

A Trip Down Memory Lane – Moore’s Law



In 1965, Gordon Moore, then director of research and development (R&D) at Fairchild Semiconductor, forecast in an article for Electronics magazine that “The complexity for minimum component costs has increased at a rate of roughly a factor of two per year.” He later revised that rate to doubling every two years at a 1975 IEEE meeting. Thus Moore’s Law was created.

The semiconductor industry has actually used this law to set goals for capability, and the results are impressive. It works.
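Doubling every two years compounds fast. A quick projection, using the 1971 Intel 4004 (roughly 2,300 transistors) as a commonly cited starting point:

```python
# Project transistor counts under Moore's Law: doubling every two years.
START_YEAR, START_COUNT = 1971, 2_300  # Intel 4004, ~2,300 transistors

def moores_law(year):
    """Transistor count predicted by doubling every two years since 1971."""
    doublings = (year - START_YEAR) / 2
    return START_COUNT * 2 ** doublings

for year in (1971, 1981, 1991, 2001, 2011):
    print(year, f"{moores_law(year):,.0f}")
# 1981 -> 73,600   (5 doublings)
# 2011 -> ~2.4 billion (20 doublings)
```

Twenty doublings in forty years is a factor of about a million, which is roughly what the chart below shows.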

By Wgsimon (own work) [CC BY-SA 3.0 or GFDL], via Wikimedia Commons

Whether Moore set the bar or predicted the future, thank goodness we don’t have to lug around the “executive portable computer” any more. The iPhone pictured below is from 2007, too.

An Osborne Executive portable computer, from 1982 with a Zilog Z80 4MHz CPU, and a 2007 Apple iPhone with a 412MHz ARM11 CPU; the Executive weighs 100 times as much, has nearly 500 times as much volume, cost approximately 10 times as much (adjusted for inflation), and has about 1/100th the clock frequency of the smartphone.


Glossary of Big Data Things (GoBDT)



It’s not the IoT, but I did find a most excellent glossary of Big Data terms on the Data Informed website. The glossary is a valuable resource, and the website also provides a wide array of information about what Big Data is and what’s up next. I like it so much I’m adding it to the pages for reference.

Next week will be a discussion of the Data Lake (which, oddly, did not make Data Informed’s GoBDT).

CNN Includes Big Data in What to Look Forward to in 2015



Amidst everything from living on Mars to global concerns in the Middle East, China, and Russia, CNN included these paragraphs as having significant impact on the coming year.

“The Internet of Things

It’s a prosaic description of a complex and powerful phenomenon that is about to land on us — a massive surge in the data showing what we do, when and where. None of this happens in a calendar year of course, but by 2020 according to IT industry analysts Gartner there will be 26 billion connected devices generating and consuming data, assisted by the explosion in wearable technology such as smart glasses and watches.

So businesses will invest heavily in 2015 (and beyond) in data storage, analysis and retrieval. Knowing how to classify the data, what to keep and what to discard, what to hold locally and what to store in the cloud, will all be critical to smart use of this insane amount of information.

Susan Hauser of Microsoft expects businesses to “make use of big data services in the cloud and we expect machine learning to grow exponentially across the retail, manufacturing and health care sectors.”

“Deep learning” will be a big part of this process — whereby computers are designed to behave more like the human brain thanks to “deep neural networks” enabling them to learn through observation. We are getting close to the point where machines will make better decisions than us.

As Anthony Wing Kosner writes in Forbes: “This will not be the year that 80% of the developed world loses their jobs to intelligent machines, but it is not too soon to start figuring out what to do about that eventuality.”

Maybe one of them will be writing this column next year.”

Meeting The Challenges In Mobile Health Innovation

Colette Grail:

Continuing the discussion of health information and cell phone usage …
wearables, ingestibles, and how they connect.

Originally posted on TechCrunch:

Editor’s note: Sumit Mehra is CTO of Y Media Labs.

Preventing disease is the Holy Grail of modern medicine. Many diseases plaguing society today are chronic and brought on by lifestyle choices; others have their roots in genetic or environmental factors. Either way, the ability of the healthcare community to prevent disease is heavily influenced by information. Gather the right data with enough warning time to impact the outcome, and most diseases can be minimized — or even eliminated.

Mobile health technology has the potential to fill this void. Applications, devices and technologies behind the “quantified-self” movement are exploding in number, precisely because of their power to collect, interpret and communicate the personal health data professionals so desperately need.

The future becomes even brighter as the walls of big data come down. Medical data has languished in silos for a long time, but that’s no longer the case…


