Top 10 Data Science Movies – What’s YOUR Fav?



With winter in the works, it’s time to revisit some movies, and what’s better than sexy Big Data meets hunky analytics?  Vote your conscience, but some hints are listed below in an awesome info graphic from Analytics Vidhya.

Top Ten Data Movies

Elephant(s) In the Room – Big Data Elephant Conservation Info Graphic



Big Data can be utilized for solving Big Problems such as animal conservation.

The United States has a history of destroying habitats and wildlife populations, but also of restoring some with herculean effort.
In the Great Depression of the 1930s, the Dust Bowl was created by overconsumption of a continent’s rich resources.  The loss of the bountiful wheat plains of the US not only spawned desperate economic times but also spurred dust storms that proved sickening and lethal to the people living on those plains.  It would take federal intervention to stop the destruction.  With the Dust Bowl effects, wildfowl became almost extinct.  It would take a dedicated cadre of volunteers, unlikely heroes in the form of hunters, to save thousands of species.  Ducks Unlimited was and is today an international volunteer effort.
Today the US still walks a fine line, balancing agriculture and construction against protecting animal and cultural heritage locally, while icons such as the bison and Ted Turner contend in much more public controversies.  Conservation is a delicate entente at best, but its effects can change individuals, communities and cultures.

So … why do we care how many elephants are in Africa?

Elephants are killed by poachers for their tusks, as are rhinos for their horns.  Illegally procured, these black market gems are worth ten to a hundred times their original price once out of their native country, and they finance more criminal activity as well as terrorism.  Their destruction is causing a traumatic socio-economic implosion that is being felt worldwide.
Saving elephants in Africa is one example of an environmental challenge felt locally and reaching globally.  The struggle to save elephants and rhinos and lions and ecosystems, and the cultures that depend on them, is relevant to every region and country.  Counting elephants is a perfect opportunity to utilize Big Data to solve Big Problems.
Big Data Elephant Conservation

List of Open Data Portals from Around the World by



There’s BIG and there’s OPEN.

Big Data is the volume, velocity and variety of data, captured in ways that are only beginning to emerge, with capabilities that are only just being explored.

Open data is a somewhat more mature market, going back to the origins of code.  The collaboration that created personal computing and the internet laid a foundation of sharing information unabashedly.  It’s a noble ambition as well as a practice requiring considerable fortitude.

In our world of increasing data capabilities, openness is a necessary topic of conversation.  Here is one forum if you’re in the neighborhood.  From the Brussels Data Science Community page:

The Brussels Data Science Community


We will have a special meetup about Using Open data to promote Data Innovation on Thursday, November 26, 2015 @VUB. Here is a Comprehensive List of Open Data Portals from Around the World. It is the most comprehensive list of open data portals in the world, curated by a group of leading open data experts from around the world, including representatives from local, regional and national governments, international organisations such as the World Bank, and numerous NGOs.


Source: List of Open Data Portals from Around the World by

The Open Definition sets out principles that define “openness” in relation to data and content.

It makes precise the meaning of “open” in the terms “open data” and “open content” and thereby ensures quality and encourages compatibility between different pools of open material.

It can be summed up in the statement that:

“Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).”

Put most succinctly:

“Open data and content can be freely used, modified, and shared by anyone for any purpose”

Remembrance Day



For all those who served in all countries,

For all those that supported those who went,

For all those that were lost in battle – combatants and non-combatants,


Peace ever first on our minds.

And Big Data used for peace, a rising tide that lifts all boats.


For the US vets and all vets around the world.

The Elephant(s) In the Room – Just How Many Are There?



Counting Elephants – Using Big Data to Solve Big Problems, it’s as easy as 1-2-3

One … two … three … four hundred … five hundred thousand …. How many elephants can you count before it’s too many (or too much)?

Counting is one of the first skills learned as a child.  Before addition and subtraction, the numerical building blocks of 1-2-3 are right there with the ABCs.  Whether it’s your blessings or the dealer’s cards or your money, counting comes in handy.  How many or how much is core to decision making.  How much money or how many resources do I have?   How many does the enemy have?

Counting to 10 or maybe 100 is easy, but as more needs to be counted, it becomes tedious and time-intensive.  The practice loses its return on energy.  That’s where math and probability and statistics come in.  What can be counted easily can be leveraged not only to count more but also to add value to the meaning of what is counted.

Enter the classic “what to wear” problem (a pertinent one for math geeks).

Instead of laying out each combination and “counting” it, you multiply the number of shirts by the number of pants and know how many outfits you have.  This simple example can be extended to far greater combinations…to the nth degree.
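For the code-minded, here is a minimal sketch of the wardrobe shortcut in Python. The shirts and pants are made-up examples, not anything from a real closet; the point is only that multiplying the number of choices gives the same answer as laying out every pairing.

```python
from itertools import product

# Hypothetical wardrobe; the item names are illustrative only.
shirts = ["white", "blue", "striped"]
pants = ["jeans", "khakis"]

# The long way: lay out every shirt/pants pairing and count them.
outfits = list(product(shirts, pants))
counted = len(outfits)

# The shortcut: multiply the number of choices in each category.
calculated = len(shirts) * len(pants)

print(counted, calculated)
```

Both numbers come out the same, and the shortcut keeps working when the closet is far too big to lay out on the bed.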

But what if you don’t know how many shirts and pants you have?

Big Data Counting – The Next Generation

Counting is another excellent venue for exploring Big Data.  Just as math saves hours of manipulating closet items, so Big Data can help with Big Problems, providing greater choices and better decision capability as a result.

Let’s look at a Big counting problem –

just how many elephants are there in Africa?  (and why does anyone care?)

A shocking increase in poaching has cut down African elephant populations.  In the past 10 years, the African elephant population has taken a dramatic hit, with estimates that more than 12,000 a year have been slaughtered since 2006.

Dr. Mike Chase, director and founder of Elephants Without Borders: “The threat of local extinction feels very real. In October 2013, Elephants Without Borders flew a survey over a park where we had previously counted more than 2,000 elephants. We counted just 33 live elephants and 55 elephant carcasses. That is why this research is so important.”

Wildlife preservation is a delicate entente in the best of circumstances, but the lucrative draw of poaching across the many African countries these animals inhabit has challenged several of its iconic epicenters.  Lions, rhinos and elephants are the majestic leaders of a rich wildlife pyramid whose dramatic loss crushes the whole ecological system, including the native peoples who live off that balance.

Poaching itself is lucrative.  The transit from Africa to Asia raises the price of ivory from $200 to over $2,000.  Because international standards outlaw this black market material, poaching profits fund only illicit activity and, most dangerously, terrorism.

Elephants Without Borders

Except in South Africa

South Africa “suffers” from too many elephants.  Here the growing numbers continue to roam and forage as is their nature.  That means knocking over even the sturdiest of trees and stripping them of the most digestible leaves.  Just imagine an elephant walking through your yard or the neighborhood park taking down a couple of trees that look tasty.  Imagine what a herd of 20-30 can do.  They don’t just stand still either.  They keep on the move, journeying for miles in a day, carving a pachyderm hurricane path.

In any amount, this is nature’s process, culling the forest for new vegetation.  Their trails create natural fire breaks and they dig for water which other animals use.  But where farms and urban sprawl encroach on this roaming territory, it quickly becomes man versus animal.  The number of touch points is growing too.  The nature of elephants, their survival, is roaming.  Their legendary memory also leads them along old paths where man’s development has erased the past.

To attempt that delicate balance, game parks in South Africa have taken to birth control and water management methods in order to keep their numbers in check.



So What Numbers Are We Talking?

Anyone who has tried to count children at a birthday party or get all the students back into a classroom after recess knows the challenges of counting live bodies.  Counting crowds is actually a science.  And Science isn’t about Knowing so much as Getting a Good Estimate.  Here’s how they counted President Obama’s inauguration crowd.

Although a several-ton elephant is noticeably slower and harder to miss, expand the search over the wilderness of half a continent divided into several countries, some war-torn, and accurate counting is hard to imagine.  But someone is trying.

Not Just Throwing Money at the Issue

That effort is the Great Elephant Census, the largest pan-African aerial survey since the 1970s, and it’s backed by one of the world’s smartest-guy-in-the-room icons: Microsoft co-founder Paul Allen.  Not only does this count have deep pockets, it also has expert guidance.


The Great Elephant Census is applying a strategic, consistent approach to counting elephants in numerous countries in varying climate and terrain, with an integrated audit program in situ.

The Great Elephant Census is designed to provide accurate and up-to-date data about the number and distribution of African elephants by using standardized aerial surveys of tens of hundreds of thousands of square miles. Dozens of researchers flying in small planes will capture comprehensive observational data of elephants and elephant carcasses. Our standardized method of data collection, which is validated by an independent TAT advisor, ensures all data is impartial and accurate.

It’s somewhat like counting the crowds for President Obama’s inauguration.  Even with such meticulous effort though, most elephant accounting is predicated on “known” and “estimated” numbers.

Elephant Database

But … Back to Big Data

So that’s how the experts are counting elephants.  Let’s explore counting elephants through a Big Data lens instead.  An elephant census doesn’t have to be solely a tally of heads, albeit magnificent heads with flowing ears and strong tusks.  The count can be built from the many kinds and volumes of data that already exist and grow by the minute.  In a Big Data Elephant Census, information is created by the community and also serves the communities in return.

Big Data Elephant Census begins with a data lake of information collected from the prevalent sources:  cell phone usage, transactional data, weather, heat signature, game warden activity/reports, international shipping and markets, and of course, social media.  Big Data ingests the volume, velocity and variety of the data to look for patterns that emerge.  Like trying to count moving children, Big Data can exploit information that is too complex for “naked” human observation.  Like picking outfits from the mathematically derived wardrobe, Big Data Elephant Census provides an answer to how many elephants as it elicits the holistic picture of what that means.


The point of counting elephants is not just to know how many elephants there are; we want to know all the factors that evolve in the elephant environment.  How does the diverse animal and vegetation habitat ebb and flow with the tramping of elephant feet?  How are indigenous and foreign humans influencing and being influenced by the elephant footprint? (har har) How are poaching and anti-poaching efforts impacting the community as well as the elephants?  How are farming and native livelihoods affecting and being affected?  What other passive economic factors, weather, and politics shift accordingly?


So let’s stop trying to capture a picture and instead capture a flow.  Pulling a Big Data elephant count from a volume, velocity and variety of data sources articulates how the system (man and animal) manifests.  Instead of trying to chase the right number of elephants, Big Data Elephant Counts observe the evolving energy to find the signals in the noise.  Gentle shifts or environmental shocks are recorded in situ with all the elements and players.  That makes the count both predictive and evolving, and less reactive.

How Twitter Feels The Earth Move



In the 1830s, Samuel Morse developed his self-named code to transmit messages via dots and dashes.

Strung together, the bits and pieces communicated words and conveyed information far more quickly over far greater distances than ever before.  Twitter uses 140 characters or fewer per tweet, and yet the volume and velocity don’t just tell individual stories; they aggregate to provide viable, robust information.

Just as Google could predict the next flu outbreak weeks before the Centers for Disease Control and Prevention (CDC), Twitter makes the call on earthquakes.  With the correct algorithms, tweets provide data points that both detect and verify quake information provided by mechanical sensors, or stand in for sensors where locations are too remote to have them.

How the USGS uses Twitter data to track earthquakes

#DataStories is where we interview people doing interesting work with Twitter data. This week we’re speaking with Paul Earle and Michelle Guy of the USGS on how they use Twitter data to monitor earthquakes.

After the disastrous Sichuan earthquake in 2008, people turned to Twitter to share firsthand information about the earthquake. What amazed many was the impression that Twitter was faster at reporting the earthquake than the U.S. Geological Survey (USGS), the official government organization in charge of tracking such events.

This Twitter activity wasn’t a big surprise to the USGS. The USGS National Earthquake Information Center (NEIC) processes about 2,000 realtime earthquake sensors, with the majority based in the United States. That leaves a lot of empty space in the world with no sensors. On the other hand, there are hundreds of millions of people using Twitter who can report earthquakes. At first, the USGS staff was a bit skeptical that Twitter could be used as a detection system for earthquakes – but when they looked into it, they were surprised at the effectiveness of Twitter data for detection.

USGS staffers Paul Earle, a seismologist, and Michelle Guy, a software developer, teamed up to look at how Twitter data could be used for earthquake detection and verification. By using Twitter’s Public API, they decided to use the same time series event detection method they use when detecting earthquakes. This gave them a baseline for earthquake-related chatter, but they decided to dig in even further. They found that people Tweeting about actual earthquakes kept their Tweets really short, even just to ask, “earthquake?” Concluding that people who are experiencing earthquakes aren’t very chatty, they started filtering out Tweets with more than seven words. They also recognized that people sharing links or the size of the earthquake were significantly less likely to be offering firsthand reports, so they filtered out any Tweets sharing a link or a number. Ultimately, this filtered stream proved to be very significant at determining when earthquakes occurred globally.
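The filtering rules described in that paragraph can be sketched in a few lines of Python. This is only an illustration of the three stated rules (no more than seven words, no link, no number), not the USGS code: the function name, the regular expressions and the sample tweets are all assumptions, and it presumes the tweets have already been matched on earthquake keywords upstream.

```python
import re

# Assumption: "more than seven words" is read as a whitespace-separated word count.
MAX_WORDS = 7
# Assumption: a link is anything that starts like a URL.
LINK_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
# Assumption: "a number" means any digit; spelled-out numbers are not caught.
DIGIT_PATTERN = re.compile(r"\d")


def looks_firsthand(text: str) -> bool:
    """Return True if a tweet passes the three filters described in the interview."""
    if len(text.split()) > MAX_WORDS:
        return False
    if LINK_PATTERN.search(text):
        return False
    if DIGIT_PATTERN.search(text):
        return False
    return True


# Made-up sample tweets for illustration.
sample_tweets = [
    "earthquake?",
    "Magnitude 6.0 earthquake reported near Napa",
    "Earthquake coverage here http://example.com",
    "whoa did anyone else just feel that really long shaking downtown",
]

kept = [tweet for tweet in sample_tweets if looks_firsthand(tweet)]
print(kept)
```

Only the short, startled "earthquake?" survives; the tweet with a magnitude, the one with a link and the chatty one are all dropped.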

USGS Modeling Twitter Data to Detect Earthquakes

While I was at the USGS office in Golden, Colo. interviewing Michelle and Paul, three earthquakes happened in a relatively short time. Using Twitter data, their system was able to pick up on an aftershock in Chile within one minute and 20 seconds – and it only took 14 Tweets from the filtered stream to trigger an email alert. The other two earthquakes, off Easter Island and Indonesia, weren’t picked up because they were not widely felt.

USGS map of earthquakes

On any given day, the NEIC processes about 70 earthquakes, but only a small handful of these might be felt. They might take place in the ocean, deep in the earth, or away from populated areas. Twitter data can be crucial in helping identify earthquakes felt by humans, and can trigger an alert typically in under two minutes. The 2014 earthquake in Napa was detected by USGS in 29 seconds using Twitter data, likely due to the tech savvy population that dominates the area. (Origin time was 2014-08-24 10:20:44 UTC and Twitter data detection time was 2014/08/24 10:21:13.)
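The 29-second figure can be checked directly from the two timestamps quoted above. A small Python sketch, assuming both times are in UTC (the article labels only the origin time as UTC):

```python
from datetime import datetime, timezone

# Timestamps as quoted in the article; the detection time is assumed to be UTC too.
origin = datetime(2014, 8, 24, 10, 20, 44, tzinfo=timezone.utc)
detected = datetime(2014, 8, 24, 10, 21, 13, tzinfo=timezone.utc)

# Detection latency is simply the gap between the two.
latency_seconds = (detected - origin).total_seconds()
print(latency_seconds)  # 29.0
```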

The USGS monitors for earthquakes in many languages, and the words used can be a clue as to the magnitude and location of the earthquake. Chile has two words for earthquakes: terremoto and temblor; terremoto is used to indicate a bigger quake. This one in Chile started with people asking if it was a terremoto, but others realized that it was a temblor.

As the USGS team notes, Twitter data augments their own detection work on felt earthquakes. If they’re getting reports of an earthquake in a populated area but no Tweets from there, that’s a good indicator to them that it’s a false alarm. It’s also very cost effective for the USGS, because they use Twitter’s Public API and open-source software such as Kibana and ElasticSearch to help determine when earthquakes occur.

Next, the USGS team says that they want to determine if they can drop Twitter data based detections into seismic algorithms, and if that can speed up alerts even more.

Thanks to Paul Earle and Michelle Guy of the USGS for taking the time to speak with us.

Using Computer Vision to Analyze Aerial Big Data from UAVs During Disasters

Colette Grail:

Big Data expert Patrick Meier provides an excellent explanation of how Big Data can be leveraged for disaster relief. He discusses just enough technical detail so you can appreciate the challenges and yet still understand the premise.

Originally posted on iRevolutions:

Recent scientific research has shown that aerial imagery captured during a single 20-minute UAV flight can take more than half-a-day to analyze. We flew several dozen flights during the World Bank’s humanitarian UAV mission in response to Cyclone Pam earlier this year. The imagery we captured would’ve taken a single expert analyst a minimum 20 full-time workdays to make sense of. In other words, aerial imagery is already a Big Data problem. So my team and I are using human computing (crowdsourcing), machine computing (artificial intelligence) and computer vision to make sense of this new Big Data source.

For example, we recently teamed up with the University of Southampton and EPFL to analyze aerial imagery of the devastation caused by Cyclone Pam in Vanuatu. The purpose of this research is to generate timely answers. Aid groups want more than high-resolution aerial images of disaster-affected areas, they want answers; answers like the number and location of damaged buildings, the number


Why Doesn’t He Call? Trends in mobile devices



Seven Years Into The Mobile Revolution: Content is King… Again

By Simon Khalaf, SVP Publishing Products
Last year, on the eve of the sixth anniversary of the mobile revolution, Flurry issued our annual report on the mobile industry. In that report, we analyzed time spent on a mobile device by the average American consumer. We ran the same analysis in Q2 of this year and found interesting trends we are sharing in this report.

After putting the desktop web in the rear view mirror in Q2 2011, and eclipsing television in Q4 2014, mobile and its apps have cemented their position as the top media channel and grabbed more time spent from the average American consumer. In Q2 of 2015, American consumers spent, on average, 3 hrs and 40 minutes per day on their mobile devices. That is a 35% increase in time spent from one year ago and a 24% increase from Q4 2014. In just six short months, the average time American consumers spend on their phones each day increased by 43 minutes.

To put things in perspective, there are 175 million Americans with at least one mobile device. This means that, in aggregate, since November 2014, the US connected population is spending an extra 125 million hours per day on mobile devices. This growth rate is especially astonishing after seven consecutive growth years.
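The arithmetic behind those figures is easy to reproduce. A quick Python sketch using only the numbers quoted in the report (the variable names are mine):

```python
# Figures quoted in the report.
mobile_users = 175_000_000          # Americans with at least one mobile device
minutes_now = 3 * 60 + 40           # 3 hrs 40 min per day in Q2 2015
extra_minutes_per_user = 43         # increase since Q4 2014

# Aggregate extra time per day across all users, converted to hours.
extra_hours_per_day = mobile_users * extra_minutes_per_user / 60

# Implied growth since Q4 2014: the increase divided by the earlier daily total.
minutes_q4_2014 = minutes_now - extra_minutes_per_user
growth_since_q4 = extra_minutes_per_user / minutes_q4_2014

print(f"{extra_hours_per_day / 1e6:.1f} million extra hours per day")
print(f"{growth_since_q4:.1%} growth since Q4 2014")
```

That works out to roughly 125 million hours and about 24%, in line with the report.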

The Browser: Sidelined

Looking at the chart above, today only 10% of the time spent on mobile is spent in the browser, down from 14% a year ago. The rest of the time, 90%, is spent in apps. Effectively, the browser has been sidelined on mobile. This has major implications on the digital industry in general and the content and media industry in particular. Historically, the media industry has relied almost entirely on search for user and traffic acquisition, building entire teams around SEO and SEM on the desktop web. But search engines are predominantly accessed from a browser. If mobile users aren’t using browsers, the media industry will have to look for new approaches to content discovery and  traffic acquisition.

The Media Industry: Absorbed by Apps

The chart below takes a closer look at app categories. Social, Messaging and Entertainment apps (including YouTube), account for 51% of time spent on mobile.


Entertainment (including YouTube) grew from 8% of time spent last year, or 13 minutes per day, to 20% of time spent, or 44 minutes per day this year. This is 240% growth year-over-year, or an extra 31 minutes. That is more than the time it would take to watch an additional TV sitcom for every US consumer, every day!


Messaging and Social apps grew from 28% of time spent last year or 45 minutes per day to 31% of time spent or slightly more than 68 minutes per day this year. This is a 50% year-over-year increase. However, the majority of time spent inside messaging and social apps is actually spent consuming media, such as videos on Tumblr and Facebook or stories on Snapchat. A study by Millward Brown Digital showed that 70% of social app users are actually consuming media. While we can’t correlate the 70% directly to time spent, we firmly believe that media consumption, either articles read in the web view in app, or video consumed in the feeds, constitutes the majority of time spent in social apps. This is a big trend and one that will be watched very carefully by traditional media companies. These companies have to adjust to a new world where consumers act as individual distribution channels. The growth in entertainment on mobile proves once again that content is in fact king and is beating the gaming industry at its own game.

The Gaming Industry: Time is Money

The completely unexpected result of our analysis this year is the dramatic decline in time spent for mobile gaming. Gaming saw its share decline from 32% last year (52 minutes per day) to 15% of time spent (33 minutes per day) this year. This is a 37% decline year-over-year. We believe there are three factors contributing to the decline.

  1. Lack of new hits: Gaming is a hit driven industry and there hasn’t been a major new hit in the past six to nine months. The major titles like Supercell’s Clash of Clans, King’s Candy Crush, and Machine Zone’s Game of War continue to dominate the top grossing charts and haven’t made room for a major new entrant.
  2. Users become the game: Millennials are shifting from playing games to watching others play games, creating a new category of entertainment called eSports. This summer, Fortune named eSports the new Saturday morning cartoons for millennials. In fact, some of the most watched content on Tumblr is Minecraft videos created and curated by the passionate Minecraft community.
  3. Pay instead of play: Gamers are buying their way into games versus grinding their way through them. Gamers are spending more money than time to effectively beat games or secure better standings rather than working their way to the top. This explains the decline in time spent and the major rise in in-app purchases, as Apple saw a record $1.7B in AppStore sales in July.
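The year-over-year percentages quoted for each category can be recomputed from the minutes-per-day figures in the report. A short Python sketch follows; the dictionary layout and function name are mine, and the 68-minute figure is the report's "slightly more than 68", so the results are approximate.

```python
# Minutes per day (last year, this year) as quoted in the report.
minutes_per_day = {
    "Entertainment": (13, 44),
    "Messaging and Social": (45, 68),
    "Gaming": (52, 33),
}


def pct_change(before: float, after: float) -> float:
    """Year-over-year change as a percentage of the earlier value."""
    return (after - before) / before * 100


changes = {
    category: round(pct_change(before, after), 1)
    for category, (before, after) in minutes_per_day.items()
}

for category, change in changes.items():
    print(f"{category}: {change:+.1f}%")
```

The results land within a point or two of the report's rounded figures of 240% growth, a 50% increase and a 37% decline.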

What the mobile industry in general and the app industry in particular have achieved in the past seven years is amazing. Flurry now measures more than two billion devices each month and sees more than 10 billion sessions per day. That is 1.42 sessions for every human being on this planet, every day. And that is just Flurry! If there is anything to say about the mobile and app industry it is this: Mobile is on fire and it is showing no signs of stopping.

Thinking Small – IBM’s “Star qualities of successful mobile development projects”



Stating the Obvious

This info graphic comes from the IBM Center for Applied Insights.  It makes me wonder who got paid how much to develop this.  Just about anyone off the street could have guessed the numbers and categories and been relatively (and effectively) close.  As for Applied Insight, add just about any business case or personal situation, and the info graphic still applies.  Just a couple of suggestions …

Star qualities of successful kindergarten projects
Star qualities of successful relationships
Star qualities of successful one night stands

600 developers couldn’t be wrong

Actually 600 could be wrong.  “Learn what nearly 600 developers have to say about the secret to mobile development project success.”  Wow! IBM really went out on a limb to collect data on this one.  I would think Big Blue could’ve scrounged up at least 1,000.  Actually, I imagine Google wouldn’t put out any results that had fewer than six zeros to pull from.


Source: [Infographic] Star qualities of successful mobile development projects

NOW for something completely different AND ACTUALLY USEFUL INFORMATION

Pulled from a search of “successful mobile application projects” images, this doesn’t tell the same story but it actually tells something worth reading!  10,000 app developers is a much better number.  Thank you Developer Economics!!


