With winter in the works, it’s time to revisit some movies
and what’s better than sexy Big Data meets hunky analytics? Vote your conscience, but some hints are listed below in an awesome infographic from Analytics Vidhya.
Big Data is the volume, velocity and variety of data, captured in ways we are only beginning to grasp, with capabilities only just being explored.
Open data is a somewhat more mature market, going back to the origins of code. The collaboration that created personal computing and the internet laid a foundation of sharing information unabashedly. It’s a noble ambition as well as a practice requiring considerable fortitude.
In our world of increasing data capabilities, openness is a necessary topic of conversation. Here is one forum if you’re in the neighborhood. From the Brussels Data Science Community page:
We will have a special meetup about Using Open data to promote Data Innovation on Thursday, November 26, 2015 @VUB. Here is a Comprehensive List of Open Data Portals from Around the World.
DataPortals.org is the most comprehensive list of open data portals in the world. It is curated by a group of leading open data experts from around the world – including representatives from local, regional and national governments, international organisations such as the World Bank, and numerous NGOs.
The Open Definition sets out principles that define “openness” in relation to data and content.
It makes precise the meaning of “open” in the terms “open data” and “open content” and thereby ensures quality and encourages compatibility between different pools of open material.
It can be summed up in the statement that:
“Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).”
Put most succinctly:
“Open data and content can be freely used, modified, and shared by anyone for any purpose.”
For all those who served in all countries,
For all those who supported those who went,
For all those who were lost in battle – combatants and non-combatants,
Peace ever first on our minds.
And Big Data used for peace, a rising tide that lifts all boats.
For the US vets and all vets around the world.
One … two … three … four hundred … five hundred thousand …. How many elephants can you count before it’s too many (or too much)?
Counting is one of the first skills learned as a child. Before addition and subtraction, the numerical building blocks of 1-2-3 are right there with the ABCs. Whether it’s your blessings or the dealer’s cards or your money, counting comes in handy. How many or how much is core to decision making. How much money or how many resources do I have? How many does the enemy have?
Counting to 10 or maybe 100 is easy, but as more needs to be counted, it becomes tedious and time intensive. The practice loses its return on energy. That’s where math, probability and statistics come in. What can be counted easily can be leveraged not only to count more but also to add meaning to what is counted.
Enter the classic “what to wear” problem (poignant for math geeks).
Instead of laying out each combination and “counting” it, you know how many outfits you have. This simple example can be exploited in far greater combinations…to the nth degree.
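The multiplication principle behind that trick can be sketched in a few lines of Python (a toy illustration; the wardrobe contents here are invented):

```python
# The "what to wear" counting problem: instead of laying out every
# combination and counting it, multiply the counts per category.
# Brute-force enumeration agrees with the formula.
from itertools import product

shirts = ["white", "blue", "striped", "flannel"]  # hypothetical closet
pants = ["jeans", "khakis", "slacks"]

by_formula = len(shirts) * len(pants)               # 4 x 3 = 12 outfits
by_enumeration = len(list(product(shirts, pants)))  # lay out each pair and count

print(by_formula, by_enumeration)  # 12 12
```

Add a rack of j jackets and the answer simply becomes 4 × 3 × j — combinations to the nth degree, without ever laying them all out.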
But what if you don’t know how many shirts and pants you have?
Counting is another excellent venue for exploring Big Data. Just as math saves hours of manipulating closet items, so Big Data can help with Big Problems, providing greater choices and better decision capability as a result.
Let’s look at a Big counting problem –
A shocking increase in poaching has torn through African elephant populations. In the past 10 years the population has taken a dramatic hit, with estimates that more than 12,000 elephants a year have been slaughtered since 2006.
As Dr. Mike Chase, director and founder of Elephants Without Borders, puts it: “The threat of local extinction feels very real. In October 2013, Elephants Without Borders flew a survey over a park where we had previously counted more than 2,000 elephants. We counted just 33 live elephants and 55 elephant carcasses. That is why this research is so important.” http://www.elephantswithoutborders.org
Wildlife preservation is a delicate balance in the best of circumstances, but the lucrative draw of poaching across the many African countries these animals inhabit has challenged several of its iconic strongholds. Lions, rhinos and elephants are the majestic leaders of a rich wildlife pyramid whose dramatic loss crushes the whole ecological system, including the native peoples who live off that balance.
Poaching itself is lucrative. In transit from Africa to Asia, ivory’s price jumps from about $200 to over $2,000. Because international standards outlaw this black-market material, poaching profits flow only to illicit activity and, most dangerously, terrorism.
South Africa “suffers” from too many elephants. Here the growing numbers continue to roam and forage as is their nature. That means knocking over even the sturdiest of trees and stripping them of the most digestible leaves. Just imagine an elephant walking through your yard or the neighborhood park, taking down a couple of trees that look tasty. Imagine what a herd of 20-30 can do. They don’t stand still, either. They keep on the move, journeying for miles in a day and carving a pachyderm hurricane path.
In any amount, this is nature’s process, culling the forest for new vegetation. Elephant trails create natural fire breaks, and elephants dig for water which other animals use. But where farm and urban sprawl encroach on this roaming territory, it quickly becomes man versus animal, and the number of touch points is growing. The nature of elephants – their survival – is roaming. Their legendary memory also leads them across old paths where man’s development has erased the past.
To attempt that delicate balance, game parks in South Africa have taken to birth control and water management methods in order to keep their numbers in check.
Anyone who has tried to count children at a birthday party or to herd students back into a classroom after recess knows the challenges of counting live bodies. Counting crowds is actually a science. And Science isn’t about Knowing so much as Getting a Good Estimate. Here’s how they counted President Obama’s inauguration crowd.
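One standard way crowd scientists produce such estimates is area-times-density (often attributed to Herbert Jacobs). The article doesn’t name the method, so treat this sketch as an assumption, with invented zone figures:

```python
# Crowd estimate = sum over zones of (occupied area x average density).
# Zone areas and densities below are made-up illustrative values,
# not the actual inauguration numbers.

def estimate_crowd(zones):
    """zones: list of (area_sq_m, people_per_sq_m) pairs."""
    return round(sum(area * density for area, density in zones))

zones = [
    (50_000, 2.5),   # tightly packed section near the stage
    (120_000, 1.0),  # loosely filled overflow areas
]
print(estimate_crowd(zones))  # 245000
```

Refining the zone boundaries and densities (from aerial photos, for instance) is what turns a wild guess into a defensible estimate.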
Although a several-ton elephant is noticeably slower and harder to miss, expand the search over the wilderness of half a continent, divided among several countries, some war-torn, and accurate counting is hard to imagine. But someone is trying.
That effort is the Great Elephant Census, the largest pan Africa aerial survey since the 1970s, and it’s backed by one of the world’s smart guy-in-the-room icons – Microsoft co-founder Paul Allen. Not only does this count have deep pockets, it also has expert guidance.
The Great Elephant Census is applying a strategic, consistent approach to counting elephants in numerous countries in varying climate and terrain, with an integrated audit program in situ.
The Great Elephant Census is designed to provide accurate and up-to-date data about the number and distribution of African elephants by using standardized aerial surveys of hundreds of thousands of square miles. Dozens of researchers flying in small planes will capture comprehensive observational data on elephants and elephant carcasses. Our standardized method of data collection, which is validated by an independent TAT advisor, ensures all data is impartial and accurate.
It’s somewhat like counting the crowds for President Obama’s inauguration. Even with such meticulous effort though, most elephant accounting is predicated on “known” and “estimated” numbers.
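One classical way field biologists turn a partial count into one of those “estimated” population numbers (a sketch only — not necessarily the Great Elephant Census’s actual method) is the Lincoln–Petersen mark-recapture estimator:

```python
# Lincoln-Petersen estimator: mark n1 animals, later draw a second
# sample of n2 animals and count m recaptured marks; if marks mix
# evenly, the marked fraction in the sample mirrors the population,
# so the population estimate is N ~ n1 * n2 / m.

def lincoln_petersen(n1, n2, m):
    if m == 0:
        raise ValueError("no recaptures; the estimate is undefined")
    return n1 * n2 / m

# Hypothetical survey: 100 marked, second sample of 60 holds 12 marks.
print(lincoln_petersen(100, 60, 12))  # 500.0
```

The assumptions (closed population, equal catchability) rarely hold perfectly in the wild, which is exactly why audited, standardized surveys matter.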
So that’s how the experts are counting elephants. Let’s explore counting elephants instead through a Big Data lens. An elephant census doesn’t have to be solely a tally of head counts, albeit of a magnificent head with flowing ears and strong tusks. The count can be created from the variety and volume of data that already exists and grows by the minute. In a Big Data Elephant Census, information is created by the community and also serves the communities in return.
A Big Data Elephant Census begins with a data lake of information collected from prevalent sources: cell phone usage, transactional data, weather, heat signatures, game warden activity and reports, international shipping and markets, and of course, social media. Big Data ingests the volume, velocity and variety of that data to look for patterns that emerge. Like counting moving children, Big Data can exploit information that is too complex for “naked” human observation. Like picking outfits from the mathematically derived wardrobe, a Big Data Elephant Census answers how many elephants there are while eliciting the holistic picture of what that means.
THE POINT OF COUNTING ELEPHANTS IS NOT TO KNOW HOW MANY ELEPHANTS THERE ARE. THE POINT OF COUNTING ELEPHANTS IS TO LEARN THE ELEPHANT POPULATION’S EFFECTS ON OUR LIVES – DESIRABLE AND UNDESIRABLE.
The point of counting elephants is not just to know how many elephants there are; we want to know all the factors that evolve in the elephant environment. How does the diverse animal and vegetation habitat ebb and flow with the tramping of elephant feet? How are indigenous and foreign humans influencing and being influenced by the elephant footprint? (har har) How are poaching and anti-poaching efforts impacting the community as well as the elephants? How are farming and native livelihoods affecting and being affected? What other passive economic factors, weather, and politics shift accordingly?
CONSERVATION IS NOT A STASIS …
So let’s stop trying to capture a picture and instead capture a flow. Pulling the Big Data elephant count from a volume, velocity and variety of data sources articulates how the whole system (man and animal) manifests. Instead of chasing the right number of elephants, Big Data Elephant Counts observe the evolving energy to find the signals in the noise. Gentle shifts or environmental shocks are recorded in situ with all the elements and players. That makes the census predictive and evolving, rather than reactive.
After the disastrous Sichuan earthquake in 2008, people turned to Twitter to share firsthand information about the earthquake. What amazed many was the impression that Twitter was faster at reporting the earthquake than the U.S. Geological Survey (USGS), the official government organization in charge of tracking such events.
This Twitter activity wasn’t a big surprise to the USGS. The USGS National Earthquake Information Center (NEIC) processes data from about 2,000 real-time earthquake sensors, with the majority based in the United States. That leaves a lot of empty space in the world with no sensors. On the other hand, there are hundreds of millions of people using Twitter who can report earthquakes. At first, the USGS staff was a bit skeptical that Twitter could be used as a detection system for earthquakes – but when they looked into it, they were surprised at the effectiveness of Twitter data for detection.
USGS staffers Paul Earle, a seismologist, and Michelle Guy, a software developer, teamed up to look at how Twitter data could be used for earthquake detection and verification. By using Twitter’s Public API, they decided to use the same time series event detection method they use when detecting earthquakes. This gave them a baseline for earthquake-related chatter, but they decided to dig in even further. They found that people Tweeting about actual earthquakes kept their Tweets really short, even just to ask, “earthquake?” Concluding that people who are experiencing earthquakes aren’t very chatty, they started filtering out Tweets with more than seven words. They also recognized that people sharing links or the size of the earthquake were significantly less likely to be offering firsthand reports, so they filtered out any Tweets sharing a link or a number. Ultimately, this filtered stream proved to be very significant at determining when earthquakes occurred globally.
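The heuristics Earle and Guy describe — short tweets, no links, no numbers — can be sketched roughly like this (my paraphrase in Python, not actual USGS code):

```python
import re

def is_likely_firsthand(tweet, max_words=7):
    """Heuristic filter for firsthand earthquake tweets, per the rules above."""
    if len(tweet.split()) > max_words:
        return False  # people experiencing a quake aren't chatty
    if re.search(r"https?://\S+", tweet):
        return False  # shared links suggest a secondhand report
    if re.search(r"\d", tweet):
        return False  # quoting a magnitude suggests a report, not an experience
    return True

print(is_likely_firsthand("earthquake?"))                                 # True
print(is_likely_firsthand("USGS reports M7.9 quake http://example.com"))  # False
```

Counting how many such tweets arrive in a short window — 14 triggered the Chile alert described below — is the same time-series event detection the USGS already applies to seismometer data.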
While I was at the USGS office in Golden, Colo. interviewing Michelle and Paul, three earthquakes happened in a relatively short time. Using Twitter data, their system was able to pick up on an aftershock in Chile within one minute and 20 seconds – and it only took 14 Tweets from the filtered stream to trigger an email alert. The other two earthquakes, off Easter Island and Indonesia, weren’t picked up because they were not widely felt.
On any given day, the NEIC processes about 70 earthquakes, but only a small handful of these might be felt. They might take place in the ocean, deep in the earth, or away from populated areas. Twitter data can be crucial in helping identify earthquakes felt by humans, and can trigger an alert typically in under two minutes. The 2014 earthquake in Napa was detected by USGS in 29 seconds using Twitter data, likely due to the tech savvy population that dominates the area. (Origin time was 2014-08-24 10:20:44 UTC and Twitter data detection time was 2014/08/24 10:21:13.)
The USGS monitors for earthquakes in many languages, and the words used can be a clue as to the magnitude and location of the earthquake. Chile has two words for earthquakes: terremoto and temblor; terremoto indicates a bigger quake. This one in Chile started with people asking if it was a terremoto, with others realizing it was a temblor.
As the USGS team notes, Twitter data augments their own detection work on felt earthquakes. If they’re getting reports of an earthquake in a populated area but no Tweets from there, that’s a good indicator to them that it’s a false alarm. It’s also very cost effective for the USGS, because they use Twitter’s Public API and open-source software such as Kibana and Elasticsearch to help determine when earthquakes occur.
Next, the USGS team says they want to determine whether they can drop Twitter-data-based detections into seismic algorithms, and whether that can speed up alerts even more.
Thanks to Paul Earle and Michelle Guy of the USGS for taking the time to speak with us.
Big Data expert Patrick Meier provides an excellent explanation of how Big Data can be leveraged for disaster relief. He discusses just enough technical detail so you can appreciate the challenges and yet still understand the premise.
Originally posted on iRevolutions:
Recent scientific research has shown that aerial imagery captured during a single 20-minute UAV flight can take more than half-a-day to analyze. We flew several dozen flights during the World Bank’s humanitarian UAV mission in response to Cyclone Pam earlier this year. The imagery we captured would’ve taken a single expert analyst a minimum 20 full-time workdays to make sense of. In other words, aerial imagery is already a Big Data problem. So my team and I are using human computing (crowdsourcing), machine computing (artificial intelligence) and computer vision to make sense of this new Big Data source.
For example, we recently teamed up with the University of Southampton and EPFL to analyze aerial imagery of the devastation caused by Cyclone Pam in Vanuatu. The purpose of this research is to generate timely answers. Aid groups want more than high-resolution aerial images of disaster-affected areas, they want answers; answers like the number and location of damaged buildings, the number
View original 1,105 more words
To put things in perspective, there are 175 million Americans with at least one mobile device. This means that, in aggregate, since November 2014, the US connected population is spending an extra 125 million hours per day on mobile devices. This growth rate is especially astonishing after seven consecutive growth years.
The Browser: Sidelined
Looking at the chart above, today only 10% of the time spent on mobile is spent in the browser, down from 14% a year ago. The rest of the time, 90%, is spent in apps. Effectively, the browser has been sidelined on mobile. This has major implications for the digital industry in general and the content and media industry in particular. Historically, the media industry has relied almost entirely on search for user and traffic acquisition, building entire teams around SEO and SEM on the desktop web. But search engines are predominantly accessed from a browser. If mobile users aren’t using browsers, the media industry will have to look for new approaches to content discovery and traffic acquisition.
The Media Industry: Absorbed by Apps
The chart below takes a closer look at app categories. Social, Messaging and Entertainment apps (including YouTube), account for 51% of time spent on mobile.
Entertainment (including YouTube) grew from 8% of time spent last year, or 13 minutes per day, to 20% of time spent, or 44 minutes per day this year. This is 240% growth year-over-year, or an extra 31 minutes. That is more than the time it would take to watch an additional TV sitcom for every US consumer, every day!
Messaging and Social apps grew from 28% of time spent last year, or 45 minutes per day, to 31% of time spent, or slightly more than 68 minutes per day, this year. This is a 50% year-over-year increase. However, the majority of time spent inside messaging and social apps is actually spent consuming media, such as videos on Tumblr and Facebook or stories on Snapchat. A study by Millward Brown Digital showed that 70% of social app users are actually consuming media. While we can’t correlate the 70% directly to time spent, we firmly believe that media consumption, whether articles read in an in-app web view or video consumed in the feeds, constitutes the majority of time spent in social apps. This is a big trend and one that will be watched very carefully by traditional media companies. These companies have to adjust to a new world where consumers act as individual distribution channels. The growth in entertainment on mobile proves once again that content is in fact king and is beating the gaming industry at its own game.
The Gaming Industry: Time is Money
The completely unexpected result of our analysis this year is the dramatic decline in time spent for mobile gaming. Gaming saw its share decline from 32% last year (52 minutes per day) to 15% of time spent (33 minutes per day) this year. This is a 37% decline year-over-year. We believe there are three factors contributing to the decline.
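As a rough sanity check, the year-over-year percentages quoted in the last few paragraphs can be recomputed from the minutes-per-day figures (the minutes are the article’s own numbers):

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100

print(round(pct_change(13, 44)))  # 238 -> reported as "240% growth" (Entertainment)
print(round(pct_change(45, 68)))  # 51  -> reported as a "50% increase" (Messaging/Social)
print(round(pct_change(52, 33)))  # -37 -> reported as a "37% decline" (Gaming)
```

The quoted figures are consistent with the minutes once rounded, which is reassuring when shares of a growing total and absolute minutes are mixed in one narrative.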
What the mobile industry in general and the app industry in particular have achieved in the past seven years is amazing. Flurry now measures more than two billion devices each month and sees more than 10 billion sessions per day. That is 1.42 sessions for every human being on this planet, every day. And that is just Flurry! If there is anything to say about the mobile and app industry it is this: Mobile is on fire and it is showing no signs of stopping.
This infographic comes from the IBM Center for Applied Insights. It makes me wonder who got paid how much to develop it. Just about anyone off the street could have guessed the numbers and categories and been relatively (and effectively) close. As for Applied Insight, add just about any business case or personal situation, and the infographic still applies. Just a couple of suggestions …
Actually, 600 could be wrong. “Learn what nearly 600 developers have to say about the secret to mobile development project success.” Wow! IBM really went out on a limb to collect data on this one. I would think Big Blue could’ve scrounged up at least 1,000. Actually, I imagine Google wouldn’t put out any results with fewer than six zeros to pull from.
Pulled from an image search for “successful mobile application projects,” this doesn’t tell the same story – but it actually tells something worth reading! 10,000 app developers is a much better number. Thank you, Developer Economics!