What is unicity & why you need to know

A quick Google of “financial information breach” in the news returns an almost daily litany of public and private sector institutions that have been hacked for information.

In June 2015, the Office of Personnel Management (OPM) announced a breach that compromised over 4 million personnel records. The following month, OPM confessed to another breach; this time it was over 21 million records, including the files used for security clearances. These files incorporate background investigations, which include extensive documentation of personnel employment, health and personal information. (Both instances are blamed on China.)

DON Leadership OPM Data Breach Briefing 2015-06-26

Keys to the Castle

Safeguarding personal information is a monumental task. We don’t just take it for granted that those we give our information to (health care providers, financial institutions, employers) will steward the data properly; we hold them accountable in both civil and criminal court. It is easy to want an entity to be responsible and answerable for protecting personal information, but this is only the simple liability we understand. Your personal information (PI) is in reality a much more complex picture, and infinitely more vulnerable beyond the government and corporate entities that strive to protect your PI.

Think about your social media data stream. You probably wouldn’t be surprised that someone could figure out who you are by what you post. What does that look like and how easy is it?

The answer is …

Unicity is a statistical tool used to measure how much “outside” information is needed to identify an “anonymous” individual within a dataset. One way to measure that is how many “tuples” it takes to hit the mark. A tuple[1] is a data structure, “a mechanism for grouping and organizing data to make it easier to use.” The name comes from n-tuple (as in “multiple”) in mathematics; a tuple has n elements that together set a data point. In the case of this article, that data point is your identity.
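To make the idea of a tuple concrete, here is a minimal Python sketch. The field layout (a place and an hour of the day), the sample values, and the helper name `matches` are all assumptions made for illustration; they do not come from any real dataset.

```python
# Hypothetical example: each observation about a person is one tuple.
# The (place, hour) layout is an assumption made for illustration.
point_a = ("coffee shop", 8)
point_b = ("gym", 18)

# Tuples are immutable and hashable, so they can be collected into a set
# that represents one person's trace in a dataset.
trace = {point_a, point_b, ("grocery store", 19)}


def matches(known_points, person_trace):
    """Return True if every piece of outside information appears in the trace."""
    return set(known_points) <= person_trace


print(matches([("gym", 18)], trace))                  # True
print(matches([("gym", 18), ("bakery", 7)], trace))   # False
```

Each tuple someone knows about you narrows the set of traces that could be yours. Unicity asks how many such tuples are needed before only one trace is left.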

What signal are you sending?

With every credit card purchase, the transaction is encrypted between sender and receiver to protect the financial information from hacking. That doesn’t make it anonymous though. Think Big Data.

MIT researchers analyzed a data set of more than one million people at ten thousand businesses.[2] The data was “anonymized”: the researchers could see details about each transaction, such as when, where, and how much, but not names or account numbers. A tuple of location and time proved a simple route to identification. Just four random tuples were sufficient to reveal 90 percent of the individuals in the dataset.
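The kind of measurement described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the researchers’ actual method: the function name `unicity`, the toy dataset, and the sampling approach are all invented for this example.

```python
import random


def unicity(traces, k, trials=1000, seed=0):
    """Estimate the share of people singled out by k random known tuples.

    traces: dict mapping an anonymous ID to a set of (place, hour) tuples.
    This is a hypothetical sketch, not the method used in the MIT study.
    """
    rng = random.Random(seed)
    # Only people with at least k recorded points can be tested.
    eligible = [uid for uid, pts in traces.items() if len(pts) >= k]
    if not eligible:
        return 0.0
    unique = 0
    for _ in range(trials):
        uid = rng.choice(eligible)
        # Pretend an outsider learned k random points about this person.
        known = set(rng.sample(sorted(traces[uid]), k))
        # Find every trace in the dataset consistent with those points.
        candidates = [other for other, pts in traces.items() if known <= pts]
        if candidates == [uid]:
            unique += 1
    return unique / trials


# Toy "anonymized" dataset: no names, only places and hours (all invented).
toy = {
    "user1": {("cafe", 8), ("gym", 18), ("market", 19)},
    "user2": {("cafe", 8), ("gym", 18), ("cinema", 21)},
    "user3": {("cafe", 8), ("bakery", 7), ("market", 19)},
}

for k in (1, 2, 3):
    print(k, unicity(toy, k))
```

In this toy data, knowing more tuples singles out a larger share of people, and three tuples identify everyone. The real study applied the same idea to more than a million traces.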

Cash May Be King …

But it still doesn’t make you anonymous. Your data stream is not confined to credit card transactions. Every geo-tagged photo, every social media comment or use of your phone reveals who and where you are. If you want to brave turning off the virtual world to cloak your movements, you are still followed through license plate readers and shopper movements caught on camera. License plate scans are used on police cars, on bridges, roads and tollbooths to capture time and place. In brick-and-mortar stores, your movements, attributes and actions are captured on camera, and possibly analyzed.[3] Is that creepy? Possibly, but considering every click on Amazon or every other website on the internet is forever captured by cookies, is there a difference? Or is it only a difference between what you are comfortable understanding … and what feels creepy?

Bottom line: you are rarely alone.


It’s not all about the money either.

“Life is short. Have an affair.” The Ashley Madison website’s terse tagline speaks terabytes about its content. One of many, many sites that provide a covert location to seek others with the same guilty intentions, Ashley Madison made the news in July 2015 as well for being hacked. It’s not the Chinese this time and it’s not ransom for money. The “Impact Team,” as the hackers call themselves, is demanding the website shut down in return for not releasing the financial (credit card & employment), personal (name and address), and intimate (do I have to draw a picture?) details of the site’s reported 37 million members.

Same old story?

Is this a new phenomenon? Actually, personal accountability, who and where you are and what you do, is not new. Detection, whether a picture of your car license plate or your credit card transaction, has been around for as long as cars and credit cards. Sherlock Holmes and Hercule Poirot understood the data trail long before the digital medium. (Well, their creators did.)

What is new is Big Data. What has changed is the capability to handle the volume, velocity and variety of information that is ubiquitously captured and shared. This capability used to be cost prohibitive. Total capture is now relatively inexpensive. Using the data has become a capability differentiator, not to mention a potent return on investment.

The data has always been there; it’s just being used faster and funnier. That’s why you need to know unicity and the power of Big Data.


[1] http://openbookproject.net/thinkcs/python/english3e/tuples.html

[2] https://www.sherbit.io/instagram-surveillance/

[3] http://www.nytimes.com/2013/07/15/business/attention-shopper-stores-are-tracking-your-cell.html?_r=0