Security staff and end-users in many organizations are quickly coming to grasp the concept of aggregation, where the sum total of information in a database is more valuable (and hence a greater risk, warranting a higher security classification) than the individual components.
Unfortunately many forget that the data isn’t just in a single database. “Virtual aggregates”, which can be assembled by integrating multiple data sources that share a common reference key (or keys), can create a much larger security concern – especially when the data sources are not all under a single individual’s or organization’s control.
It is often said that information about your information is more valuable than the original information itself. In essence, the “intelligence” and insights derived from the aggregate of data may be more valuable than the individual pieces of original data.
We often see this in security scenarios: individual data items may be considered unclassified or internal use only, but when gathered together they become a company secret.
This actually happens more often than people realize – and they fail to secure the aggregate accordingly, not recognizing its inherent value to competitors and outside (and often unfriendly) individuals and organizations.
COMMON DATA – AGGREGATING TO SOMETHING SECRET
Case in point, there is a government agency (which shall remain nameless) that provides key services and controls to the financial industry – thus potentially impacting the entire national financial services infrastructure. Teams that once worked independently were integrated into a common service desk as part of a consolidation/downsizing initiative. Coupled with this was the launch of a common service desk software suite to track service tickets when providing fixes to the agency’s key systems.
Individual “tickets” were treated as unclassified (and probably designated Protected B by today’s standards) – as there really were no long-term security issues surrounding the outage of any given service and all key stakeholders knew it was broken and that they would be advised when it was restored.
The aggregation of all these tickets however wasn’t unclassified; it was at least SECRET, if not TOP SECRET.
Why, you might ask, if these are all old tickets – and the problems are resolved, is this an issue?
While some insight, perhaps even ideas for how to break systems, might be gained from the root cause analysis in individual tickets (if that information is contained there), one could gain more value from analyzing which teams support which systems, and what their capacity and turnaround times are.
The attacker could prepare a series of minor attacks on systems supported by the same team.
By progressively depleting the resources of the team through minor incidents, the attacker could in the end exploit the overburdened team with the main attack on the real target – an essential/critical system that they are now spread too thin to support, and would need to consolidate resources to fix (adding to the confusion and delays).
Essentially the aggregate database could be analyzed to work out a roadmap of attacks to bring down critical systems by knowing the weak link – the number and capacity of technical support teams and the exact systems they support.
If you could slow down or even break infrastructure essential to the operation of the national economy (especially in these trying times) – you have a national level threat that warrants SECRET or TOP SECRET protection.
It’s critical to not just know what data you have (and its value), but also the value that can be derived from the aggregate and the threats it could represent.
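To get a feel for what an aggregate reveals that no single ticket does, a data owner could run a simple summary over their own ticket history. The sketch below is illustrative only: the ticket fields (team, system, hours_to_fix) and the sample values are made up, and a real service desk export will look different.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up tickets. Each one is harmless on its own.
tickets = [
    {"team": "Team A", "system": "payments", "hours_to_fix": 4},
    {"team": "Team A", "system": "ledger", "hours_to_fix": 6},
    {"team": "Team A", "system": "payments", "hours_to_fix": 8},
    {"team": "Team B", "system": "reporting", "hours_to_fix": 2},
]

def summarise(tickets):
    """Roll individual tickets up into a per-team support profile."""
    systems = defaultdict(set)
    hours = defaultdict(list)
    for ticket in tickets:
        systems[ticket["team"]].add(ticket["system"])
        hours[ticket["team"]].append(ticket["hours_to_fix"])
    return {
        team: {
            "systems": sorted(systems[team]),
            "mean_hours_to_fix": mean(hours[team]),
        }
        for team in systems
    }

summary = summarise(tickets)
for team, profile in summary.items():
    print(team, profile)
```

If the summary tells you which team covers which systems and how long they typically take to respond, it tells anyone else who obtains the database the same thing, and that is the view that should drive the classification.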
While this may be relatively easy to eventually visualize with a single database or databases within an organization where the data aggregates – there is an emerging threat from “virtual aggregates” created when discrete data sets, available both publicly and privately, can be integrated and aggregated through common keys and references.
Virtual aggregates are created when there is a common key (or keys) that can be used to interconnect data that would otherwise be separate and couldn’t be correlated.
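A minimal sketch of the idea, assuming three unrelated sources that happen to share a key. The field name person_id, the sources and all values are hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical, made-up records held by three separate parties.
source_a = [{"person_id": "AB-1001", "birthday": "14 March"}]
source_b = [{"person_id": " ab-1001", "employer": "Example Corp"}]
source_c = [{"person_id": "Ab-1001 ", "home_town": "Springfield"}]

def normalise(key: str) -> str:
    """Remove cosmetic differences so the same key matches across sources."""
    return key.strip().upper()

def virtual_aggregate(sources, key_field):
    """Merge every record that shares the same normalised key into one profile."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            key = normalise(record[key_field])
            for field, value in record.items():
                if field != key_field:
                    profiles[key][field] = value
    return dict(profiles)

profiles = virtual_aggregate([source_a, source_b, source_c], "person_id")
print(profiles)
```

None of the three parties holds the full profile, yet anyone who can read all three sources can rebuild it with a few lines of code. The common key does all the work.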
One of the largest common keys used within government is the Social Security Number (SSN – USA) or Social Insurance Number (SIN – Canada) for all personal records.
That virtual aggregate may not be of great concern to most citizens as we expect the government to have data about us, our tax records, employment information, etc.
It is concerning, however, when 3rd parties gain access to this information and use it for their purposes; the most common example is through identity theft using multiple social media sites as the “aggregate database”.
A COMMON VIRTUAL AGGREGATE THROUGH SOCIAL MEDIA
With the increase in identity theft and other scams, many of us have been more careful with how much information we share on social media sites – selecting limited use of birthdates (or birthday, but not birth year), home town and other information on just a few sites.
Unfortunately, on the majority of social media sites the same “common key” is available to all of them: your name. It’s not much of a challenge for a scammer to go to multiple sites (or use a meta-search engine) and track down the individual pieces across the aggregate of databases to complete a full profile of you – getting a birthday from one site, and your first-born child’s name and mother’s maiden name from another (perhaps a genealogy site).
Sometimes you aren’t even the direct target. Recently some scammers have been looking for information indicating you are going on vacation, and then targeting friends and relatives with “social engineering attacks” designed to convince them that the scammer is a friend of yours, requesting money on your behalf because you’ve been injured, robbed or need bail money while away (my friends and family might fall for the bail money scam – but that’s another story).
Now some social media services allow you to have an identifier completely disconnected from your personal identity – but don’t place too much confidence in this feature; many users will have the same nickname, icon, avatar/picture, etc. across multiple services (dating sites, Twitter, YouTube, Flickr, etc.).
While this creates a little more work for the scammer, you’ve given them a head start if you use similar keys or references across all these systems (especially if you also link these systems to more “public” social media platforms such as LinkedIn or Facebook that use your real name).
OTHER PENDING VIRTUAL AGGREGATE THREATS
Social media isn’t the only virtual aggregate threat. With the increasing emphasis on “open data” in government, public and private agencies – many organizations, while attempting to maintain transparency, are also potentially opening themselves or other groups to broader scrutiny and possible attack.
The Open Data Handbook (2010-2012) cites some excellent justifications and examples of how open data can provide better government and governance:
– – – – –
“In terms of transparency, projects such as the Finnish ‘tax tree’
and British ‘where does my money go’ show how your tax money
is being spent by the government. And there’s the example of
how open data saved Canada $3.2 billion in charity tax fraud. Also
various websites such as the Danish folketsting.dk track activity
in parliament and the law making processes, so you can see what
exactly is happening, and which parliamentarians are involved.”
– – – – –
Unfortunately not everyone may use the newly “open data” exclusively for the greater public good. Organizations and special interest groups, with funding and resources much greater than your average government department, may be willing to invest significant time and funds into looking for the proverbial needle in a haystack to hold against their opponents.
Open data sounds good in principle but, as Mark Twain quoted in his autobiography, “there are three kinds of lies; lies, damned lies and statistics”. In many cases statistics and volumes of data have been used to “prove” an argument not readily apparent from the data itself, and as Twain also pointed out “Facts are stubborn, but statistics are more pliable.”
Indiscriminate use of open data, perhaps coupled with some unscrupulous (or incompetent) statistical analysis, could cause significant public and political damage to an individual or entity – damage that often cannot be recovered from or corrected within a single news cycle or campaign.
PROTECTING YOUR ASSETS AND YOURSELF
Be wary of how much information you share about yourself or your organization. If you consciously restrict information from some public sources, ensure that the same information is not readily available elsewhere; good “Google diving” with a search engine and a few elements common to all your profiles will quickly find the information that is missing from other sites.
Avoid common keys. Where possible, use different user-ids, nicknames and/or avatars on social sites – especially those that require a great deal of personal information (dating again, are we?).
Consider Easter-egging some of your information – such as using a different middle initial with various sites to more readily allow you to source a leak. I’ve known some organizations that put special contacts (with phone numbers and email addresses) in their customer lists. These entries are ignored by internal day-to-day company operations, but if their master database is ever stolen they’ll quickly be alerted when these otherwise unused numbers/addresses are contacted by a 3rd party.
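The decoy-contact idea can be sketched in a few lines. This is an illustration under assumptions, not a description of any particular organization’s setup: the decoy addresses, phone number and data set labels are all hypothetical.

```python
# Hypothetical decoy contacts seeded into customer lists. Nobody inside
# the company ever uses them, so any contact to them suggests a leak.
CANARY_CONTACTS = {
    "j.q.decoy@example.com": "customer master list, copy A",
    "+1-555-0100": "customer master list, copy B",
}

def normalise(contact: str) -> str:
    """Ignore case and stray whitespace when matching a contact."""
    return contact.strip().lower()

def check_inbound(contact: str):
    """Return the seeded data set a decoy contact belongs to, or None."""
    return CANARY_CONTACTS.get(normalise(contact))

# Addresses or numbers that received mail or calls today (made-up sample).
inbound = ["someone@example.org", " J.Q.Decoy@Example.com "]
alerts = [(c, check_inbound(c)) for c in inbound if check_inbound(c) is not None]

for contact, dataset in alerts:
    print(f"Possible leak of {dataset}: decoy {contact.strip()} was contacted")
```

Seeding a different decoy into each copy of the list is what lets you source the leak, in the same way a different middle initial per site does for your personal details.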
In social media – be careful when friending someone you don’t know, and publicly showing family or other relationships. Not everyone is who they claim to be, and that “friend” may just be using this as the next step to get your information, or get closer to one of your friends (“Hi… we both know Steve – and I thought we could be friends”). Remember – this kind of attack is a battle of attrition and acquisition of small pieces of information; it’s not going to be a single decisive battle between you and someone on Facebook.
Think carefully about what you share; open data (whether in social media, or for your organization) is great in principle – but it is open to abuse and misuse. Think outside the box on how information/open data could be exploited. Conceal personally identifiable information about you, your clients or other individuals/organizations if necessary, and avoid using the same common key in all the public databases unless it really does serve the greater good for analytical purposes.
Even if you use an obscured key for everyone and everything in your “public”/open databases, you are just a single breach away from having all that data readable by a third party unless the cross-reference table that matches the unique IDs to actual identifiable entities is kept under tight control.
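One way to picture this is a small sketch in which each published data set gets its own random ID for the same entity, and only a private table can link them back. The class name, data set names and records here are hypothetical, and a real system would also need secure storage, access control and auditing for the table.

```python
import secrets

class CrossReference:
    """Private table mapping (real identity, data set) to a random public ID.

    Whoever holds this table can re-link everything, so it is the asset
    that needs the tightest control.
    """

    def __init__(self):
        self._ids = {}

    def public_id(self, real_id: str, dataset: str) -> str:
        """Return a stable random ID that is different for each data set."""
        key = (real_id, dataset)
        if key not in self._ids:
            self._ids[key] = secrets.token_hex(8)
        return self._ids[key]

xref = CrossReference()

# Made-up records published in two separate open data sets.
grants = {"id": xref.public_id("Pat Example", "grants"), "amount": 1200}
permits = {"id": xref.public_id("Pat Example", "permits"), "type": "building"}

print(grants)
print(permits)
```

Because the two published IDs share nothing, an outsider cannot join the data sets on them. The trade-off is that legitimate cross-data-set analysis now has to go through whoever controls the table.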
As always, apply diligence and common sense; if you wouldn’t post it on the company notice board, perhaps it shouldn’t be out in the open on the internet.
The Open Data Handbook (2010-2012), Section – Why Open Data? Retrieved June 2nd, 2014, from http://opendatahandbook.org/en/why-open-data/