The highly networked Facebook platform is leaky
The connectedness of the Facebook network means that even a small cross-section of profile information is enough to profile millions of users
With the benefit of hindsight, it is not surprising that the information people gave out in the form of “likes” on Facebook was used by a targeted political campaign. The highly networked nature of the Facebook platform, combined with the data that the company collects, meant that even fairly simple machine learning algorithms could be used to craft highly customized political messages.
What Facebook knows about you
When an editor called me to commission this piece last week, he suggested that I download my full Facebook profile and look at all the data that the company has about me. Most of the information there was stuff I’d actually explicitly “told” Facebook—posts, photos, schools, groups, messages on the platform, etc.
And then there was a tab innocuously labelled “contact info”. This had the full contact details (including email IDs and phone numbers) not only of everyone on my Facebook friend list, but also of people in my LinkedIn contacts, and of people I may have emailed only once.
In fact, taking a random stroll down this contact info page (it is massive; you should try it out for yourself), I found people whom I barely recognize. Some of them are perhaps people who have been on an email thread with me but are otherwise in no way acquaintances. Others are people from my phone contact list from the time I had the Facebook Android app installed (in 2013-14).
Now, if some malicious entity were to get hold of my Facebook profile, they could get their hands on the contact details of more than 3,500 people. And none of these people can do a thing to prevent their contact information from being leaked; all it takes is one act of stupidity on my part.
You are who you know
While it might sound scary enough that one person’s Facebook information can leak contact details of thousands of people, that is only the tip of the iceberg. The bigger problem is that if the Facebook information of a few thousand people gets leaked, it enables the creation of elaborate profiles of a much larger number of people—even if they are actually not on Facebook.
This is based on the principles of network science. Upon obtaining data from a few thousand Facebook accounts, analysts can build a social “graph”, where each person is represented by a node, with a line connecting two nodes where one person is in the contact list of the other.
Now, some nodes are “rich”, in that plenty of information is known about them, such as political leanings, schools, race, location, etc. (these are the people whose profiles have been obtained). Based on the hypothesis that people are likely to correspond with other, similar people, this graph can then be used to build profiles of people whom we know only in vague terms, perhaps through no more than an email address or a phone number.
So if your name mostly appears in the address books of people who have been identified as “conservative”, it is highly likely that you are a “conservative” as well. If you are connected to lots of people who are likely to have studied in an IIT, it is likely that you have studied in an IIT as well—these are precisely the kind of algorithms social networks such as Facebook or LinkedIn use to recommend who you should become friends with.
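The neighbour-vote logic described above can be sketched in a few lines of Python. The contact graph and the “conservative”/“liberal” labels below are toy data, purely for illustration, not anything from an actual leak:

```python
# Minimal sketch of neighbour-based attribute inference on a contact graph.
# The graph, names and labels are invented toy data.
from collections import Counter, defaultdict

# Undirected contact graph: an edge means one person appears in the
# other's address book.
edges = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("eve", "bob"), ("eve", "carol"),
]
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

# "Rich" nodes: leaked profiles whose political leaning is known.
known = {"bob": "conservative", "carol": "conservative", "dave": "liberal"}

def infer(person):
    """Guess an unlabelled person's leaning from the majority label
    among their labelled contacts (the homophily assumption)."""
    votes = Counter(known[n] for n in graph[person] if n in known)
    return votes.most_common(1)[0][0] if votes else None

print(infer("alice"))  # → conservative (2 of 3 labelled contacts)
print(infer("eve"))    # → conservative (both labelled contacts)
```

Real systems use far more sophisticated graph algorithms than a majority vote, but the underlying idea is the same: labels propagate from known nodes to their unlabelled neighbours.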
This way, based on “triangulation” of a few thousand accounts, an analyst can build out elaborate profiles of everyone else in their address books. And this information can be then used to craft customized advertising campaigns, which are then sent to these users. (Note that even if these users aren’t on Facebook, their contact information is available.)
When the advertising is for a commercial product, the worst it can do is creep out the recipient with its accuracy. But when the ad is for a political purpose or in order to sway an election, things get much more problematic.
You are what you like
In 2013, a group of researchers from Cambridge University and Microsoft Research Cambridge published a paper claiming to predict a person’s race, sexual orientation, religion and political leanings based on nothing more than their Facebook likes. Using remarkably simple machine learning models (singular value decomposition followed by regression, for interested readers), they were able to make these predictions with astonishingly high accuracy. The data for both calibrating and testing the model came from the myPersonality Facebook app, which was installed and used by more than 58,000 “volunteers”.
For someone in the data analytics business (like this author), this is no surprise, since the typical Facebook user farms out her likes across an astonishing variety of topics, pages and posts. While each individual “like” may not give out much information about a user, the beauty of data analytics is that a combination of “likes” can reveal significant information about a user’s preferences. It is again similar to constructing your social profile based on who you are connected to: the power comes from aggregation and the intelligent combination of information.
And what is scary here is that the “intelligence” needed to combine the information is not particularly complicated—the methods listed in the Cambridge paper are among the most basic techniques in machine learning.
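As a rough illustration of how basic this recipe is, here is a sketch of an SVD-plus-regression pipeline on a synthetic user-by-like matrix. This is not the paper’s actual code or data; the matrix, the trait and all the numbers below are made up for demonstration:

```python
# Hedged sketch of the SVD-then-regression recipe on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_likes, k = 200, 50, 5

# Binary matrix: entry (i, j) is 1 if user i liked page j.
likes = (rng.random((n_users, n_likes)) < 0.2).astype(float)

# Synthetic "trait": 1 if the user liked any of the first three pages.
trait = (likes[:, :3].sum(axis=1) > 0).astype(float)

# Singular value decomposition: keep the top-k components per user.
U, s, Vt = np.linalg.svd(likes, full_matrices=False)
components = U[:, :k] * s[:k]

# Ordinary least-squares regression of the trait on those components.
X = np.hstack([components, np.ones((n_users, 1))])  # add an intercept
coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
pred = (X @ coef) > 0.5

accuracy = (pred == trait).mean()
print(f"in-sample accuracy: {accuracy:.2f}")
```

The whole pipeline is two standard linear-algebra calls; the published work achieved its accuracy not through clever algorithms but through the sheer breadth of real likes data.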
Moreover, the beauty of machine learning is that once a model has been built and tested on a mid-sized sample of data (Cambridge Analytica reportedly used a sample of around 50,000 paid respondents), it can then be applied to publicly available information about a much larger number of people. And with people’s “likes” on pages and news articles being public information (unless explicitly turned off), firms such as Cambridge Analytica can fairly easily build psychographic profiles for a large database of users.
The evolution of Facebook
From the perspective of firms such as Cambridge Analytica, a key event in Facebook’s evolution was the move towards “news first” in 2012. As Wired magazine reported, Facebook turned to news in a big way in 2012 to both combat the rise of Twitter (which was threatening to become the default news destination) and to improve user engagement.
While it might still be possible to deduce a person’s political preferences from likes of pages and baby pictures, Facebook’s shift to news meant that there was a ready stream of data available to analysts to determine people’s political leanings and opinions on specific issues. With a large number of people liking and commenting on the same set of articles (rather than on individually written posts, each of which would get a small number of likes), the analysts’ task became significantly easier. Moreover, with news articles clearly identified with political preferences, the number of data points an analyst needed in order to learn about a user came down significantly.
With the benefit of hindsight, it is likely that Facebook’s move earlier this year towards “meaningful personal interactions” (baby and vacation photos all over again) was in response to the platform’s use for political means in 2016.
What can we do about it?
The short answer is “not very much”, because irrespective of how careful you are, the connectedness of the Facebook network means that even a small cross-section of profile information is enough to profile millions of users.
Epidemiologists use the concept of a “herd immunity threshold” to determine the minimum proportion of a community that must be vaccinated to prevent an epidemic. The threshold depends on how infectious the disease is. For smallpox, it is about 80%. For measles, it is about 92%.
Given the connectedness of the Facebook network, this threshold is well over 99% for social media. In other words, even if 1% of users are lax about their privacy, the entire population’s details can get compromised.
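The threshold comes from the basic reproduction number R0 (the average number of people one carrier infects) via the standard formula 1 - 1/R0. The R0 values below are commonly cited rough figures, chosen only because they reproduce the percentages above:

```python
# Herd immunity threshold from the basic reproduction number R0.
# R0 values here are commonly cited rough estimates, for illustration.
def herd_immunity_threshold(r0: float) -> float:
    """Fraction of a population that must be immune to stop spread."""
    return 1.0 - 1.0 / r0

for disease, r0 in [("smallpox", 5.0), ("measles", 13.0)]:
    pct = herd_immunity_threshold(r0)
    print(f"{disease}: R0 = {r0} -> threshold {pct:.0%}")
# smallpox: R0 = 5.0 -> threshold 80%
# measles: R0 = 13.0 -> threshold 92%
```

By the same arithmetic, a 99% “privacy threshold” corresponds to an R0 of 100 or more, which is a way of saying each careless account exposes on the order of a hundred others, in line with the thousands of contacts a single profile can leak.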
Deleting Facebook can help in a limited way—in that you’ll stop giving out new information based on your likes, and you won’t receive campaign material through Facebook. However, since the Facebook app collects contact information (and reportedly even call and SMS records), these campaigns are going to hit you anyway.