They are in your Facebook, mining your data …
The issue of privacy on social networking platforms like Facebook has been discussed at length before, and dismissed by most as merely drivel from paranoid folk. My take on the issue has always been one of neutrality: I’m well aware that private and confidential information can get out into the public domain without your authorisation, but then again, that’s true for almost anything you put on the internet.
Recently, however, I decided to investigate the issue a bit further. Since I am already registered on Facebook, I decided to try writing a Facebook application in order to learn what capabilities such an application has, and to what kind of information the API gives access. The results, although not entirely surprising, were startling nonetheless. I shall summarise some of what I picked up below:
- By default, a Facebook application has access to all the information to which the user of that application has access. The user can, in the privacy settings, choose not to share certain information with applications that he or she doesn’t currently use. This means that, by default, every application that anyone on your friends list has added has access to all the information that your friends can access (even if you haven’t installed the application).
- Unless you have changed your privacy settings, when you are tagged in a photo by someone (even if you haven’t approved it yet), that photo becomes immediately accessible to all your friends. What’s interesting is that if the creator of the album didn’t restrict access to it (having assumed, reasonably and commonly, that only his or her friends can see the album), then your friends have access to the entire album in which you were tagged. The same is true if you comment on a photo. Similarly, wall posts and group posts are broadcast to all your friends. Thus, any applications your friends have installed have access to this information.
None of the above is particularly surprising, but the implications can be. In one scenario, you could be posting a private picture into a restricted album, and then sharing that with your friend. Your friend finds it funny and comments. Suddenly all your friend’s friends can see the photo. What’s more, any applications installed by your friend and your friend’s friends have access to this photo too. This means that any of these applications can download the photo off Facebook’s servers and onto their own, and attach enough metadata to the photo to make it a useful data-mining effort.
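To make the scenario concrete, here is a toy model of it in Python. It is only a sketch of the rules as I have described them, not Facebook's actual access logic, and it does not touch the real API: the `User` class, the `befriend` and `audience_after_comment` helpers, and the user and application names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    friends: set = field(default_factory=set)  # names of this user's friends
    apps: set = field(default_factory=set)     # applications this user has installed


def befriend(users, a, b):
    """Friendship is mutual, so record it on both sides."""
    users[a].friends.add(b)
    users[b].friends.add(a)


def audience_after_comment(users, owner, shared_with, commenter):
    """Return (people, apps) that can reach a photo once `commenter` comments on it.

    Assumed rule, taken from the scenario in the post: a comment broadcasts
    the photo to all of the commenter's friends, and every application
    installed by someone who can see the photo can read it too.
    """
    if commenter not in shared_with:
        raise ValueError(f"{commenter} cannot see the photo, so cannot comment on it")
    people = {owner} | set(shared_with) | users[commenter].friends
    apps = set()
    for name in people:
        apps |= users[name].apps
    return people, apps


# You share a restricted photo with one friend, who has two friends you don't know.
users = {name: User(name) for name in ("you", "friend", "stranger_1", "stranger_2")}
befriend(users, "you", "friend")
befriend(users, "friend", "stranger_1")
befriend(users, "friend", "stranger_2")
users["friend"].apps.add("quiz_app")
users["stranger_2"].apps.add("horoscope_app")

people, apps = audience_after_comment(users, "you", {"friend"}, "friend")
print(sorted(people))  # you, your friend and both strangers
print(sorted(apps))    # both applications, neither of which you installed
```

A photo shared with one person ends up readable by four people and two applications, and you never installed either application.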
This becomes particularly dangerous when scam or con artists are introduced into the equation. Most people trust strangers who have information that they don’t expect a complete stranger to have, such as what primary school they went to, or which clubs they used to go to when they were at university. In most cases, people are not aware that they are giving out this information (they are just joining a group, attending a reunion event or making a simple comment on a photo). By feeding this information back to you (“Hey! Remember me? I used to go to school with you at so and so!”), a con artist can gain your trust, even if only to get on your friends list (since you feel embarrassed not to remember this “friend” and thus accept the friendship invitation). Once the con artist is on your friends list, data mining can pick up steam. Maybe you aren’t the target, but merely a step along the way to the target.
This brings me to the crux of the problem: the reason that your privacy is more of an issue on a social network like Facebook — rather than, say, on a personal home page where you disseminate the same information — is that the social networking platform makes your data incredibly easy and cheap to mine. You are in effect filling in a form on the web where someone asks you to put information about your life into particular fields and tag these with useful metadata, thank you very much, sir! In an age where even your email address can generate revenue for a spammer (and the data miner that sold your email address to the spammer), you can easily begin to see why this is a bad thing.
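A small Python sketch shows the difference in effort. Everything in it is invented for illustration: the field names in `profile`, the text of `home_page`, and the two helper functions. It is not how any real miner or the Facebook API works, just a comparison of reading labelled fields against scraping free text.

```python
import re

# Data as a social network stores it: every value sits in a labelled field.
profile = {
    "name": "Jane Doe",
    "primary_school": "Hillside Primary",
    "email": "jane@example.com",
}

# Roughly the same facts as a person might write them on a home page.
home_page = (
    "Hi, I'm Jane. I grew up near Hillside and went to the primary "
    "school there. Mail me at jane at example dot com."
)


def mine_profile(record, wanted):
    """Mining structured data is a lookup: ask for a field by name."""
    return {key: record[key] for key in wanted if key in record}


def mine_free_text(text):
    """Mining prose needs a pattern per fact, and patterns are brittle."""
    emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
    return {"email": emails[0]} if emails else {}


structured = mine_profile(profile, ["primary_school", "email"])
scraped = mine_free_text(home_page)
print(structured)  # both fields come back, already labelled
print(scraped)     # the naive pattern misses the obfuscated address
```

The lookup works for every user on the platform with no extra code, whereas the scraper has to be tuned for each page and each way of phrasing a fact.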
It gets worse. Heard of OpenSocial? This is an effort to standardise the format in which data is captured and used on social networks so that applications can be written that will work on multiple social networks! So now mining data from multiple social networks becomes as easy as mining one. Add OpenID to the mix, and you now even have a unique identifier on the internet linking all of your mined data, never mind what would happen if your OpenID password were cracked. We are also entering the era of mashups on the internet, to the point where anyone can point and click to create aggregated feeds using something like Yahoo Pipes. It’s now possible, to take an extremely paranoid example, to create a “revenge pipe”, which, when clicked on, will publish your name, email address, Facebook and Flickr photos, and possibly your physical address with a Google map and satellite photos (found by whois-ing your domain name, for example), onto an extremist website for people testing out their bomb-making skills!
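The linking step is the part that a shared identifier makes trivial. Here is a minimal sketch, assuming each mined record carries the same identifier under a key I have called `openid`. The record layout, the network names and the `link_by_identifier` helper are hypothetical and do not follow the OpenSocial or OpenID specifications.

```python
from collections import defaultdict


def link_by_identifier(*sources):
    """Merge records mined from several networks into one dossier per person.

    Each source is a (network_name, records) pair, and every record is
    assumed to carry the shared identifier under the key "openid".
    """
    merged = defaultdict(dict)
    for network_name, records in sources:
        for record in records:
            identifier = record["openid"]
            for key, value in record.items():
                if key != "openid":
                    # Prefix each field with its origin so nothing is overwritten.
                    merged[identifier][f"{network_name}.{key}"] = value
    return dict(merged)


network_a = [{"openid": "https://jane.example.org/", "school": "Hillside Primary"}]
network_b = [
    {"openid": "https://jane.example.org/", "email": "jane@example.com"},
    {"openid": "https://sam.example.org/", "city": "Durban"},
]

dossiers = link_by_identifier(("network_a", network_a), ("network_b", network_b))
print(dossiers["https://jane.example.org/"])
```

Without the common identifier, the miner would have to guess that two accounts belong to the same person from names or photos. With it, the merge is a dictionary lookup.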
Am I suggesting that everyone stop using social networks? Not at all! I still use Facebook myself (mainly to see photos that my friends or their friends post). It’s a great meeting point on the web — a place where other people can find you. An address, if you will, in prime real estate on the web. Don’t give that up (in fact, sign up to all of them)! Use it to allow people to find you, contact you and see more information about you that is not uploaded to the platform. By that, I mean, put links from your social network to other platforms where you store your public information, such as a photo-sharing site, a video-sharing site, your homepage, your blog and so forth. That data cannot easily be mined from a social networking application.
One caveat here is that popular media-sharing sites (like Flickr) are still susceptible to data mining and spam bots (especially if they have public APIs for mashups), since their popularity makes it worth the data miners’ while to write the necessary code to mine those sites. Treat these sites as victims of their own success: switch to a less frequently used site. This way, we even solve the problem of one site/company owning most of the data on the internet.