Private attribute inference from Facebook’s public text metadata: a case study of Korean users

Date: 11 September 2017
DOI: https://doi.org/10.1108/IMDS-07-2016-0276
Published date: 11 September 2017
Pages: 1687-1706
Author: Daeseon Choi, Younho Lee, Seokhyun Kim, Pilsung Kang
Subject Matter: Information & knowledge management, Information systems, Data management systems, Knowledge management, Knowledge sharing, Management science & operations, Supply chain management, Supply chain information systems, Logistics, Quality management/systems
Daeseon Choi
Kongju National University, Gongju,
The Republic of Korea
Younho Lee
Seoul National University of Science and Technology, Seoul,
The Republic of Korea
Seokhyun Kim
Electronics and Telecommunications Research Institute, Daejeon,
The Republic of Korea, and
Pilsung Kang
School of Industrial Management Engineering, Korea University, Seoul,
The Republic of Korea
Abstract
Purpose: As the number of users on social network services (SNSs) continues to increase at a remarkable
rate, privacy and security issues are consistently arising. Although users may not want to disclose their
private attributes, these can be inferred from their public behavior on social media. In order to investigate
the severity of the leakage of private information in this manner, the purpose of this paper is to present a
method to infer undisclosed personal attributes of users based only on the data available on their public
profiles on Facebook.
Design/methodology/approach: Facebook profile data consisting of 32 attributes were collected
for 111,123 Korean users. Inferences were made for four private attributes (gender, age, marital status,
and relationship status) based on five machine learning-based classification algorithms and three
regression algorithms.
Findings: Experimental results showed that users' gender can be inferred very accurately,
whereas marital status and relationship status can be predicted more accurately with the authors'
algorithms than with a random model. Moreover, the average difference between the actual and predicted
ages of users was only 0.5 years. The results show that some private attributes can be easily inferred from
only a few pieces of user profile information, which can jeopardize personal information and may increase
the risk to dignity.
Research limitations/implications: In this paper, the authors only utilized each user's own profile
data, especially text information. Since users in SNSs are directly or indirectly connected, inference
performance can be improved if the profile data of the friends of a given user are additionally considered.
Moreover, utilizing non-text profile information, such as profile images, can help increase inference
accuracy. The authors can also provide a more generalized inference performance if a larger data set of
Facebook users is available.
Practical implications: A private attribute leakage alarm system based on the inference model would be
helpful for users not desirous of the disclosure of their private attributes on SNSs. SNS service providers can
measure and monitor the risk of privacy leakage in their system to protect their users and optimize the target
marketing based on the inferred information if users agree to use it.
Originality/value: This paper investigates whether private attributes of SNS users can be inferred with a
few pieces of publicly available information although users are not willing to disclose them. The experimental
results showed that gender, age, marital status, and relationship status can be inferred by machine-learning
algorithms. Based on these results, an early warning system was designed to help both service providers and
users to protect the users' privacy.
Keywords Gender, Facebook, Machine learning, Age, Marital/relationship status, Private attribute
Paper type Research paper
Industrial Management & Data Systems
Vol. 117 No. 8, 2017
pp. 1687-1706
© Emerald Publishing Limited
0263-5577
DOI 10.1108/IMDS-07-2016-0276
Received 13 July 2016
Revised 5 January 2017
Accepted 12 January 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0263-5577.htm
1. Introduction
With the remarkable growth in the use of social network services (SNSs) worldwide in the last
few years, it has become possible to collect and analyze the characteristics and preferences of
users through the history of their online behavior, such as "mentions" and "retweets" on Twitter,
or posts and "Likes" on Facebook (Alowibdi et al., 2013; Mislove et al., 2010; Kosinski et al., 2013).
Commercial organizations can use the results of such analyses to develop customized services,
such as targeted marketing and recommendation systems, whereas law enforcement agencies
may use them for legal investigation (Alowibdi et al., 2013; Kosinski et al., 2013). Furthermore,
the massive amounts of SNS data collected from the crowd can be used to predict the future.
For example, election results can be predicted with considerable accuracy based on the volume of
Twitter mentions (Sang and Bos, 2012). It has also been reported that the box office revenues
of movies are highly correlated with the number of SNS posts referring to them; this number can
also be a satisfactory indicator of the success of a movie (Asur and Huberman, 2010a; Kim et al., 2015;
Ahmadinejad and Fong, 2014). In the financial domain, the accuracy of predictions regarding
the Dow Jones Industrial Average can be significantly improved by incorporating into the
calculation some measure of public mood over Twitter, in addition to traditional financial
indicators (Bollen et al., 2011).
Although such user-generated data in SNSs are beneficial for various businesses, they
also cause concerns regarding user privacy and security (Butler, 2007; Narayanan and
Shmatikov, 2008; Li et al., 2014). Private information in SNSs can be categorized into four
cases according to the policy of the SNS provider and the user's willingness to disclose his/her
information, as shown in Table I. Although some information can be intentionally concealed
by a user on an SNS, such as in Cases C and D in Table I, and regarded as undisclosed
private information, third parties can infer this private, possibly sensitive information, such as
medical history or simply the user's gender and age, from the user's SNS traces and use it
for their purposes (Asur and Huberman, 2010b; Dey et al., 2012).
This not only violates the user's right to the privacy of his/her personal information
(Wright and Rabb, 2014) but may also expose the user to unnecessary discrimination.
For example, a potential employer might refuse to hire the user because of his/her medical
history inferred from the SNS usage data. To make matters worse, once the undisclosed
private information is leaked, it can be illegally used for cybercrimes such as fraudulent
credit card transactions (Everett, 2010).
There are two main directions of research in private information inference using SNS
data. One involves inferring users' personal characteristics based on their SNS usage/
behavior patterns, which are mainly studied in the behavioral sciences (Adali and Golbeck,
2012; Chapsky, 2011; Ortigosa et al., 2011). This involves experiments where participants
take personality tests, e.g., the Big Five Test. With user personality as the target variable,
details of the participants' SNS usage over a specified period of time are then gathered and
used to extract the relevant features for the purpose of making inferences. Finally, each
user's personality is predicted using inference algorithms, such as a Bayesian Network or a
Decision Tree. A few years ago, inferring user personality was possible only if the user
voluntarily undertook a personality test and provided the result. Surprisingly, according to
a recent study by Youyou et al. (2015), building a computer-based system that can assess
one's personality even more accurately than human beings is now possible.
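The generic pipeline described above (labelled participants, features extracted from SNS usage, and a probabilistic classifier) can be sketched as follows. This is only an illustrative sketch, not the method of any of the cited studies: it uses a naive Bayes classifier with Laplace smoothing, which can be viewed as the simplest form of Bayesian network, and the function names, tokens, and trait labels (train_naive_bayes, predict, TOY_PROFILES, "high_extraversion", "low_extraversion") are all hypothetical and invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (tokens extracted from SNS usage, trait label).
# Both the tokens and the labels are made up for illustration only.
TOY_PROFILES = [
    (["party", "concert", "travel"], "high_extraversion"),
    (["party", "friends", "festival"], "high_extraversion"),
    (["books", "chess", "travel"], "low_extraversion"),
    (["books", "puzzles", "tea"], "low_extraversion"),
]


def train_naive_bayes(samples):
    """Count label and per-label token frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        for tok in tokens:
            token_counts[label][tok] += 1
            vocab.add(tok)
    return {"labels": label_counts, "tokens": token_counts, "vocab": vocab}


def predict(model, tokens):
    """Return the label with the highest log-posterior under naive Bayes."""
    total = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    # Sorting makes tie-breaking deterministic.
    for label, n in sorted(model["labels"].items()):
        counts = model["tokens"][label]
        denom = sum(counts.values()) + vocab_size  # Laplace (add-one) smoothing
        score = math.log(n / total)  # log prior
        for tok in tokens:
            if tok in model["vocab"]:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train_naive_bayes(TOY_PROFILES)
    print(predict(model, ["party", "festival"]))
```

In a real study the labels would come from the participants' personality test results and the features from their recorded SNS activity; the toy tokens here merely stand in for such features.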
Table I. Private attribute category

Private information category       SNS provider's policy
User's willingness                 Mandatory     Optional
Want to disclose                   Case A        Case B
Do not want to disclose            Case C        Case D