Semantic Disclosure Control: semantics meets data privacy

Published date: 11 June 2018
Pages: 290-303
DOI: https://doi.org/10.1108/OIR-03-2017-0090
Authors: Montserrat Batet, David Sánchez
Subject Matter: Library & information science, Information behaviour & retrieval, Collection building & management, Bibliometrics, Databases, Information & knowledge management, Information & communications technology, Internet, Records management & preservation, Document management
Montserrat Batet
Internet Interdisciplinary Institute (IN3), Universitat Oberta de Catalunya,
Barcelona, Spain, and
David Sánchez
Department of Computer Science and Mathematics,
CYBERCAT Center for Cybersecurity Research of Catalonia,
Universitat Rovira i Virgili, UNESCO Chair in Data Privacy, Tarragona, Spain
Abstract
Purpose: To overcome the limitations of purely statistical approaches to data protection, this
paper proposes Semantic Disclosure Control (SeDC): an inherently semantic privacy protection paradigm
that, by relying on state-of-the-art semantic technologies, rethinks privacy and data protection in terms of the
meaning of the data.
Design/methodology/approach: The need for data protection mechanisms able to manage data from a
semantic perspective is discussed and the limitations of statistical approaches are highlighted. Then, SeDC is
presented by detailing how it can be enforced to detect and protect sensitive data.
Findings: So far, data privacy has been tackled from a statistical perspective; that is, available solutions
focus solely on the distribution of the data values. This contrasts with the semantic way in which humans
understand and manage (sensitive) data. As a result, current solutions present limitations both in preventing
disclosure risks and in preserving the semantics (utility) of the protected data.
Practical implications: SeDC captures more general, realistic and intuitive notions of privacy and information
disclosure than purely statistical methods. As a result, it is better suited to protecting heterogeneous and unstructured
data, which are the most common in current data release scenarios. Moreover, SeDC preserves the semantics of the
protected data better than statistical approaches, which is crucial when using protected data for research.
Social implications: Individuals are increasingly aware of the privacy threats that the uncontrolled
collection and exploitation of their personal data may produce. In this respect, SeDC offers an intuitive notion
of privacy protection that users can easily understand. It also naturally captures the (non-quantitative)
privacy notions stated in current legislation on personal data protection.
Originality/value: In contrast to statistical approaches to data protection, SeDC assesses disclosure
risks and enforces data protection from a semantic perspective. As a result, it offers more general, intuitive,
robust and utility-preserving protection of data, regardless of their type and structure.
Keywords: Semantics, Knowledge, Privacy, Personal data protection
Paper type: Conceptual paper
Introduction
In the current context of information societies, it is quite common to refer to electronic data as
the "new oil" of the twenty-first century (Rotella, April 2, 2012). On the one hand, the analysis of
personal data fuels many research efforts (e.g. the analysis of medical records is essential to
improve healthcare delivery). On the other hand, predictive market analytics derive value from
the huge amount of personal data being gathered; for example, the compilation, aggregation and
exploitation of data (e.g. social media) related to millions of internet users is a business worth billions
in which Data Brokers are the main providers of data and services, which include identity
verification, marketing products, personal profiling, etc. (US Federal Trade Commission, 2014).
Even though there is no question that those services are of great interest to companies and
consumers, the confidential nature of much of the compiled data (e.g. census
data gathered from government sources, personal opinions and preferences posted in social
networks, medical records, etc.) may pose privacy risks to the subjects to whom the data refer.
In order to guarantee the fundamental right to privacy of the individuals (The European
Parliament and the Council of the EU, 2016), responsible parties should undertake

Online Information Review, Vol. 42 No. 3, 2018, pp. 290-303
© Emerald Publishing Limited, 1468-4527
DOI 10.1108/OIR-03-2017-0090
Received 20 March 2017; revised 13 June 2017; accepted 30 August 2017
