h(k)‐private information retrieval from privacy‐uncooperative queryable databases

Date07 August 2009
Pages720-744
DOIhttps://doi.org/10.1108/14684520910985693
Published date07 August 2009
AuthorJosep Domingo‐Ferrer,Agusti Solanas,Jordi Castellà‐Roca
Subject MatterInformation & knowledge management,Library & information science
h(k)-private information retrieval
from privacy-uncooperative
queryable databases
Josep Domingo-Ferrer, Agusti Solanas and Jordi Castella
`-Roca
Department of Computer Engineering and Mathematics,
Rovira i Virgili University, Tarragona, Spain
Abstract
Purpose – This paper aims to address the privacy problem associated with the use of internet search
engines. The purpose of the paper is to propose and validate a set of methods and protocols to
guarantee the privacy of users’ queries.
Design/methodology/approach – In this paper h(k)-private information retrieval (h(k)-PIR) is
defined as a practical compromise between computational efficiency and privacy. Also presented are
h(k)-PIR protocols that can be used to query any database, which does not even need to know that the
user is trying to preserve his or her privacy.
Findings – The proposed methods are able to properly protect the privacy of users’ queries. When
internet users apply the protocols, search engines (e.g. Google) are not able to determine unequivocally
the real interests of their users. The quality of the results does decrease with the increase in privacy,
but the obtained trade-off is excellent.
Practical implications – Current private information retrieval (PIR) protocols suffer from two
significant shortcomings: their computational complexity is O(n) where nis the number of records in
the database, which precludes their use for very large databases and web search engines; and they
assume that the database server cooperates in the PIR protocol, which prevents deployment in real-life
uncooperative settings. The proposed protocols overcome both problems.
Originality/value – This is the first set of protocols that offer practical protection for the privacy of
the queries that internet users submit to an internet search engine. The proposal has been implemented
and it will be released to the general public soon. It will help to protect the right to privacy of millions
of internet users.
Keywords Information retrieval, Search engines,Internet
Paper type Research paper
Introduction
Privacy in databases can be viewed as a three-dimensional property (Domingo-Fer rer,
2007), because a distinction can be made between respondent privacy (privacy of the
respondents to whom the database records correspond), owner privacy (privacy of
proprietary datasets used to conduct joint research between several data-owning
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/1468-4527.htm
The authors are with the UNESCO Chair in Data Privacy, but the views expressed in this article
are those of the authors and do not necessarily reflect the position of UNESCO nor commit that
organisation. Thanks go to Susana Bujalance for her help in implementing the GooPIR prototype
and to U
´rsula Gonza
´lez-Nicola
´s for her help in computing the outcomes of quality measures. This
work was partly supported by the Spanish Government through projects CONSOLIDER
INGENIO 2010 CSD2007-00004 “ARES” and TSI2007-65406-C03-01 “E-AEGIS”, and by the
Government of Catalonia under grant 2005 SGR 00446.
OIR
33,4
720
Refereed article received
20 July 2008
Approved for publication
10 March 2009
Online Information Review
Vol. 33 No. 4, 2009
pp. 720-744
qEmerald Group Publishing Limited
1468-4527
DOI 10.1108/14684520910985693
organisations) and user privacy (privacy of the queries submitted by database us ers).
In the context of interactively queryable databases and in particular, internet search
engines, the most rapidly growing concern is user privacy. This is especially so after
scandals like the August 2006 disclosure by the AOL search engine of 20 million
queries made by 658,000 users (AOL, 2006). It is estimated that over 223 million data
records of US residents have been exposed due to security breaches since January 2005
(Lane et al., 2008), so an attacker could access user profiles stored in the databases of
search engines.
There are personal and corporate motivations for requiring user privacy:
.Private life. Most users feel uncomfortable about being profiled by a sea rch
engine or a database.
.Industrial life. If a company is querying a database containing patents or stock
market quotations, the database manager can partially infer the company’s next
technological or financial moves if he or she knows the queries being submitted
by the company.
Private information retrieval (PIR) was invented in 1995 by Chor et al. (1995, 1998) with
the assumption that there are at least two copies of the same database, which do not
communicate with each other. In those same papers, Chor et al. showed that
single-database PIR (that is, with a single copy) does not exist in the
information-theoretic sense. However, two years later, Kushilevitz and Ostrovsky
(1997) presented a method for constructing single-database PIR based on the algebraic
properties of the Goldwasser-Micali public-key encryption scheme (Goldwasser and
Micali, 1984). Subsequent developments in PIR are surveyed in Ostrovsky and Skeith
(2007).
In the PIR literature the database is usually modelled as a vector. The user wish es to
retrieve the value of the i-th component of the vector while keeping the index ihidden
from the database. Thus it is assumed that the user knows the physical address of the
sought item, which might be too strong an assumption in many practical situations.
Keyword PIR (Chor et al., 1997; Kushilevitz and Ostrovsky, 1997) is a more flexible
form of PIR – the user can submit a query consisting of a keyword and no modificati on
of the structure of the database is needed.
We claim that the PIR protocols proposed so far have two fundamental
shortcomings that hinder their practical deployment:
(1) The database is assumed to contain nitems and PIR protocols attempt to
guarantee maximum privacy, that is, maximum server uncertainty on the index
iof the record retrieved by the user. Thus the computational complexity of such
PIR protocols is O(n), as proven by Chor et al. (1995, 1998). Intuitively, all
records in the database must be “touched”; otherwise, the server could rule out
some of the records when trying to discover i(an implicit standard assumption
in PIR is that no side information is available to the server allowing it to rule out
a touched record). For large databases, an O(n) computational cost is
unaffordable (Beimel et al., 2004).
(2) It is assumed that the database server cooperates in the PIR protocol. However,
it is the user who is interested in his or her own privacy, whereas the motivation
for the database server is dubious. Actually, PIR is likely to be unattractive to
h(k)-private
information
retrieval
721

To continue reading

Request your trial

VLEX uses login cookies to provide you with a better browsing experience. If you click on 'Accept' or continue browsing this site we consider that you accept our cookie policy. ACCEPT