Ontological content‐based filtering for personalised newspapers. A method and its evaluation

Published date: 28 September 2010
Pages: 729-756
DOI: https://doi.org/10.1108/14684521011084591
Authors: Veronica Maidel, Peretz Shoval, Bracha Shapira, Meirav Taieb-Maimon
Subject matter: Information & knowledge management, Library & information science
Veronica Maidel, Peretz Shoval, Bracha Shapira and
Meirav Taieb-Maimon
Department of Information Systems Engineering,
Ben-Gurion University of the Negev, Beer-Sheva, Israel
Abstract
Purpose – The purpose of this paper is to describe a new ontological content-based filtering method
for ranking the relevance of items for readers of news items, and its evaluation. The method has been
implemented in ePaper, a personalised electronic newspaper prototype system. The method utilises a
hierarchical ontology of news; it considers common and related concepts appearing in a user’s profile
on the one hand, and in a news item’s profile on the other hand, and measures the “hierarchical
distances” between these concepts. On that basis it computes the similarity between item and user
profiles and rank-orders the news items according to their relevance to each user.
Design/methodology/approach – The paper evaluates the performance of the filtering method in
an experimental setting. Each participant read news items obtained from an electronic newspaper and
rated their relevance. Independently, the filtering method was applied to the same items and generated,
for each participant, a list of news items ranked according to relevance.
Findings – The results of the evaluations revealed that the filtering algorithm, which takes into
consideration hierarchically related concepts, yielded significantly better results than a filtering
method that takes only common concepts into consideration. The paper determined a best set of values
(weights) of the hierarchical similarity parameters. It also found that the quality of filtering
improves as the number of items used for implicit updates of the profile increases, and that even with
implicitly updated profiles, it is better to start with user-defined profiles.
Originality/value – The proposed content-based filtering method can be used for filtering not only
news items but items from any domain, and not only with a three-level hierarchical ontology but
any-level ontology, in any language.
Keywords Newspapers, Electronic media, Information retrieval
Paper type Research paper
Introduction and problem statement
In information filtering, content-based filtering systems compare representations of
documents with representations of users’ profiles with the aim of finding the documents
that are most relevant to a particular user (Belkin and Croft, 1992; Hanani et al., 2001).
This methodology poses the challenge of finding the best representation for both the
documents and the users. In the representation of documents, it is necessary to express
the content and context of each document in the form of an item profile; this
representation is then applied to build a profile of the user who accessed those
documents. The user profile thus represents a mapping of the actual interest profile to a
compact model space, which is an approximation of the user’s real-world interests in the
representation space of the model (Savia et al., 1998). To enable matching between the
user’s profile and the document’s profile, the two profiles must share a common
representation, e.g. representation by keywords. The output of the matching process is
a ranking score that indicates the degree of similarity between the
user’s profile and a given document.
The ePaper project was sponsored by Deutsche Telekom Labs at Ben-Gurion University. A short
description of the filtering method and its evaluation was presented at the ACM Conference on
Recommender Systems.
Refereed article received 9 July 2009. Approved for publication 8 March 2010.
Online Information Review, Vol. 34 No. 5, 2010, pp. 729-756. © Emerald Group Publishing Limited, 1468-4527. DOI 10.1108/14684521011084591
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/1468-4527.htm
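As a rough illustration of matching two keyword-based profiles, the sketch below scores each document against a user profile and sorts the documents by that score. It assumes that both profiles are dictionaries mapping keywords to weights and that cosine similarity is the matching function; the similarity measure, the function names and the sample data are assumptions made for this example, not details taken from the ePaper system.

```python
import math


def cosine_similarity(user_profile, item_profile):
    """Cosine similarity between two sparse keyword-to-weight dictionaries."""
    shared = set(user_profile) & set(item_profile)
    dot = sum(user_profile[t] * item_profile[t] for t in shared)
    norm_user = math.sqrt(sum(w * w for w in user_profile.values()))
    norm_item = math.sqrt(sum(w * w for w in item_profile.values()))
    if norm_user == 0 or norm_item == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_user * norm_item)


def rank_items(user_profile, items):
    """Return item ids ordered by decreasing similarity to the user profile."""
    scored = [(cosine_similarity(user_profile, profile), item_id)
              for item_id, profile in items.items()]
    # Sort by score (descending); ties are broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored]


# Hypothetical sample data.
user = {"football": 1.0, "elections": 0.5}
items = {
    "a": {"football": 1.0},
    "b": {"weather": 1.0},
    "c": {"elections": 1.0, "football": 1.0},
}
ranking = rank_items(user, items)
```

Because both profiles use the same keyword vocabulary, the score is defined for any user and document pair. A document that shares no keyword with the profile, such as item "b" here, scores zero.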
To automatically generate a representation for a document, the document must first
be analysed, possibly with a text analysis algorithm that extracts keywords/terms to
represent the document’s content in the best possible way. This requirement for
analysis is a major drawback of content-based filtering since keyword-based
representation may be ambiguous, as a result of a particular word having more than
one meaning. This problem can be handled by using a domain ontology that provides a
controlled vocabulary of terms (or concepts) and their semantic relationships. Such an
ontology can bridge the gap between terms in the user’s profile and terms used to
represent documents.
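The sketch below shows one way a hierarchical ontology could let related, but not identical, concepts contribute to a match. It is only an illustration under stated assumptions: the tiny ontology, the distance weights and the averaging rule are hypothetical and are not the scoring scheme or parameter values of the method evaluated in this paper.

```python
# Hypothetical three-level news ontology: each concept maps to its parent.
PARENT = {
    "sport": None,
    "football": "sport",
    "world cup": "football",
    "politics": None,
    "elections": "politics",
}

# Illustrative weights per hierarchical distance (assumed values).
DISTANCE_WEIGHT = {0: 1.0, 1: 0.5, 2: 0.25}


def ancestors(concept):
    """Path from a concept up to its root, starting with the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT.get(concept)
    return path


def hierarchical_distance(a, b):
    """Levels separating a and b when one is an ancestor of the other, else None."""
    path_a, path_b = ancestors(a), ancestors(b)
    if b in path_a:
        return path_a.index(b)
    if a in path_b:
        return path_b.index(a)
    return None


def concept_match(user_concept, item_concept):
    """Weight of a single concept pair; unrelated concepts score zero."""
    distance = hierarchical_distance(user_concept, item_concept)
    if distance is None:
        return 0.0
    return DISTANCE_WEIGHT.get(distance, 0.0)


def item_score(user_concepts, item_concepts):
    """Average, over the item's concepts, of the best match with any user concept."""
    if not user_concepts or not item_concepts:
        return 0.0
    total = sum(max(concept_match(u, c) for u in user_concepts)
                for c in item_concepts)
    return total / len(item_concepts)
```

With these assumed weights, a user interested in "football" gets a partial score of 0.5 for an item about "world cup". A keyword-only comparison would treat the two terms as unrelated and score the item zero.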
The research reported in this paper was conducted in the news domain on the
ePaper system (Shapira et al., 2009), which aims to provide subscribers with a
personalised newspaper on a mobile device. In ePaper, news items from many sources
(news providers) are aggregated, and each subscriber receives a personalised
newspaper on their mobile device containing the news items that are the most relevant
to them. The personalisation component of ePaper combines content-based and
collaborative filtering methods. The reason for the use of two methods is that
collaborative filtering, which relies on the opinions of similar users, is not adequate as a
sole filtering method in the news domain because of the dynamic nature of news, i.e. a
large number of new items emerge at frequent intervals, and these have to be sent to
subscribers as soon as possible even before sufficient opinions can be obtained. This
“cold-start” problem of many new items and too few opinions may be overcome by
using content-based filtering. The content-based filtering method used in
the ePaper system, together with its evaluation, is the subject of this paper.
The remainder of this paper is organised as follows. The next section provides
background on content-based filtering and ontology, and the following section reports
related studies on the use of ontology in content-based filtering. The fourth section
presents the ontological content-based filtering method. The subsequent two
sections present the experimental evaluations of the method and their results. The last
section summarises the main conclusions from the evaluations, discusses limitations,
and proposes further research issues.
Content-based filtering and ontology
Two main approaches are used in information filtering systems: content-based and
collaborative. Content-based filtering is based on similarity of content (i.e. the user’s
profile and the representation of the documents) while collaborative filtering is based
on the similarity of opinions on documents read in the past. Similarity can be measured
between the active user and other users, or among the items in the set of items that the
active user has read in the past. The two approaches each have particular advantages