The identification of distinguishing term characteristics from relevance feedback

Date: 07 August 2009
DOI: https://doi.org/10.1108/14684520910985701
Pages: 745-760
Published date: 07 August 2009
Authors: Shihchieh Chou, Weiping Chang
Subject Matter: Information & knowledge management, Library & information science
Shihchieh Chou
Department of Information Management, National Central University,
Chung-Li, Taiwan, and
Weiping Chang
National Central Police University, Kueishang, Taiwan
Abstract
Purpose – The purpose of this paper is to identify distinguishing term characteristics from among
the information of term appearance situations (tas) residing in the relevant/irrelevant documents
retrieved for use. Terms with specific characteristics could be used in the distinguishing of user
profiles, documents, pages or concepts to assist in information retrieval.
Design/methodology/approach – First, a method to apply the potential term characteristics in the
distinguishing of user profiles in the information retrieval environment is designed. Then, an
information retrieval system is developed to demonstrate the realisation and sustain the study of the
method. Formal tests are conducted to examine the distinguishing capability of the potential term
characteristics proposed in the method.
Findings – The results of the tests show that the potential term characteristics proposed in this study
are successfully applied in the distinguishing of user profiles in the information retrieval environment.
Originality/value – Identification of distinguishing term characteristics would expand the ground
for the IR community in the design of feature-extraction algorithms or systems that try to cull
information from structured or unstructured documents.
Keywords Feedback, Information retrieval
Paper type Research paper
Introduction
Terms with specific characteristics have been widely used in the distinguishing of user
profiles, documents, pages or concepts. Therefore, identification of the term
characteristics capable of doing this distinguishing work is valuable to related
studies and applications. Frequently mentioned and used term characteristics are:
frequencies, title/head wording, sentence location and cue indication in the classical
measure; or webpage tags in recent studies (Edmundson, 1969; Fresno and Ribeiro,
2004; Ou et al., 2008). In this paper, we are interested in the identification of these
distinguishing term characteristics from among the information contained in relevance
feedback in the vector-space-modelled information retrieval environment. The
information of interest is the term appearance situations (abbreviated as tas)
residing in the rated relevant/irrelevant documents as follows:
(1) A term can appear in relevant documents only and never appear in irrelevant
documents (termed as tas 1).
(2) A term can appear in irrelevant documents only and never appear in relevant
documents (termed as tas 2).
(3) A term can appear both in relevant and irrelevant documents (termed as tas 3).
Online Information Review, Vol. 33 No. 4, 2009, pp. 745-760. © Emerald Group Publishing Limited, 1468-4527.
Refereed article received 9 August 2008. Approved for publication 24 November 2008.
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/1468-4527.htm
These individual bits of tas information are all term characteristics. Among them,
tas 1 and tas 2 could be the potential term characteristics in the distinguishing of user
profiles, documents, pages or concepts, because a term in either situation appears on
only one side of the relevant/irrelevant division.
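The three situations can be illustrated with a minimal sketch, which is not the system developed in this study. It assumes that each rated document has already been reduced to a list of terms (tokenisation and stop-word handling are outside its scope). The function name `classify_tas` and the keys `tas1`, `tas2` and `tas3` are hypothetical names chosen for illustration.

```python
def classify_tas(relevant_docs, irrelevant_docs):
    """Group terms by term appearance situation (tas).

    Each document is assumed to be an iterable of already-tokenised terms.
    """
    # Every term that occurs in at least one relevant / irrelevant document.
    relevant_terms = set().union(*map(set, relevant_docs))
    irrelevant_terms = set().union(*map(set, irrelevant_docs))

    return {
        # tas 1: appears in relevant documents only.
        "tas1": relevant_terms - irrelevant_terms,
        # tas 2: appears in irrelevant documents only.
        "tas2": irrelevant_terms - relevant_terms,
        # tas 3: appears in both relevant and irrelevant documents.
        "tas3": relevant_terms & irrelevant_terms,
    }


# Toy feedback set (made-up terms, for illustration only).
relevant = [["query", "vector", "feedback"], ["vector", "profile"]]
irrelevant = [["police", "vector"], ["police", "report"]]

groups = classify_tas(relevant, irrelevant)
```

In this toy example, "vector" falls under tas 3 because it occurs on both sides, whereas "query", "feedback" and "profile" fall under tas 1 and "police" and "report" under tas 2.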
In the past, many investigations have been conducted to study the application of the
information contained in relevance feedback in order to achieve more effective retrieval
(Rocchio, 1966; Buckley and Salton, 1995; Kim et al., 2001; Koster and Beney, 2007).
These studies usually first had the user rate the retrieved documents as relevant or
irrelevant. Then two sets of terms would be derived from the relevant/irrelevant
documents to form two vectors, and the two vectors together with the vector merging
operation (based on addition and subtraction according to term relevance) were used to
expand the initial query vector. In these studies, the term characteristics of tas 1, tas 2
and tas 3 were not independently identified and manipulated. Usually, they were
under-utilised by the relevant/irrelevant classification of documents and worked
together to show the combined effect on the enhancement of retrieval effectiveness.
Since they were not independently identified and studied, the feasible method and the
effect of their applications on the distinguishing of user profiles, documents, pages or
concepts were not clear.
To enable better study of tas information, the research reported here aimed first to
design a method to independently apply potential term characteristics of tas (tas 1 and
tas 2) in the distinguishing of user profiles in the information retrieval environment,
then to develop an information retrieval system to demonstrate the realisation and
sustain the study of the method, and finally to conduct some formal tests on the
developed information retrieval system in order to examine the distinguishing
capabilities of the potential term characteristics of tas as proposed in the method.
Applications of the information containing tas
Traditionally, tas information has been handled only indirectly, through the
relevant/irrelevant classification of the feedback information, in studies of the
vector-space-modelled information retrieval environment. The best known and very
first study on the application of
relevant/irrelevant information to improve query performance was conducted by
Rocchio (1966). The principle of Rocchio’s study was to adjust a query vector according
to the user’s relevant/irrelevant rating for the documents retrieved. The original
formula proposed by Rocchio was:
$$Q_1 = Q_0 + \beta \sum_{k=1}^{n_1} \frac{R_k}{n_1} - \gamma \sum_{k=1}^{n_2} \frac{S_k}{n_2}$$
where:
$Q_1$: new query vector
$Q_0$: initial query vector
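As a hedged illustration of the formula above, the following sketch computes the new query vector for dense vectors of equal length. The remaining symbol definitions are not reproduced in this excerpt, so the sketch follows the usual reading of Rocchio's formula as an assumption: the R_k are the vectors of the n1 relevant documents, the S_k are the vectors of the n2 irrelevant documents, and β and γ are weighting constants. The function name `rocchio_update` and the default weights are hypothetical.

```python
def rocchio_update(q0, relevant, irrelevant, beta=1.0, gamma=1.0):
    """Return Q1 = Q0 + beta * mean(relevant) - gamma * mean(irrelevant).

    q0 is a list of term weights; relevant and irrelevant are lists of
    document vectors with the same length as q0. The default values of
    beta and gamma are assumptions, not values taken from the paper.
    """
    q1 = list(q0)
    # Relevant documents are added, irrelevant documents are subtracted.
    for docs, weight in ((relevant, beta), (irrelevant, -gamma)):
        if not docs:
            # An empty feedback set contributes nothing (avoids dividing by zero).
            continue
        for i in range(len(q1)):
            q1[i] += weight * sum(doc[i] for doc in docs) / len(docs)
    return q1


# Toy vectors over a three-term vocabulary (made-up weights).
q0 = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
irrelevant = [[0.0, 0.0, 2.0]]

q1 = rocchio_update(q0, relevant, irrelevant, beta=1.0, gamma=0.5)
# q1 == [2.0, 0.5, -1.0]
```

The third component becomes negative because the corresponding term occurs only in the irrelevant document. The sketch leaves negative weights as they are; whether to clip them to zero is a design choice that the formula itself does not settle.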