ON TERM SELECTION FOR QUERY EXPANSION

Date: 01 April 1990
DOI: https://doi.org/10.1108/eb026866
Published date: 01 April 1990
Pages: 359-364
Author: S.E. ROBERTSON
Subject Matter: Information & knowledge management, Library & information science
DOCUMENTATION NOTE
S. E. ROBERTSON
Centre for Interactive Systems Research, Department of Information Science,
City University, London EC1V 0HB
In the framework of a relevance feedback system, term values or term weights
may be used to (a) select new terms for inclusion in a query, and/or (b) weight
the terms for retrieval purposes once selected. It has sometimes been assumed
that the same weighting formula should be used for both purposes. This paper
sketches a quantitative argument which suggests that the two purposes require
different weighting formulae.
1. INTRODUCTION
Term weighting
Various formulae have been proposed or used, at various times, to quantify
the value or usefulness of a search term in retrieval. The motivation or
justification for using a particular formula may be based in a general way on a
qualitative argument concerning the 'value' of the term in the retrieval
context, or may involve a specific quantitative argument such as a proof of
performance.
An example of the latter is provided by the relevance weighting theory [1].
Here it is proved that, under certain assumptions about term independence,
optimum performance is achieved by using a simple sum-of-weights match
function and giving a term $t$ a weight:

$$w_t = \log \frac{p_t (1 - q_t)}{q_t (1 - p_t)}$$

where $p_t$ is the probability that a given relevant document is assigned the
term $t$, and $q_t$ is the equivalent non-relevant probability ($p_t$ and $q_t$
may be estimated from relevance feedback information). The quantitative nature
of the argument is well illustrated by the use of a 'simple sum-of-weights'
match function together with the logarithm in the formula: if one were to
multiply the weights instead of adding them, the same theory would demand that
the logarithm not be used. A generalised qualitative argument about term value
would not be capable of distinguishing the two cases.
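The point about addition versus multiplication can be made concrete with a short Python sketch. It computes the relevance weight $w_t$ for each query term, ranks documents with a simple sum-of-weights match, and compares that ranking with one obtained by multiplying the un-logged ratios $p_t(1-q_t)/(q_t(1-p_t))$. The function names, the toy probabilities, and the `estimate_pq` helper are illustrative assumptions, not the paper's procedure; in particular `estimate_pq` uses the common 0.5-corrected estimates from feedback counts, which this section does not state.

```python
import math
from typing import Callable, Dict, List, Set, Tuple


def odds_ratio(p: float, q: float) -> float:
    """Return p(1 - q) / (q(1 - p)); requires 0 < p < 1 and 0 < q < 1."""
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        raise ValueError("p and q must lie strictly between 0 and 1")
    return (p * (1.0 - q)) / (q * (1.0 - p))


def relevance_weight(p: float, q: float) -> float:
    """Relevance weight w_t = log[p_t(1 - q_t) / (q_t(1 - p_t))]."""
    return math.log(odds_ratio(p, q))


def estimate_pq(r: int, R: int, n: int, N: int) -> Tuple[float, float]:
    """Hypothetical helper (ASSUMPTION, not from this section): estimate
    p_t and q_t from feedback counts using the common 0.5 correction.
    r = judged-relevant docs containing t, R = judged-relevant docs,
    n = docs containing t, N = docs in the collection."""
    p = (r + 0.5) / (R + 1.0)
    q = (n - r + 0.5) / (N - R + 1.0)
    return p, q


def sum_score(doc: Set[str], weights: Dict[str, float]) -> float:
    """Simple sum-of-weights match: add w_t for each query term in the doc."""
    return sum(w for t, w in weights.items() if t in doc)


def product_score(doc: Set[str], ratios: Dict[str, float]) -> float:
    """Multiplicative match: multiply the un-logged odds ratios instead."""
    score = 1.0
    for t, ratio in ratios.items():
        if t in doc:
            score *= ratio
    return score


def rank(docs: Dict[str, Set[str]], score: Callable[[Set[str]], float]) -> List[str]:
    """Return document ids ordered by descending score."""
    return sorted(docs, key=lambda d: score(docs[d]), reverse=True)


if __name__ == "__main__":
    # Made-up (p_t, q_t) values for illustration only; not data from the paper.
    probs = {"retrieval": (0.8, 0.1), "feedback": (0.6, 0.2), "system": (0.5, 0.4)}
    weights = {t: relevance_weight(p, q) for t, (p, q) in probs.items()}
    ratios = {t: odds_ratio(p, q) for t, (p, q) in probs.items()}
    docs = {
        "d1": {"retrieval", "system"},
        "d2": {"feedback"},
        "d3": {"retrieval", "feedback", "system"},
        "d4": {"system"},
    }
    print(rank(docs, lambda d: sum_score(d, weights)))      # ['d3', 'd1', 'd2', 'd4']
    print(rank(docs, lambda d: product_score(d, ratios)))   # ['d3', 'd1', 'd2', 'd4']
```

Because the logarithm of a product is the sum of the logarithms, and the logarithm is monotone, the two match functions give identical rankings here: the additive form needs the logarithm, while the multiplicative form must omit it. The actual weights depend entirely on the assumed values of $p_t$ and $q_t$.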
Journal of Documentation, Vol. 46, No. 4, December 1990, pp. 359-364.