ON TERM SELECTION FOR QUERY EXPANSION
Date | 01 April 1990 |
DOI | https://doi.org/10.1108/eb026866 |
Pages | 359-364 |
Author | S.E. ROBERTSON |
Subject Matter | Information & knowledge management,Library & information science |
DOCUMENTATION NOTE
ON TERM SELECTION FOR QUERY EXPANSION
S. E. ROBERTSON
Centre for Interactive Systems Research, Department of Information Science,
City University, London EC1V 0HB
In the framework of a relevance feedback system, term values or term weights
may be used to (a) select new terms for inclusion in a query, and/or (b) weight
the terms for retrieval purposes once selected. It has sometimes been assumed
that the same weighting formula should be used for both purposes. This paper
sketches a quantitative argument which suggests that the two purposes require
different weighting formulae.
1. INTRODUCTION
Term weighting
Various formulae have been proposed or used, at various times, to quantify
the value or usefulness of a search term in retrieval. The motivation or
justification for using a particular formula may be based in a general way on a
qualitative argument concerning the 'value' of the term in the retrieval
context, or may involve a specific quantitative argument such as a proof of
performance.
An example of the latter is provided by the relevance weighting theory [1].
Here it is proved that, under certain assumptions about term independence,
optimum performance is achieved by using a simple sum-of-weights match
function and giving a term t a weight:

$$w_t = \log \frac{p_t (1 - q_t)}{q_t (1 - p_t)}$$

where p_t is the probability that a given relevant document is assigned the
term t, and q_t is the equivalent non-relevant probability (p_t and q_t may be
estimated from relevance feedback information). The quantitative nature of the
argument is well illustrated by the use of 'simple sum-of-weights' together
with the logarithm in the formula: if one were to multiply the weights instead
of adding them, then the same theory would demand that the logarithm not be
used. A generalised qualitative argument about term value would not be
capable of distinguishing the two cases.
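
The point about adding logarithmic weights versus multiplying unlogged ones can be checked numerically. The sketch below is illustrative only. It assumes the weight takes the usual log-odds-ratio form of the relevance weighting theory [1], log(p_t(1 - q_t) / (q_t(1 - p_t))). The function names, the sample probabilities, and the add-0.5 smoothing in `estimate_probabilities` are assumptions introduced here, not taken from the paper.

```python
import math


def odds_ratio(p_t: float, q_t: float) -> float:
    """Unlogged term value: odds of the term in relevant vs non-relevant documents."""
    if not (0.0 < p_t < 1.0 and 0.0 < q_t < 1.0):
        raise ValueError("p_t and q_t must lie strictly between 0 and 1")
    return p_t * (1.0 - q_t) / (q_t * (1.0 - p_t))


def relevance_weight(p_t: float, q_t: float) -> float:
    """Logarithmic term weight, intended for a simple sum-of-weights match function."""
    return math.log(odds_ratio(p_t, q_t))


def estimate_probabilities(r: int, R: int, n: int, N: int) -> tuple[float, float]:
    """Estimate (p_t, q_t) from relevance feedback counts.

    r: known relevant documents containing the term
    R: known relevant documents
    n: documents in the collection containing the term
    N: documents in the collection
    Assumption: add-0.5 smoothing, a common convention that keeps the
    estimates away from 0 and 1; the text above does not prescribe it.
    """
    p_t = (r + 0.5) / (R + 1.0)
    q_t = (n - r + 0.5) / (N - R + 1.0)
    return p_t, q_t


# Hypothetical (p_t, q_t) pairs for the query terms that match one document.
matched_terms = [(0.6, 0.1), (0.3, 0.05), (0.8, 0.4)]

# Additive scoring with logarithmic weights.
additive_score = sum(relevance_weight(p, q) for p, q in matched_terms)

# Multiplicative scoring with the unlogged values.
multiplicative_score = math.prod(odds_ratio(p, q) for p, q in matched_terms)

# The two scores are monotonically related, so they rank documents identically.
print(math.isclose(additive_score, math.log(multiplicative_score)))
```

Because the logarithm is monotonic, both schemes order documents the same way. The sketch shows only that the choice of match function determines which form of the weight is appropriate.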
Journal of Documentation, Vol. 46, No. 4, December 1990, pp. 359-364.