THE PROBABILITY RANKING PRINCIPLE IN IR

Pages: 294-304
Published date: 01 April 1977
DOI: https://doi.org/10.1108/eb026647
Author: S.E. ROBERTSON
Subject Matter: Information & knowledge management, Library & information science
S. E. ROBERTSON
School of Library, Archive, and Information Studies, University College London
The principle that, for optimal retrieval, documents should be ranked in
order of the probability of relevance or usefulness has been brought into
question by Cooper. It is shown that the principle can be justified under
certain assumptions, but that in cases where these assumptions do not hold,
the principle is not valid. The major problem appears to lie in the way the
principle considers each document independently of the rest. The nature of
the information on the basis of which the system decides whether or not to
retrieve the documents determines whether the document-by-document
approach is valid.
A REFERENCE retrieval system should rank the references in the collection in
order of their probability of relevance to the request, or of usefulness to the user,
or of satisfying the user. This principle was first used explicitly by Maron and
Kuhns.1 Given that no system is capable of making a definitive assessment of
relevance, it seems intuitively obvious that some such notion must be used;
Maron and Kuhns accept the principle a priori. However, a closer analysis of the
principle suggests that we need to examine carefully the assumptions on which
it is based and the ways it might be interpreted. The object of this paper is to
make a first attempt at such an analysis.
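
The principle as stated can be illustrated with a minimal sketch. It assumes that the system has already produced, by some means not specified here, an estimate of the probability of relevance for each reference; the function name `rank_by_probability`, the reference identifiers, and the numerical estimates are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Tuple


def rank_by_probability(prob_relevant: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order references by decreasing estimated probability of relevance.

    `prob_relevant` maps a (hypothetical) reference identifier to the
    system's estimate of the probability that the reference is relevant.
    """
    for ref, p in prob_relevant.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {ref!r} must lie in [0, 1]")
    # Sort on the negated probability so the most probable reference comes
    # first; ties are broken by identifier to keep the order deterministic.
    return sorted(prob_relevant.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical estimates for four references.
estimates = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7}
ranking = rank_by_probability(estimates)

for position, (ref, p) in enumerate(ranking, start=1):
    print(position, ref, p)
```

Note that each reference is scored on its own, without regard to the others in the list; the sketch makes no claim about how the estimates themselves should be obtained.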
I. BACKGROUND
Maron and Kuhns's early paper introduced a very necessary new idea into
discussion on the basic problems of retrieval. The idea was that since no retrieval
system can be expected to predict with certainty which documents a requester
might find useful, the system must necessarily be dealing with probabilities; we
should therefore design our systems accordingly.
That said, the particular approach adopted by Maron and Kuhns in some ways
confuses the issue (see Robertson).2 They define the relevance of a document to an
index term as the probability that a user using this term will be satisfied with this
document: a definition which does not correspond to the usual use of the word
'relevance'.
In this paper, I will take relevance (or usefulness, or user satisfaction) to be a
basic, dichotomous criterion variable, defined outside the system itself. The
assumption of dichotomy is a strong one, and almost certainly not generally
valid; discussions of a more complex model are given elsewhere.2,3 Several more
possible assumptions about the nature of this criterion variable are discussed below.
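
One consequence of the dichotomy assumption may be worth making concrete. If relevance is a 0/1 variable, then the expected number of relevant documents in any retrieved set is simply the sum of their probabilities of relevance (by linearity of expectation). The sketch below is an illustration under that assumption only; the function name `expected_relevant` and the figures are hypothetical.

```python
from typing import Sequence


def expected_relevant(probs_in_output_order: Sequence[float], cutoff: int) -> float:
    """Expected number of relevant documents among the first `cutoff` presented.

    Assumes relevance is dichotomous (0 or 1), so the expectation for each
    document equals its probability of relevance and the expectations add.
    """
    if cutoff < 0:
        raise ValueError("cutoff must be non-negative")
    return sum(probs_in_output_order[:cutoff])


# Hypothetical probabilities of relevance for the same four documents,
# presented in two different orders.
by_probability = [0.9, 0.7, 0.5, 0.2]  # decreasing probability
arbitrary = [0.2, 0.9, 0.5, 0.7]       # some other order

for k in range(1, 5):
    print(k, expected_relevant(by_probability, k), expected_relevant(arbitrary, k))
```

For these figures, the order of decreasing probability gives an expected count at least as large as the arbitrary order at every cutoff. The calculation treats each document separately and so inherits whatever limitations that document-by-document view may have.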
Given a dichotomous criterion variable, and a system which has some (essentially
probabilistic) information about this variable, it seems obvious enough that
the documents which are most likely to satisfy the user should be presented to
Journal of Documentation, Vol. 33, No. 4, December 1977, pp. 294-304