A PROBABILISTIC SEARCH STRATEGY FOR MEDLARS

Pages 254-266
Published date 01 April 1971
Date 01 April 1971
DOI https://doi.org/10.1108/eb026520
Author WILLIAM L. MILLER
Subject Matter Information & knowledge management, Library & information science
WILLIAM L. MILLER
University of Strathclyde*
One technique for searching a Co-ordinate Index is to compare each reference with a Boolean expression of index terms. This divides the file into retrieved and not-retrieved references. An alternative is to assign each reference a score calculated from its index terms and to retrieve the N highest scoring references in the file. This scoring technique has several advantages in theory, and it performed slightly better in a retrieval test with N equal to the number of references retrieved by the corresponding Boolean search. In the test a minimum value of N = 10 was used, and when fewer than this number of references matched the Boolean search requirement, the Scoring technique successfully widened the scope of the search and retrieved twice as many relevant references as the Boolean searches.
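The scoring alternative can be illustrated with a minimal Python sketch. The excerpt does not give the paper's score formula, so the additive term weights used here, and the names `score`, `top_n`, and `cutoff`, are assumptions made only for illustration. Only the rule that N equals the Boolean result count, with a minimum of 10, comes from the text.

```python
import heapq

def score(index_terms, weights):
    """Assumed scoring rule: sum of hypothetical per-term weights."""
    return sum(weights.get(term, 0.0) for term in index_terms)

def top_n(references, weights, n):
    """Return the identifiers of the n highest scoring references."""
    return heapq.nlargest(
        n, references, key=lambda ref: score(references[ref], weights)
    )

def cutoff(boolean_retrieved_count, minimum=10):
    """N equals the Boolean result count, but never falls below the minimum."""
    return max(boolean_retrieved_count, minimum)

# Hypothetical weights and references, not taken from the paper.
weights = {"MILK": 2.0, "COW": 1.0, "GOAT": 1.0, "CHEESE": -3.0}
references = {
    "a": {"MILK", "COW"},
    "b": {"MILK", "GOAT", "CHEESE"},
    "c": {"GOAT"},
    "d": {"MILK"},
}
best_two = top_n(references, weights, 2)
```

Unlike a Boolean match, the ranking always yields N references when the file holds at least N, which is how a scoring search can widen a search that matched too few.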
I. CO-ORDINATE INDEXING AND BOOLEAN SEARCH SPECIFICATIONS
IN ITS SIMPLEST form, Co-ordinate Indexing locates each reference at a unique point in a space whose co-ordinate directions represent index terms. This space may be finite dimensioned (e.g. a restricted vocabulary) or infinite dimensioned (e.g. free-language indexing). The co-ordinates are orthogonal, i.e. the application of one index term to the reference does not have any implications as to the application of other terms.
When index terms can only be applied or not applied, i.e. when weighting of the index terms is not permitted, the only points at which a reference may be located are the vertices of a hypercube, one of whose vertices is at the origin.
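As a rough illustration of this geometric picture, the following Python sketch maps a reference's index terms to a vertex of the unit hypercube. The four-term vocabulary and the function name `to_vertex` are hypothetical and serve only to make the example concrete.

```python
from itertools import product

# Hypothetical restricted vocabulary: one co-ordinate direction per index term.
VOCABULARY = ("MILK", "COW", "GOAT", "CHEESE")

def to_vertex(index_terms):
    """Map a set of index terms to a 0/1 vector, i.e. a hypercube vertex."""
    return tuple(1 if term in index_terms else 0 for term in VOCABULARY)

# A reference indexed under MILK and GOAT.
vertex = to_vertex({"MILK", "GOAT"})

# A reference with no index terms sits at the origin.
origin = to_vertex(set())

# Without weighting, these 2**4 vertices are the only possible locations.
all_vertices = list(product((0, 1), repeat=len(VOCABULARY)))
```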
One method of searching a co-ordinate index consists of the retrieval of all references whose indexing satisfies a Boolean expression of index terms. The search specification is a set of index terms linked by the logical operators and, or, and not. For example, in an attempt to find papers on the use of cows' milk and goats' milk for purposes other than cheesemaking a user might specify his requirements by
MILK and (COW or GOAT) and not CHEESE
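The example specification can be sketched in Python as a predicate over each reference's set of index terms. The reference identifiers, their indexing, and the name `matches` are hypothetical. The sketch shows only how the expression divides a file into retrieved and not-retrieved references.

```python
def matches(index_terms):
    """Evaluate MILK and (COW or GOAT) and not CHEESE for one reference."""
    return (
        "MILK" in index_terms
        and ("COW" in index_terms or "GOAT" in index_terms)
        and "CHEESE" not in index_terms
    )

# Hypothetical file of references and their index terms.
references = {
    "ref-1": {"MILK", "COW"},
    "ref-2": {"MILK", "GOAT", "CHEESE"},
    "ref-3": {"GOAT"},
    "ref-4": {"MILK", "GOAT", "YOGHURT"},
}

retrieved = sorted(ref for ref, terms in references.items() if matches(terms))
not_retrieved = sorted(ref for ref in references if ref not in retrieved)
```

Here `ref-2` is rejected solely because CHEESE is present, whatever its other index terms may be.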
The presence of 'not' in Boolean expressions used for information retrieval has been criticized.1 It can only be justified if the presence of an index term is a sufficient reason for not retrieving a reference, irrespective of what other index terms are attached to it. The subject of interest to the retrieval
* This research was carried out in the Computing Laboratory of the University of Newcastle upon Tyne.