PROBLEMS IN ANALYSIS AND TERMINOLOGY FOR INFORMATION RETRIEVAL

DOI: https://doi.org/10.1108/eb026380
Date: 01 April 1965
Pages: 287-290
Published date: 01 April 1965
Author: J. FARRADANE, R.K. POULTON, MRS S. DATTA
Subject Matter: Information & knowledge management, Library & information science
J. FARRADANE, R. K. POULTON, AND MRS S. DATTA
Northampton College of Advanced Technology
For the storage of information for subsequent retrieval of desired items, two stages of analysis are essential. The first is the determination of the subject content of a given article or paper; the second is the selection of certain words, groups of words, or classification headings by which the subject content is to be represented, either directly or by a suitable coding.
Some workers still look forward to the day when it will be possible for the whole of a text to be read and 'understood' automatically by a machine; the 'understanding' process has been envisaged either as the selection of terms by word frequency or by word-pair frequency (adjacent terms, or terms not too far separated in one sentence) in the text, or as some process of automatic linguistic analysis. Such methods appear unsuitable for several reasons. Language, as normally used, is a very difficult medium for exact expression (hence the value of mathematics), and few authors write well enough to avoid all ambiguities; a human reader accustomed to the subject can easily overcome any difficulties due to poor grammar, badly expressed arguments, excessive brevity or prolixity in writing, and even, sometimes, actual errors; a machine cannot do so. Furthermore, the content of a paper is rarely of uniform importance throughout, and details which merely repeat matters described earlier, and better, elsewhere, and which are not essential to the main purpose of the paper, are not worth recording for subsequent retrieval. For example, in a paper on evaporator design, a description of a standard method of analysis, applied to the contents of the evaporator in determining the efficiency of the design, will not be worth indexing; in a search for analytical methods, retrieval of such a paper would hardly be considered pertinent. A human reader, though far from infallible, can usefully make such judgments.
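The frequency-based term selection that the paragraph above rejects can be illustrated with a minimal sketch. It is not a method described in this paper: the sentence splitter, the tokenizer, the small stop-word list, the default window of three words and all function names (`word_frequencies`, `word_pair_frequencies`, `select_terms`) are hypothetical choices made only for illustration. "Not too far separated" is modelled here, as an assumption, as "at most `window` content words apart within one sentence".

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list for illustration only; real systems use larger ones.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "is", "are",
              "was", "for", "by", "be", "with", "which", "will", "not", "such"}


def sentences(text: str) -> list[str]:
    # Naive sentence split on '.', '!' or '?'; empty fragments are dropped.
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]


def words(sentence: str) -> list[str]:
    # Lower-case alphabetic tokens with stop words removed.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def word_frequencies(text: str) -> Counter:
    # Single-word frequency over the whole text.
    counts = Counter()
    for s in sentences(text):
        counts.update(words(s))
    return counts


def word_pair_frequencies(text: str, window: int = 3) -> Counter:
    # Count unordered pairs of distinct words in the same sentence that are
    # at most `window` positions apart after stop-word removal
    # (window=1 counts adjacent words only).
    counts = Counter()
    for s in sentences(text):
        toks = words(s)
        for i, j in combinations(range(len(toks)), 2):
            if j - i <= window and toks[i] != toks[j]:
                counts[tuple(sorted((toks[i], toks[j])))] += 1
    return counts


def select_terms(text: str, n: int = 5) -> list[str]:
    # "Index" the text by its n most frequent words.
    return [w for w, _ in word_frequencies(text).most_common(n)]


if __name__ == "__main__":
    sample = ("Evaporator design affects evaporator efficiency. The efficiency of "
              "the evaporator was measured by a standard method of analysis.")
    print(select_terms(sample, 3))
    print(word_pair_frequencies(sample).most_common(5))
```

On a sample like the one in the `__main__` guard, the pair ("analysis", "method") is counted exactly as any pair describing the paper's main subject would be; nothing in the counts marks it as a merely incidental, standard procedure, which is the kind of judgment the authors leave to a human reader.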
This does not imply that it is easy to establish the essential content of a paper, or to obtain agreement between two readers about it. The different backgrounds and interests of readers will always tend to bias their selection. Some loss of information, or bias, is, however, inescapable in any method of indexing; even the original author will have introduced some bias of viewpoint, not to be overcome by machine reading. Furthermore, the future