HOW DO WE INDEX?: A REPORT OF SOME ASLIB INFORMATICS GROUP ACTIVITY

Date: 01 January 1983
Published date: 01 January 1983
Pages: 1-23
DOI: https://doi.org/10.1108/eb026736
Author: KEVIN P. JONES
Subject Matter: Information & knowledge management, Library & information science
THE Journal of Documentation
VOLUME 39 NUMBER 1 MARCH 1983
HOW DO WE INDEX?: A REPORT OF SOME ASLIB INFORMATICS GROUP ACTIVITY
KEVIN P. JONES
Malaysian Rubber Producers' Research Association
The Aslib Informatics Group and its predecessor the Co-ordinate Indexing Group have made several attempts to understand the indexing process. This has been sought through seminars and indexing projects. The seminars produced some data on an ad hoc basis and although most have been assembled they have not been reported previously. More recently a formal project, involving sixteen volunteer indexers, has been organized around five short New Scientist articles and the data from this exercise form the major component in the present study. An attempt has been made to correlate indexer performance with the original texts. There appears to be evidence to support the assertion that the selection of index entries is related to the structure of the original texts, especially the frequency of individual words.
THE ASLIB INFORMATICS GROUP and its predecessor the Co-ordinate Indexing Group (CIG) have made several attempts to understand the indexing process more fully. In part, this quest has been associated with the design of thesauri, as it was considered that it was impossible to design successful thesauri without understanding the indexing process. There have now been five projects: both the earliest and most recent were conducted on an ordinary indexing basis as an individual activity; the remainder involved a degree of group participation. Only the results of the first project have been reported.¹⁻²
The activity of indexing, as typified by Collison,³ Knight,⁴ and to an extent by Borko and Bernier,⁵ tends to be concerned with the mechanics of alphabetization, cross-indexing and the form of name, subject or 'idea' (Collison) index entries. The relationship between texts and index entries is rarely examined. Knight avoided this entirely, but Collison incorporated two relatively long textual extracts together with what he regarded as suitable sets of index entries. Nevertheless, Collison fails to establish an explicit algorithm of how one is transformed into the other. Moreover, his strictures on an example by Holmstrom⁶ which did attempt to link text with index entries are illuminating: 'Examples of this kind are always (present author's italics) misleading unless related to the complete work...' One would expect a book about the preparation of indexes to contain material on alphabetization and other techniques; it is the lack of any bridge between text and index entries which is strange. Interestingly, Borko and Bernier bridge this gap in a chapter on computer-aided indexing.
THE TIMES INDEXING PROJECT
The earliest project, partially surveyed by Dammers¹ and Gilchrist,² was a large venture as it involved indexing seventy-seven second leaders from The Times published between April and June 1966. Eighteen volunteers indexed all of this material, a further one indexed all bar three leaders, and a further twenty-seven indexed only some of the material. Unfortunately, the amount of data produced exceeded the energy available for analysis, even with the aid of computer assistance.
A total of approximately 22,000 keywords was generated. Some 5,500 different keywords were produced and 56% of these were used uniquely. It must be emphasized, however, that no attempt was made to reduce dissimilarities in word morphologies; thus, governmental, government and governments were treated as three distinct keywords. Neither was any attempt made to group synonyms. Therefore, it is not surprising that only thirty-nine keywords were used more than fifty-one times. The most commonly used were Britain, government, China and politics, and these were followed by Rhodesia, Indonesia, Russia, Vietnam, United Kingdom, Malaysia, education, United Nations, United States and opposition. With the exception of the Saturday issues, which tended to venture far, most of the second leaders kept a close eye on the then current political scene.
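The effect of leaving word morphologies unmerged can be illustrated with a small sketch. It is not the analysis used in the project: it assumes that 'used uniquely' means a keyword assigned exactly once in the whole exercise, and the names keyword_stats and crude_stem, the suffix list and the sample data are all hypothetical.

```python
from collections import Counter


def keyword_stats(assignments):
    """Summarise a flat list of keyword assignments (one entry per use)."""
    counts = Counter(assignments)
    distinct = len(counts)
    # Assumption: "used uniquely" = assigned exactly once overall.
    used_once = sum(1 for n in counts.values() if n == 1)
    return {
        "total": len(assignments),
        "distinct": distinct,
        "used_once": used_once,
        "share_used_once": used_once / distinct if distinct else 0.0,
    }


def crude_stem(word):
    """Deliberately naive suffix stripper, for illustration only."""
    word = word.lower()
    for suffix in ("al", "s"):
        # Keep at least four characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            word = word[: -len(suffix)]
    return word


# Invented sample, echoing the morphological variants mentioned in the text.
sample = ["governmental", "government", "governments",
          "Britain", "Britain", "China"]

raw = keyword_stats(sample)
merged = keyword_stats([crude_stem(w) for w in sample])

print(raw["distinct"], raw["used_once"])        # 5 4
print(merged["distinct"], merged["used_once"])  # 3 1
```

On this toy list, merging the three forms of government reduces both the number of distinct keywords and the number used only once, which suggests why the reported 56% may overstate the real scatter of the vocabulary.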
Dammers asserted that indexer proficiency was inversely proportional to the number of unique keywords selected: CIG members used a mean of approximately seventy unique keywords per 100 documents, whereas the overall mean was approximately 130 and the non-CIG members were grouped around a geometric mean of about 220. The CIG indexers were regarded as the expert group. This was probably a fair assessment, but at that time these members were pursuing a policy of minimizing indexing vocabularies (as typified in the work of Boyd,⁷ Rostron⁸ and Snel⁹). This approach has since been questioned:¹⁰⁻¹¹ therefore, this assertion may also be of questionable validity. Unfortunately, inter-indexer consistency within the CIG indexer set was not studied. This would have been more interesting than the result that a desirable vocabulary size of about 300 was required to index the document set. It must be stressed, however, that many indexers would accept this figure as not being unreasonable for a set of seventy-seven short items.
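The kind of figure quoted from Dammers can be approximated as follows. This is a sketch under stated assumptions, not a reconstruction of his method: 'unique keywords' is read here as keywords an indexer used exactly once, the rate is scaled to 100 documents, and the function names and sample values are hypothetical.

```python
import math
from collections import Counter


def once_only_rate(assignments, n_documents):
    """Keywords an indexer used exactly once, per 100 documents indexed."""
    counts = Counter(assignments)
    used_once = sum(1 for n in counts.values() if n == 1)
    return 100.0 * used_once / n_documents


def arithmetic_mean(values):
    """Ordinary mean of a non-empty sequence."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Exponential of the mean logarithm; values must be strictly positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Invented example: one indexer, four documents, two keywords used once.
rate = once_only_rate(["a", "b", "b", "c"], n_documents=4)  # 50.0

# Invented rates for two indexers, to contrast the two kinds of mean.
rates = [50.0, 200.0]
print(arithmetic_mean(rates))  # 125.0
print(geometric_mean(rates))   # approximately 100
```

The geometric mean is pulled less by a few very high rates than the arithmetic mean, so the figures quoted for the CIG members, the overall set and the non-CIG members are not strictly like for like.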
INDEXING SEMINAR 1977
The second project, organized after a long lapse in 1977, took the form of an indexing seminar. The seventeen participants were circulated with copies of the texts prior to the meeting and were expected to come armed with appropriate index entries; the seminar was devoted to discussing the participants' sets of entries. Data capture was limited to recording by show-of-hands techniques. The texts (reproduced as Appendices 1 and 2) were relatively short extracts from books, consisting of three and two paragraphs respectively. One was a section on lava taken from Holmes's Principles of Physical Geology, a well-known textbook. The other was an extract concerning holistic theory from Arthur Koestler's Beyond Atomism and Holism. This probably remains the most difficult text to be tackled, and some participants questioned the value of such a text for a practical seminar. A