A method of making better KWOC indexes
Published date | 01 February 1985 |
Date | 01 February 1985 |
DOI | https://doi.org/10.1108/eb044653 |
Pages | 132-144 |
Author | P.J. Elvin |
Subject Matter | Information & knowledge management, Library & information science |
ARTICLES
A method of making better KWOC indexes
P.J. ELVIN
Building Research Station, Garston, Watford WD2 7RJ
[Contributed by courtesy of the Director, Building Research Establishment and reproduced by permission of the Controller, HMSO.]
Abstract: This article describes a method of making a KWOC index from article or document titles by the use of a computer program which allows the indexer to combine words in the title in any way required and also allows the use of words or phrases not in the title. The method can be used to make a KWOC index of titles of articles stored in a database. Examples of the use of the program are included.
1. Introduction
Keyword Out of Context (KWOC) indexing is a computer-based method of indexing which is often used to index selections of titles of articles and documents. Words in each title are chosen as indexing words, i.e. keywords, their choice being indicated by tagging them, typically by prefixing them with an asterisk; the titles are then processed by a KWOC-indexing program. In its final form the index consists of all the keywords arranged in alphabetical order in a column down the left-hand side of the page and, associated with each keyword, a group, arranged in a column, of those titles which contain it. Each title thus appears in the list in a number of places, each one corresponding to one of the keywords it contains. References or locations may be accommodated as extra lines of text with the title or, as in the example given in Figure 1, in a column down the right-hand side of the page.
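The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program discussed in this article: the asterisk tag follows the text, but the function names, the column widths, the lower-casing of keywords, and the two sample titles with their references are invented for the example.

```python
from collections import defaultdict

TAG = "*"  # assumed tag character; the text says an asterisk is typical


def build_kwoc(entries):
    """Map each tagged keyword to the (title, reference) pairs containing it."""
    index = defaultdict(list)
    for tagged_title, reference in entries:
        words = tagged_title.split()
        # The printed title omits the tags.
        clean_title = " ".join(w.lstrip(TAG) for w in words)
        for word in words:
            if word.startswith(TAG):
                # Lower-casing and trimming punctuation are assumptions.
                keyword = word[len(TAG):].strip(".,;:").lower()
                index[keyword].append((clean_title, reference))
    return index


def format_kwoc(index, keyword_width=14, title_width=44):
    """Keywords in a left column, titles beside them, references on the right."""
    lines = []
    for keyword in sorted(index):
        for i, (title, reference) in enumerate(index[keyword]):
            left = keyword if i == 0 else ""  # print each keyword once
            lines.append(f"{left:<{keyword_width}}{title:<{title_width}}{reference}")
    return "\n".join(lines)


# Hypothetical titles and references, for illustration only.
entries = [
    ("*Lighting of *hospital *wards", "Ref-1"),
    ("*Costs of *lighting in *offices", "Ref-2"),
]
index = build_kwoc(entries)
print(format_kwoc(index))
```

Each title is listed once under every keyword tagged in it, so the first sample title appears three times in the printed index.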
There are variations on this method. For example, in one of these each word in a title is treated by the program as a keyword unless it is included in a given list, usually called a 'stop list', which is held in a file on the computer. However, the principle remains the same whatever the variations and the shortcomings of the method discussed below are common to all of them.
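The stop-list variation might look something like the following Python sketch. The stop list shown is hypothetical and held in memory rather than in a file, and the function names and the treatment of case and punctuation are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical stop list; in practice it would be read from a file.
STOP_LIST = {"a", "an", "and", "in", "of", "the", "with"}


def keywords_by_stop_list(title, stop_list=STOP_LIST):
    """Treat every word in the title as a keyword unless it is in the stop list."""
    keywords = []
    for word in title.split():
        candidate = word.strip(".,;:").lower()
        if candidate and candidate not in stop_list:
            keywords.append(candidate)
    return keywords


def build_index(titles, stop_list=STOP_LIST):
    """Group titles under every keyword that survives the stop list."""
    index = defaultdict(list)
    for title in titles:
        for keyword in keywords_by_stop_list(title, stop_list):
            if title not in index[keyword]:
                index[keyword].append(title)
    return index


print(keywords_by_stop_list("Lighting of deep hospital wards"))
```

No tagging is needed in this variation, but every keyword is still a single word taken from the title.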
2. Shortcomings of KWOC indexes
As far as the indexer is concerned a keyword is a word in the title which indicates to some extent the subject dealt with in the associated article or document. In writing the indexing program a keyword is identified as a character string beginning with a tag and terminating at the first space encountered; when the index is printed all the tags are omitted.
A serious limitation of KWOC indexing is that only single words occurring in the title can be used as keywords; successive words can be used together only if they are artificially joined so as to be treated as a single word by the program. However, such joining renders the second of the two words unusable as an indexing word. The only way round this difficulty is to use the title twice, once with the words separate and once with them joined, but this increases the amount of keyboarding, perhaps considerably if many titles contain several such pairs.
The joining of words described is the only way that indexing terms can be formed from the words in the title. It may be, for example, that the rth and (r + 2)th words in a title together form a useful indexing term, perhaps even in reverse order; such combinations cannot be used.
To illustrate these points consider the following title:
Permanent supplementary artificial lighting of deep hospital wards with an estimate of costs in use. No-xxxx
(No-xxxx represents a reference number which it is assumed is to be used as an indexing term).
To index this item fairly thoroughly the indexing terms should include the following:
lighting, artificial lighting, supplementary lighting, hospital ward lighting, hospital lighting, permanent lighting, costs, PSALI, No-xxxx*
The only members of this set which can be tagged for inclusion in a KWOC index are:
lighting, No-xxxx.
Other words such as 'hospital', 'wards', etc. in this example would of course be tagged but the depth of indexing represented by the first set of terms would not be possible and no-one at present using KWOC would expect it to be! PSALI could not be tagged unless it were appended to the title.
The inclusion of phrases such as 'artificial lighting' could be achieved by rewriting the title as:
Permanent supplementary artificial.lighting of deep hospital wards with an estimate of costs in use. No-xxxx
where full stops have been used to 'join' two pairs of consecutive words, but the single word lighting cannot then be tagged. In order to index this title under both 'lighting' and 'artificial lighting' separately it would have to be used, i.e. keyboarded, twice.
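The effect of joining can be checked with a short Python sketch. It applies the rule given earlier, that a keyword begins at a tag and ends at the first space, to two keyboarded versions of the example title. The asterisk tag follows the text; the function name and the particular words tagged in each version are assumptions made for the illustration.

```python
TAG = "*"  # assumed tag character (the text mentions an asterisk)


def tagged_keywords(tagged_title):
    """A keyword starts at a tag and ends at the first space; tags are dropped."""
    return [w[len(TAG):] for w in tagged_title.split() if w.startswith(TAG)]


# Version with 'artificial' and 'lighting' joined by a full stop.
joined = ("Permanent supplementary *artificial.lighting of deep hospital "
          "wards with an estimate of costs in use. *No-xxxx")

# Version with the words left separate, so 'lighting' can be tagged alone.
separate = ("Permanent supplementary artificial *lighting of deep hospital "
            "wards with an estimate of costs in use. *No-xxxx")

joined_terms = tagged_keywords(joined)
separate_terms = tagged_keywords(separate)

print(joined_terms)
print(separate_terms)

# 'lighting' alone is not available from the joined version, so both
# versions must be keyboarded to index under both terms.
print("lighting" in joined_terms, "lighting" in separate_terms)
```

Neither version alone yields both 'lighting' and 'artificial.lighting', which is why the title has to be entered twice.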
*PSALI is the accepted acronym for Permanent Supplementary Artificial Lighting.
Crown Copyright 1984, Building Research Establishment, Department of the Environment.
132 The Electronic Library, April 1985. Vol. 3, No. 2.