COMPUTER SEARCHING OF UDC NUMBERS

Published date: 01 March 1990
Pages: 193-217
DOI: https://doi.org/10.1108/eb026860
Author: A.B. BUXTON
Subject Matter: Information & knowledge management, Library & information science
A. B. BUXTON
School of Library, Archive and Information Studies
University College, Gower Street, London WC1E 6BT
The Universal Decimal Classification (UDC) is able to provide a detailed
description of the subject content of a document in any area. Its hierarchical
and synthetic structure, which is generally reflected in its notation, should
enable computer searching for hierarchically-related subjects and for the
individual facets of a complex subject. The possibilities of using these features
in automated retrieval are discussed, and attention is drawn to places where
the UDC falls short.
A number of online catalogues, databases, and information retrieval
packages are discussed in terms of their ability to allow searching on UDC
numbers. The most sophisticated ones, such as ETHICS at the ETH Library, use a
separate file of verbal descriptors linked to the document file through UDC
numbers. Suggestions are made for enhancing retrieval performance on UDC
numbers in simple systems, and for ways in which the classification might be
developed to improve automated searching.
1. INTRODUCTION
THE LITERATURE on the use of Universal Decimal Classification (UDC)
numbers for automated retrieval goes back quite a long way. Brisch[1] at the
Royal Society Scientific Information Conference in 1948 described an
'Adaptation of the UDC form of notation to punched card technique', and in
the same year Varossieau[2] reported on the 'Use of the UDC in selecting data
with mechanical appliances'. Two seminars on 'UDC and mechanized retrieval
systems' were organised by FID in 1968[3] and 1970[4]. A review and
bibliography covering the period 1948–1980 has been produced by Rigby[5].
Now that in-house databases using micro- or mini-computer packages, and
online catalogues, are becoming increasingly common, UDC class numbers are
becoming quite widely available for interactive searching. This paper
examines the features of UDC and of interactive search systems to demonstrate
the potential of UDC for computer retrieval, where the problems arise, and
what further work is needed from the developers both of search software and
of the UDC. Examples of UDC numbers are taken from the International
Medium Edition, English Text, BS 1000M: Part 1: 1985[6].
Interactive searching was not foreseen when UDC was first used. The
numbers were intended as a way of ordering and indexing the entries in a
printed bibliography (e.g. the British Universities Film & Video Council
Catalogue, Meteorological and Geoastrophysical Abstracts), or as a shelfmark
for ordering the books in a library (e.g. the Scott Polar Research Institute
Library, Cranfield Institute of Technology Library, British Architectural
Library). As an ordering device, the UDC number becomes a fairly arbitrary
symbol, at least as far as most users are concerned. If we know of one relevant
book with the shelfmark XYZ, we can go to XYZ on the shelves or in the index
and see if there are any more documents classed there. Neighbouring
documents, especially those filed just after XYZ, might well have some
relevance too.

Journal of Documentation, Vol. 46, No. 3, September 1990, pp. 193–217.
The fact that UDC contains so many punctuation characters with special
filing significance is an inconvenience to library shelvers and users (partly
alleviated by the fact that most libraries do not use the full panoply). As will
be illustrated, many of the existing retrieval systems and online catalogues allow
only this simple-minded use of UDC numbers, which seems a great waste of the
structure built into UDC by its compilers and of the time and effort of the
classifier.
2. ASPECTS OF SEARCHING
In operational information retrieval systems, the two most common
refinements over searching on a single term are
(i) truncation, and
(ii) Boolean logic.
Truncation is useful when search terms for related subjects are known to start
with the same stem, for example 'psychologist', 'psychology', 'psychological',
'psychotherapy', 'psycholinguistics', etc. If the truncation symbol on a
particular system is the question mark, then searching for 'psycho?' retrieves
all these terms at a stroke. At first sight this is a major advantage of UDC,
which has a notation based on a hierarchical structure. (We assume here that
searching is possible only on the notation. If the schedules are held on the
information retrieval system as an online thesaurus we can go beyond the
structure built into the notation. This will be considered in Section 3.3.)
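The truncation search described above can be sketched in a few lines of
Python. This is only an illustrative sketch, not the mechanism of any
particular retrieval package: the function name `truncation_search`, the
sample term list, and the list of class numbers are assumptions made for
the example, and the class numbers other than 531.716 have not been checked
against the schedules. It shows that the same right-hand truncation that
gathers words sharing a stem also gathers class numbers sharing a leading
string of digits.

```python
def truncation_search(query, index_terms, symbol="?"):
    """Return the index terms matched by a query.

    A query ending in the truncation symbol matches every term that
    begins with the preceding stem; any other query must match exactly.
    """
    if query.endswith(symbol):
        stem = query[:-len(symbol)]
        return [term for term in index_terms if term.startswith(stem)]
    return [term for term in index_terms if term == query]


# Hypothetical index of verbal subject terms.
subject_terms = [
    "physics",
    "psychologist",
    "psychology",
    "psychological",
    "psychotherapy",
    "psycholinguistics",
]

# Hypothetical index of class numbers (illustrative strings only).
class_numbers = ["531.7", "531.71", "531.716", "531.717", "532.5"]

word_hits = truncation_search("psycho?", subject_terms)
number_hits = truncation_search("531.71?", class_numbers)

print(word_hits)
print(number_hits)
```

Because the comparison is made on the leading characters of the notation,
the sketch retrieves narrower classes only to the extent that the hierarchy
is actually expressed in the digits of the number.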
Using any classification system should be a help with the problem of
synonyms and related terms. Thus 531.716 covers everyday measuring
devices, industrial measurement, rules, rulers, yardsticks, tape measures,
measuring compasses and dividers: a searcher interested in this concept would
not have to spell all these terms out individually. It also helps with language
problems. An English-speaking searcher can look in the British UDC index
under 'rulers' and find the number 531.716, while a French-speaking user can
look in the French index under 'règles' and find the same number. The
database containing UDC numbers as index terms is equally accessible to both.
This facility is used to good effect in the catalogue of the ETH Library in
Switzerland (see Section 3.3.3).
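The language-independent lookup described above can be illustrated with a
small Python sketch. It is a hypothetical model only: the dictionaries
standing in for the English and French printed indexes, the document
records, and the function `search_by_index_term` are all assumptions made
for the example (the French entry is taken here to be 'règles'), and the
sketch does not represent how the ETH catalogue is implemented. Only the
number 531.716 and the English terms mapped to it come from the text.

```python
# Hypothetical extracts from two verbal indexes to the same schedules.
english_index = {"rulers": "531.716", "tape measures": "531.716"}
french_index = {"règles": "531.716"}

# Hypothetical document file indexed by class number only.
documents = [
    {"title": "Document A", "udc": "531.716"},
    {"title": "Document B", "udc": "532.5"},
    {"title": "Document C", "udc": "531.716"},
]


def search_by_index_term(term, index, records):
    """Translate a verbal term to a class number, then match records on it."""
    number = index.get(term)
    if number is None:
        return []  # term not found in this language's index
    return [rec["title"] for rec in records if rec["udc"] == number]


english_hits = search_by_index_term("rulers", english_index, documents)
french_hits = search_by_index_term("règles", french_index, documents)

print(english_hits)
print(french_hits)
```

Both searchers reach the same records because the class number, not the
word, is the index term held in the document file; synonyms such as
'tape measures' resolve to the same number in the same way.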
Other advantages of the UDC in particular for subject access are:
(a) It is universal, in the sense of covering all areas of knowledge, and is
under constant revision to cover new areas.
(b) It allows classification which is coextensive with the subject of the
document, as opposed to other classifications which frequently class
documents at numbers denoting broader subjects. This derives from the
synthetic