A framework for work task based thesaurus design

Pages: 774-797
Published date: 01 December 2001
DOI: https://doi.org/10.1108/EUM0000000007100
Author: Marianne Lykke Nielsen
Subject Matter: Information & knowledge management, Library & information science
A FRAMEWORK FOR WORK TASK BASED THESAURUS DESIGN
MARIANNE LYKKE NIELSEN
mln@db.dk
Institute of Information Studies, Royal School of Library and Information
Science, Langagervej 4, DK 9220 Aalborg Ost, Denmark
Design and construction of indexing languages require thorough
knowledge and understanding of the information environment. This
empirical study investigated a mixed set of methods (group
interviews, recollection of information needs and word association
tests to collect data; content analysis and discourse analysis to
analyse data) to evaluate whether these methods collected the data
needed for work domain oriented thesaurus design. The findings
showed that the study methods together provided the domain
knowledge needed to define the role of the thesaurus and design its
content and structure. The study was carried out from a
person-in-situation perspective. The findings reflected the information
environment and made it possible to develop a thesaurus according
to the characteristics of the work domain. It seemed more difficult
to capture the needs of the individual user and adapt the thesaurus
to individual characteristics.
1. INTRODUCTION
The starting point for the present investigation was a real-life problem situation.
In relation to a document management project, a large company considered
developing a company-specific thesaurus to support exchange and retrieval of
documents across the organisation. No one in the project team had a clear idea of
what kind of thesaurus to develop: the topical focus of the thesaurus was not
defined, the sources and methods to use were not known, etc. As recommended in the
classical literature on thesaurus construction (Soergel, 1974; Lancaster, 1986;
Aitchison et al., 1997), it was decided that a study should be carried out to
investigate what kind of conceptual tool would be appropriate to secure retrieval.
A thesaurus was on the agenda, but the design was to be based on the results of the
domain study.
This paper presents the design and results of this empirical study. The aim is to
examine what aspects or variables to investigate, what methods to use for
collecting the necessary data, what methods to use for analysing data, and how to
translate the results to thesaurus design. In particular, the aim is to compare
different methodologies in order to provide guidelines for the development of
domain studies. The paper evaluates a mixed set of methods: group interviews,
recollection of information needs and word association tests to collect data, and
content analysis and discourse analysis of the collected data.
Journal of Documentation, vol. 57, no. 6, November 2001, pp. 774–797
The following research questions are examined:
(1) What knowledge is needed to plan and design a thesaurus?
(2) What variables of the information environment are to be investigated in
order to gain the knowledge needed?
(3) What kind of knowledge is produced by the methods under investigation?
(4) What are the implications for thesaurus design?
The paper is structured as follows: the case study is presented in Section 2;
preliminary works are presented in Section 3; Sections 4 and 5 discuss the role of
thesauri and what variables to investigate in a domain study; the research design
is described in Section 6; the findings are reported in Section 7; and the discussion
and conclusions are in Section 8.
2. THE CASE STUDY
The field study took place in a large product development company, working
within the pharmaceutical industry. The project studied concerns the development
of a document system to handle electronic medical submissions to legal
authorities. The primary goal of the system is the production of submissions, but
another important objective is to gather together documents about the company’s
products in order to facilitate exchange of information about products, test
systems, techniques, methods etc. The project was initiated in 1998 and the project
staff was organised in four groups, each taking care of a specific aspect: flow
diagrams, implementation, word templates and publishing. Decisions concerning
indexing and retrieval came under the project group dealing with flow diagrams.
The thesaurus project was run as an independent project with a thesaurus
manager, who had overall responsibility, and an external adviser. Decisions regarding
indexing policy and thesaurus were placed in two different project groups and the
scope of the thesaurus could potentially be wider than merely supporting the
document management system, depending on the study results.
Some criteria were set beforehand. Different types of documents, coming from
different departments, are stored in the document system: research reports, test
reports, standard operating procedures (SOPs), articles and statistics. The
common subject field of pharmaceuticals is treated from different angles in the
documents: experimental, regulatory, clinical, non-clinical etc. A mix of novice and
expert indexers index the documents according to a set policy. Thus, due to the
topical and structural complexity of documents and the varied competence of the
indexers, it is difficult to produce document representations of consistent quality
representing all aspects and topics of the documents. As a consequence, the
project staff had an idea beforehand that the thesaurus should support controlled
as well as full-text searching. The objective of the domain study was to gather
