Multivariate statistics applied to the evaluation of environmental and chemical data sources

Pages: 116-123
DOI: https://doi.org/10.1108/14684520010330300
Published date: 01 April 2000
Authors: Kristina Voigt, Gerhard Welzl, Gerda Rediske
Subject matter: Information & knowledge management, Library & information science
Introduction
Chemistry and the environmental sciences are
scientific disciplines with an enormous output
of and demand for data. By 13 October 1999,
21,679,848 substances had been registered in
the Registry File of the Chemical Abstracts
Service (CAS, 1999). Presently the chemical
literature grows by approximately 500,000
publications per year. Since there is no
indication that the information increase in
these fields will slow down within the
foreseeable future, we shall have to cope with
a growing flood of chemical and
environmental information. A scientific
approach is urgently needed to deal with this
information avalanche (Luckenbach, 1996).
The enormous increase in chemical and
environmental information implies a rise in
online databases, CD-ROMs and Internet
resources for these fields. With an estimated
201 million Internet users in September 1999
(NUA, 1999), many people have the tools to
use these data sources, and the problem arises
of where to find the information wanted. To
answer this question, a management strategy is
needed for handling the great variety of data
sources.
In this paper two information management
steps will be followed:
(1) collecting and structuring the environmental
and chemical information sources in so-called
metadatabases;
(2) evaluating the contents of these
metadatabases by means of multivariate
statistical methods.
Metadatabases for environmental chemicals
The world's best known directory of
commercially available data sources is the
Gale Directory of Databases. It comprises over
11,000 databases accessible in a variety of
computer-readable formats (Williams, 1999).
Environmental and chemical databases are
The authors
Kristina Voigt and Gerda Rediske work with Gerhard
Welzl, the Head of Biostatistics at the GSF National
Research Centre for Environment and Health, Institute for
Biomathematics and Biometry, Neuherberg, Germany.
Keywords
Databases, Chemicals, Measurement, Internet,
Information management, Knowledge-based systems
Abstract
Constantly expanding chemical and environmental
information sources increase the need for descriptive
statistical analysis. This paper gives a comparative
evaluation of data sources, i.e. online databases,
databases on CD-ROM and Internet resources in the field
of environmental chemicals. The evaluation is based on
information in three metadatabases for environmental
chemicals: DADB-Metadatabase of Online Databases,
DACD-Metadatabase of CD-ROMs, DAIN-Metadatabase
of Internet Resources. A data matrix of 50 environmental
and chemical descriptors found in DADB, DACD and DAIN
is analysed and a technique is applied to transform the
data set into a data matrix of a more homogeneous
structure. This method is based on algorithms for solving
the so-called travelling salesman problem. Two different
ways of analysing the data set are applied and the results
are compared. Also, media combination patterns are
identified and discussed. For most descriptors the
information depth is higher in commercial online
databases and databases on CD-ROM than in free
Internet resources. Exceptions, e.g. some health-related
parameters which have a higher percentage in Internet
resources, are identified and explained.
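The abstract mentions reordering the data matrix with algorithms for the travelling salesman problem, but this excerpt does not describe the procedure actually used. As a rough illustration only, the sketch below assumes a binary matrix (rows are data sources, columns are descriptors, 1 means the descriptor is covered) and substitutes a simple greedy nearest-neighbour heuristic for whatever method the authors applied. The toy matrix and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of descriptor columns in which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_order(rows):
    """Greedy TSP-style tour over the rows.

    Starts at row 0 and repeatedly moves to the closest unvisited row
    (ties broken by the lower row index). This is a heuristic, not an
    exact travelling salesman solver.
    """
    unvisited = set(range(1, len(rows)))
    order = [0]
    while unvisited:
        last = rows[order[-1]]
        nxt = min(unvisited, key=lambda i: (hamming(last, rows[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def path_length(rows, order):
    """Total dissimilarity between consecutive rows in the given order."""
    return sum(hamming(rows[a], rows[b]) for a, b in zip(order, order[1:]))


# Hypothetical toy data: 4 data sources described by 4 descriptors.
matrix = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]

order = nearest_neighbour_order(matrix)
reordered = [matrix[i] for i in order]
```

Placing similar rows next to each other shortens the "tour" through the matrix, which is one way to read "a more homogeneous structure"; the same reordering could be applied to the columns by transposing the matrix first.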
Electronic access
The current issue and full text archive of this journal are
available at
http://www.emerald-library.com
We wish to thank Hannelore Guth for fruitful
discussions concerning the style of this paper. This
work was partly supported by the Bayerisches
Staatsministerium für Landesentwicklung und
Umweltfragen (Bavarian State Ministry for State
Development and Environmental Affairs) in
Germany.
Received October 1999
Accepted February 2000
Online Information Review
Volume 24 · Number 2 · 2000 · pp. 116-123
© MCB University Press · ISSN 1468-4527
