Scientific information supply — Building networked information systems

Pages: 317-332
Date: 01 April 1996
Published date: 01 April 1996
DOI: https://doi.org/10.1108/eb045487
Authors: Lorcan Dempsey, Maria Heijne
Subject Matter: Information & knowledge management, Library & information science
Focus Article

Scientific information supply: building networked information systems*

Lorcan Dempsey
UK Office for Library and Information Networking (UKOLN), University of Bath, Bath BA2 7AY, UK
E-mail: L.Dempsey@bath.ac.uk

Maria Heijne
SURFnet bv, Utrecht, the Netherlands
E-mail: Maria.Heijne@SURFnet.nl

1. Background

Materials for teaching, learning and research are moving into the digital sphere. This move is affecting scholarly communication, teaching and learning in the academic community in important ways. These are significant changes, operating at technical, service, organisational and cultural levels. In this paper we wish to examine some of these changes. However, our aims are modest: we will focus on technical developments and on some of the emerging services that these are making possible.

Although networked information systems have received much recent attention, the technologies being deployed are still immature in many ways. A number of building blocks are emerging: we suggest how they will be combined to provide the desired services.

There will be some emphasis on standards, for several reasons. We are in a construction phase when, typically, standards receive more attention. Unlike traditional information systems, networked systems crucially depend on standards. They will be distributed, computer-based systems: the flows of information, the creation of communicating applications, and the construction of integrated end-user environments will not work without a constrained set of protocols and agreed ways of structuring information for exchange and processing, which in turn provide the necessary infrastructure for more advanced automated services. The creation of large-scale distributed information systems in an environment of heterogeneous resources is still a matter for research and development.

Although the creation of information systems will do much to change the way researchers and students communicate, learn and do their research, they are only one part of a larger set of changes which are as yet poorly understood. Again, examination of these larger issues is beyond our scope here.

1.1. Some terms

It might be useful to introduce some terms and distinctions at this stage.

The first distinction is between a resource and a resource space. Network resources include files (documents, software, images), interactive database services (catalogues, directories, etc.), statistical and scientific data sets, complex multimedia resources, and other evolving services. In fact, we still have a rather limited notion of what a resource is, conditioned by the early file-based Internet systems. A resource may be a database or a record within a database, a file archive or a file stored in an archive, and so on. Resources may be created dynamically in response to particular queries or agent action; they may have a temporary or opportunistic existence; they may change continually; they may be related to each other in a variety of ways. Increasingly resources represent a complex of data, services and relationships. Of course, books or journal articles are also resources.

In the context of network information discovery and retrieval (NIDR) systems the idea of a 'resource space' was given currency. A resource space was defined by an access protocol like gopher or FTP. For example, one could discuss a Z39.50 resource space which comprises those resources accessible through Z39.50. The first resource spaces have now been very much subsumed within the World Wide Web, separately identified by the scheme part of the URL (Uniform Resource Locator). However it should be remembered that very many resources are not part of the resource space defined by the Web.
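
The idea that the scheme part of a URL identifies a resource space can be illustrated with a minimal Python sketch. The example URLs and the function name group_by_resource_space are hypothetical and chosen only for illustration; the sketch simply groups locators by their scheme.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_resource_space(urls):
    """Group URLs by scheme, treating each scheme as one resource space."""
    spaces = defaultdict(list)
    for url in urls:
        # urlsplit separates the scheme (e.g. "ftp") from the rest of the URL.
        scheme = urlsplit(url).scheme.lower()
        spaces[scheme].append(url)
    return dict(spaces)


# Hypothetical locators, not taken from the article.
EXAMPLE_URLS = [
    "gopher://gopher.example.org/11/catalogue",
    "ftp://ftp.example.org/pub/reports/report.ps",
    "http://www.example.org/index.html",
    "ftp://archive.example.org/pub/software/tool.tar",
]

if __name__ == "__main__":
    for scheme, members in sorted(group_by_resource_space(EXAMPLE_URLS).items()):
        print(scheme, len(members))
```

Resources reachable only through a protocol with no URL scheme would not appear in any group, which matches the caveat that many resources lie outside the space defined by the Web.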

* This article is a version of a report prepared for SURF, the cooperative organisation for the advancement of computer services in higher education and scientific research in the Netherlands. It was published by SURF, in Dutch, in Autumn 1995.

The Electronic Library, Vol. 14, No. 4, August 1996

The second distinction is that between 'bits' and 'atoms', introduced by Nicholas Negroponte (1995) in his much-noticed recent book, Being Digital. Libraries, for example, currently handle atoms: individual physical items which need to be created, packaged, transported, distributed and fetched. The items have mass and have to be duplicated massively. The Information Superhighway, he suggests, is about 'bits'; about the 'global movement of weightless bits at the speed of light'; about, in fact, being digital.

We might think of a spectrum from 'dumb bits' to 'smart bits', depending on their closeness to the atom. The paper journal article is atomic: its content is only available to the human reader. It can be converted to dumb bits by being scanned; this will allow it to be exchanged more easily, but its content is still only available to the human reader. It is 'atomic' to the program which transfers or displays it. If it were OCRed (scanned with Optical Character Recognition), it might become slightly less dumb bits, allowing some searching. It could be converted to smarter bits by being tagged, perhaps using HTML (Hypertext Markup Language). This would allow it to be searched, but also to be presented and displayed in a variety of ways. It is more open to processing. However, an application still knows little about what it is about: it still only releases its content to a human reader. If the article were tagged using SGML (Standard Generalised Markup Language) then it would become smart bits. Applications could be built which understood more of its content and exploited the semantic information conveyed by the tagging.
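
As a rough illustration of what semantic tagging makes possible, the following Python sketch searches a tagged article within one named element only. It is a sketch under stated assumptions: XML is used as a stand-in for SGML because the Python standard library has no SGML parser, and the element names (title, author, abstract, body) and the sample article are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged article; XML stands in for SGML here.
ARTICLE = """
<article>
  <title>Networked information systems</title>
  <author>A. Writer</author>
  <abstract>Standards for distributed systems.</abstract>
  <body>Gopher and FTP each define a resource space.</body>
</article>
"""


def search_tag(document, tag, term):
    """Return the text of every <tag> element whose text contains term."""
    root = ET.fromstring(document.strip())
    term = term.lower()
    return [
        element.text
        for element in root.iter(tag)
        if element.text and term in element.text.lower()
    ]


if __name__ == "__main__":
    # The word "gopher" occurs in the body but not in the title.
    print(search_tag(ARTICLE, "title", "gopher"))
    print(search_tag(ARTICLE, "body", "gopher"))
```

A scanned page image or plain text could only be searched as a whole; the tags are what let the application restrict the search to titles, authors or any other marked-up element.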

Of course, it is slightly misleading to say that a database of SGML-encoded articles is smart, but it does allow an application to do smart things with it. It can be indexed and searched on particular tags, or output selectively to various media, and is amenable to a wider range of processing. Take a simpler example. E-mail messages have structured headers which convey semantic information. Messages can be sorted and displayed in various ways. They can be filtered depending on who they are from, what they are about and so on. Smart applications can act on smart bits.
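
The e-mail example can be sketched in a few lines of Python using the standard library's email package. The messages, addresses and the function name filter_messages are hypothetical; the point is only that structured headers let a program filter and sort without reading the message bodies.

```python
from email import message_from_string

# Hypothetical messages with structured headers.
RAW_MESSAGES = [
    "From: alice@example.org\nSubject: Z39.50 gateway\n\nBody one.\n",
    "From: bob@example.org\nSubject: Lunch\n\nBody two.\n",
    "From: alice@example.org\nSubject: Metadata formats\n\nBody three.\n",
]


def filter_messages(raw_messages, sender=None, subject_contains=None):
    """Select messages by header values, then sort them by subject."""
    selected = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Filter on who the message is from.
        if sender and sender.lower() not in msg.get("From", "").lower():
            continue
        # Filter on what the message is about.
        subject = msg.get("Subject", "")
        if subject_contains and subject_contains.lower() not in subject.lower():
            continue
        selected.append(msg)
    return sorted(selected, key=lambda m: m.get("Subject", "").lower())


if __name__ == "__main__":
    for message in filter_messages(RAW_MESSAGES, sender="alice@example.org"):
        print(message["Subject"])
```

The same filtering on unstructured text would require guessing where the sender and subject appear, which is the sense in which headers make messages 'smarter' bits.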

The third distinction is between metadata and the resources which they describe. This is not always a clear distinction but it is a useful one to make. Metadata is data which describes a resource. It is of various types and levels of fullness. Here, it is used inclusively to refer to names, locations and descriptive data which facilitate discovery, selection and location of resources. In some cases, the metadata may be no more than a file name and location; in others (in library systems, for example) structured descriptive data may be created manually. Some systems provide access to metadata only: they are oriented towards discovery and selection. Examples are Archie, Lycos or an online catalogue. Metadata will underpin new resource discovery systems. Such systems are an essential prerequisite to using a network that is increasingly rich in resources, which in turn are subject to different conditions of use, whether for technical, commercial, legal or other reasons.
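
The range of metadata fullness described above can be made concrete with a small Python sketch. The record structure, its field names and the sample records are assumptions made for illustration and do not correspond to any particular metadata standard. One record holds only a file name and location; the other carries a manually created description and conditions of use.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """Illustrative metadata record; the field names are assumptions."""
    name: str
    location: str
    description: str = ""
    conditions_of_use: str = ""


# Hypothetical records: one minimal, one with fuller description.
RECORDS = [
    MetadataRecord("report.ps", "ftp://ftp.example.org/pub/report.ps"),
    MetadataRecord(
        name="Survey of networked information systems",
        location="http://www.example.org/survey.html",
        description="Overview of resource discovery standards and protocols.",
        conditions_of_use="free for academic use",
    ),
]


def discover(records, term):
    """Discovery: find records whose name or description mentions term."""
    term = term.lower()
    return [
        record
        for record in records
        if term in record.name.lower() or term in record.description.lower()
    ]


def locate(record):
    """Location: return where the described resource can be fetched."""
    return record.location


if __name__ == "__main__":
    for record in discover(RECORDS, "discovery"):
        print(record.name, locate(record), record.conditions_of_use)
```

The minimal record can only be found by its file name, while the fuller record can also be found by subject and selected on its conditions of use, which is the practical difference between the two levels of description.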

1.2. Overview

The rest of this paper falls into two parts. The first (Sections 2 and 3) concerns data and metadata, the resources of interest to users; the second (Sections 4, 5 and 6) concerns the applications which will allow users to discover and access those resources, and have them delivered. A closing section briefly looks at some issues for the future. An Appendix lists the acronyms used in this paper, for quick reference.

2. Document issues

In relation to the network environment, electronic publishing is often limited to the production of electronic versions of paper originals, and issues arise largely in the area of storage and presentation of text formats in combination with graphics. Publishers still prefer to produce CD-ROM or CD-I when they want to exploit all the possibilities that multimedia has to offer.

During the last couple of years the emergence of electronic journals has demanded a lot of attention. One sees all kinds of presentation formats, varying from plain ASCII with access through e-mail to combinations of text, graphics and sound via the World Wide Web. It is an area where policy and economic issues seem to be the most important stumbling blocks.

This section will cover some of the technological developments in document issues, concentrating on formats and quality aspects.

2.1. Document formats

2.1.1. Structured documents. The project SURFdoc (Heijne et al. 1995) has shown the importance of document formats when storing electronic documents. Initially, SURFdoc paid only limited attention to the way documents were to be stored on the server and to the storage format. For practical reasons the options for text storage were restricted to ASCII, wordprocessing formats like WordPerfect (WP) or MS Word, or PostScript (a page description and printer language developed by Adobe). These were popular formats and seemed to justify the expectation that few technical problems would arise. However, problems occurred anyway: ASCII did not allow for integration of text and graphics; WP and MS Word gave conversion problems; and PostScript, though offering the capability of text and image integration, appeared to lack viewers on a number of platforms to allow consultation of those documents. In short, all formats