A STRAIGHTFORWARD METHOD FOR ADVANCE ESTIMATION OF USER CHARGES FOR INFORMATION IN NUMERIC DATABASES

Date: 01 February 1986
Pages: 65-83
Published date: 01 February 1986
DOI: https://doi.org/10.1108/eb026787
Author: KALERVO JÄRVELIN
Subject Matter: Information & knowledge management, Library & information science
THE Journal of Documentation
VOLUME 42 NUMBER 2 JUNE 1986
Department of Library and Information Science, University of Tampere,
PO Box 607, SF-33101 Tampere, Finland
It is generally recognised that numeric databases (NDBs) have become essential in
information retrieval (IR). NDBs differ from traditional bibliographic databases
(BDBs) with respect to their content, structural complexity, data manipulation
capabilities, and the complexity of the user interfaces and user charging schemes.
Recent trends in user interfaces and user charging for all online IR are towards
charging for the information actually retrieved from the database rather than for
the connect time. However, the viability of such charging schemes depends on the
user's ability to estimate the charges in advance, during the query negotiation
phase. This paper presents a systematic and general method for estimating user
charges for retrieved information in advance, in the context of NDBs based on the
relational data model (RDM). The method accepts relational algebra (RA) queries
of any complexity, estimates the sizes of their results, and charges for them on the
basis of the descriptions of the original database files. The method is a novel one
and is directly applicable to any RDM-based NDB. Tools based on the method are
required in the query interfaces to NDBs in order to make query formulation and
reformulation meaningful.
1. INTRODUCTION
NUMERIC ONLINE DATABASES (NDBs) are rapidly becoming more and
more popular: in 1982 about one half of the thousand databases covered by a
survey1 were numeric. Today NDBs provide online data on many aspects of
business and trade, social science and governmental activities, science and
technology. The trends in their accessibility, charging, coverage, timeliness and ease
of use increase their popularity with respect to their print-on-paper counterparts.
Their use is growing faster than that of bibliographic databases (BDBs), and
they earn more revenue than BDBs. They are essential in modern information
retrieval.2-5
NDBs differ in many respects from traditional BDBs. Firstly, they contain
information (as opposed to references) that can be used directly to support various
real-life activities. Secondly, the structure of NDBs is much more complex than
that of BDBs. Typical information retrieval in NDBs requires combining data
from several files. Thirdly, while the manipulation of data is essential in NDBs,
plain retrieval characterises BDBs. This means, fourthly, that the user interfaces
to NDBs, and especially their query and data manipulation languages, are much
more complex and powerful than those of BDBs. As a result, the user charging
schemes in NDBs are also more complex than in BDBs.3 The charging elements
often include connect time, computer processing power, storage use, graphics
services use, offline printing, administrative overheads, subscription and
telecommunication.
Recent trends in the discussion of online database charging methods emphasise
the importance of charging users for what they actually retrieve from the
database, and not on the basis of connect time, as is usual today.6-10 Because
information is what is being sold, that is what should be charged for. Alternatively, or
complementarily, the value added by the online service and/or the query
processing cost should also be charged for,3,7,8 but not connect hours, because that is
counterproductive. However, if such charging schemes are to be used, the
database users must be able to assess the query costs during the query negotiation
phase.7,9 Users are cost sensitive and do not want to commit themselves to paying
an unspecified amount of money for an unknown amount of data. Failures in
estimating the costs in advance generally lead to complaints about the systems.11
Therefore the query interfaces to databases must be equipped with tools that allow
cost estimation by the users. This is the responsibility of the information science
and service communities.
This paper will focus on the advance estimation of user charges for queries in
NDBs when the charges are based solely on the data retrieved. The task is to
estimate the price of the result at the query negotiation phase for various query
formulations. For brevity, this topic is termed user charge estimation. It is an
essential prerequisite for productive, query result-based charging schemes.
Experienced searchers can often estimate the query costs in BDBs, and they are also
aware of the connect time elapsed during query negotiation. However, in NDBs
this is much more difficult: it is hard to estimate in advance the query
cardinality, i.e. the number of records the query retrieves,12 and their retrieval cost.3,7 This stems
from the complexities of the databases, their file structures and query languages.
Therefore user charge estimation in NDBs is an important research problem in its
own right.
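As an illustration only, and not the method developed in this paper, the following minimal sketch shows how a user charge estimate could be formed from an estimated query cardinality and a per-record price. It assumes the textbook simplifications of uniformly distributed attribute values and independent selection conditions. The `FileDescription` class, the function names, the file statistics and the prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileDescription:
    """Hypothetical description of one stored file (relation)."""
    name: str
    records: int    # number of stored records
    distinct: dict  # attribute name -> number of distinct values
    price: dict     # attribute name -> charge per retrieved value, in price units


def estimate_cardinality(file, equality_attributes):
    """Expected number of records satisfying `attribute = constant` for every
    listed attribute, assuming uniform values and independent attributes."""
    cardinality = float(file.records)
    for attribute in equality_attributes:
        cardinality /= file.distinct[attribute]
    return cardinality


def estimate_charge(file, equality_attributes, output_attributes):
    """Estimated charge: expected records times the summed price of the
    attributes that appear in the output."""
    per_record = sum(file.price[a] for a in output_attributes)
    return estimate_cardinality(file, equality_attributes) * per_record


# Made-up statistics and prices, for illustration only.
companies = FileDescription(
    name="companies",
    records=12000,
    distinct={"industry": 40, "country": 8},
    price={"name": 1, "turnover": 5},
)

# Two alternative formulations of the same information need.
broad = estimate_charge(companies, ["industry"], ["name", "turnover"])
narrow = estimate_charge(companies, ["industry", "country"], ["name", "turnover"])
print(f"broad formulation:  {broad} price units")
print(f"narrow formulation: {narrow} price units")
```

Comparing the two estimates before running either query is the kind of decision support that user charge estimation is meant to give during query negotiation. Real value distributions are rarely uniform, so such figures are rough guides at best.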
There are no previous studies on user charge estimation in NDBs, at least as far
as is known. However, the topic is not totally new in the literature. One
prerequisite of user charge estimation is the ability to estimate the query cardinalities.
This has been studied and reported extensively in the literature on database design
and query optimisation in the context of business databases.13-15 Another
prerequisite is the derivation of the per-output record charges for queries. Some BDBs
have been charging differently for different output formats, thereby reacting to the
varying value the users receive. However, this convention is not viable in NDBs
because the output records may be constructed from the stored database records in
very complex ways. Moreover, not all the data in NDB outputs are equally
valuable to the users. Typically some data items are 'hot' while others are not.
This may be due to their intrinsic value to the user or due to their usability in
combining data from several sources. In addition, some files are typically 'hot' and