LOTKAIAN INFORMETRICS: AN INTRODUCTION

Book: Power laws in the information production process: Lotkaian informetrics
Pages: 7-99
Date: 20 January 2005
DOI: https://doi.org/10.1108/S1876-0562(2005)0000005003
Published date: 20 January 2005
Author: Leo Egghe
I.1 INFORMETRICS
The concept of informetrics is well-known nowadays and could be defined as the science
dealing with the quantitative aspects of information. This is the widest possible definition
comprising mathematical and statistical treatment of diverse forms of information: books,
articles, journals, references and citations, libraries and other information centers, research
output, collaboration and transport (e.g. in networks such as intranets or the internet).
Although there might be different opinions on this, informetrics could be considered (as we
will do here) as comprising other disciplines such as bibliometrics or scientometrics.
Bibliometrics can be defined as the quantitative study of pieces of literature as they are
reflected in bibliographies (White and McCain (1989)) or by the well-known definition of
Pritchard (1969): the application of mathematical and statistical methods to books and other
media of communication (see also Narin and Moll (1977) and Tague-Sutcliffe (1994)).
Scientometrics, coined in Nalimov and Mul'cenko (1969) as "naukometrija", deals with
quantitative aspects of science, including research evaluation and science policy.
The term informetrics, introduced (we believe) in 1979 by Blackert and Siegel (1979) and by
Nacke (1979), gained popularity through the organization of the international informetrics
conferences in 1987 (see Egghe and Rousseau (1988, 1990b)) and through the foundation (during
the fourth international informetrics conference in 1993) of the ISSI, the International Society
for Scientometrics and Informetrics, thereby also recognizing the importance of the term
scientometrics, mainly because of the existence of the important journal with the same name.
It is not our purpose to repeat in this book all the historical facts on the science of informetrics,
since this has been covered many times, and excellently, in publications such as White and McCain
(1989), Ikpaahindi (1985), Lawani (1981), Tague-Sutcliffe (1994), Brookes (1990) and the
more recent, very comprehensive (almost encyclopaedic) Wilson (1999).
Since, however, we intend to provide mathematical foundations for a part of informetrics
(called Lotkaian informetrics, of course explained further on) we will provide in this overview
a concrete description of the concept of generalized bibliography (or information production
process) as well as the standard definitions of informetric functions, both as a concept and as
they appear in the literature (then called laws). They form the basis of the informetric theory
and will be formalized in Chapter II.
The main object of study in informetrics is the generalized bibliography, also called (e.g. in
Egghe (1990a)) an "information production process" (IPP). The most classical example is,
indeed, a bibliography (on a certain subject) where one has a collection of articles dealing
with this subject. Of course, articles are published in journals and this is the basic aspect of
IPPs: in the example of a classical bibliography, journals can be considered as sources that
"produce" items, i.e. the articles collected in the bibliography. The point is that, in
informetrics, one can provide several other examples of sources, containing items. Indeed one
can consider the publications (articles) of an author also as a source (author)-item
(publication) relationship: an author "produces" a publication. Even an article (being an item
in the previous examples) can become a source, e.g. "producing" references or citations as
items. In a library, books (as sources) are the "producers" of loans (each time a book is
borrowed, this loan is an item belonging to the source, i.e. the book itself). Although this
example might seem more abstract than the previous ones, it is a very natural one and has
the same structure as the following example, however different its subject matter. In
quantitative linguistics (for a basic reference, see Herdan (1960)) one considers texts as (in
our terminology) IPPs where words are considered as sources and their use in the text (i.e.
each time a word appears in the text) is considered as an item. There one uses the terminology
"Type/Token"-relationship for what we call here the "Source/Item"-relationship (see also
Chen and Leimkuhler (1989)). In this book we will use both versions of the terminology:
mainly source/item, since this is classical in informetrics, and occasionally type/token, an
equally apt description of the same phenomenon, where it is more convenient, such as in
Chapter III. We are convinced this will not confuse the reader (again, there is no
difference: source/item = type/token), and in this way we underline the fact that our
framework is applicable outside informetrics (although linguistics is a neighbor discipline of
information science!).
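
As a rough illustration of the type/token reading of a text, the following Python sketch counts, for each word (the type, i.e. the source), how often it occurs (its tokens, i.e. its items). The function name `type_token_counts`, the sample sentence and the simple lowercase tokenization are assumptions made here for illustration only; they are not taken from the literature cited above.

```python
import re
from collections import Counter


def type_token_counts(text: str) -> Counter:
    """Map each word type (source) to its number of tokens (items).

    Assumption: a token is a maximal run of letters or apostrophes,
    compared case-insensitively. Real linguistic studies may tokenize
    differently.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, used only to show the source/item structure.
sample = "the cat saw the dog and the dog saw the cat"
counts = type_token_counts(sample)

n_types = len(counts)            # number of sources (distinct words)
n_tokens = sum(counts.values())  # number of items (word occurrences)
```

Here `counts` plays the role of "items per source": each key is a type and each value is the number of tokens that type produced in the text.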
Well outside informetrics, we can still find the same framework. In econometrics, for
example, one can consider workers or employees (as sources) in relation to their
productivity (as items) (Theil (1967)). Productivity can be expressed by the number of
objects produced by these sources or in terms of profits (the amount of money earned by these
sources). Even in demography one can consider cities and villages (as sources) in relation to
their populations (each member being an item).
So, in general, we can define an IPP as a triple $(S, I, F)$ where $S$ is the set of sources, $I$ the set
of items and $F$ a function indicating which item $i \in I$ belongs to which source $s \in S$. In this
sense we can talk about two-dimensional informetrics, studying sources and items and their
interrelation by means of $F$. Two-dimensional informetrics is hence more than twice
a one-dimensional informetrics theory developed separately on the sources (e.g. number
of sources) and on the items (e.g. number of items). The function $F$ can be considered as a
relation $F: S \to I$ or as a function $F: S \to 2^I$ (the set of all subsets of $I$) where, for each $s \in S$,
$F(s) \subset I$ is the subset of $I$ containing the items that are produced by (that belong to) source $s$.
The classical way of thinking, which we will follow in this first chapter, limits the sets S and I to
finite discrete sets (e.g. finite subsets of $\mathbb{N}$, the set of natural numbers, so that we can count
them). For the basic theory in Chapter II, however, we will express the fact that IPPs usually
contain many sources and items by using continuous sets for S and I (such as intervals
$[a,b] \subset \mathbb{R}^+$, the positive real numbers).
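
In the finite discrete setting, the triple (S, I, F) can be sketched as a small data structure. The Python fragment below is a minimal illustration, not a definitive implementation: it stores F as a mapping from each source to the set of items it produces, i.e. as a function from S into the set of all subsets of I. The helper `build_ipp`, the journal labels and the article labels are hypothetical and chosen only for this example. It also assumes that each item belongs to exactly one source and that every source has at least one item.

```python
from collections import defaultdict


def build_ipp(assignments):
    """Build a finite IPP (S, I, F) from (item, source) pairs.

    Assumptions of this sketch: each item is assigned to one source,
    and sources without items are not represented.
    """
    F = defaultdict(set)
    for item, source in assignments:
        F[source].add(item)           # F(s) collects the items of source s
    S = set(F)                        # the set of sources
    I = set().union(*F.values())      # the set of items
    return S, I, dict(F)


# Hypothetical bibliography: articles a1..a5 published in journals J1..J3.
pairs = [("a1", "J1"), ("a2", "J1"), ("a3", "J2"), ("a4", "J1"), ("a5", "J3")]
S, I, F = build_ipp(pairs)

# Number of items per source, the basic quantity studied on an IPP.
items_per_source = {s: len(F[s]) for s in S}
```

Keeping F as a mapping, rather than storing only the sizes of S and I, preserves the interrelation between sources and items that makes the description two-dimensional.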
The reader might remark that this common framework is nice, but ask whether it leads us anywhere,
i.e. is there a reason to formulate these objects in a common way? The answer is yes, for
several reasons. First of all there is the challenge in itself to detect and define common tools
or frameworks (such as IPPs) among these different sciences. Once this is done we can then
elaborate these tools in a unified way e.g. by defining common measures and functions (such
as distribution functions - see further, but these are not the only examples). An intriguing
benefit of this approach would be if we detect or prove that some of these measures or
functions have the same form in the different "-metrics" sciences! While leaving this
