**LOTKAIAN CONCENTRATION THEORY**

Pages | 187-229 |

Published date | 20 January 2005 |

DOI | https://doi.org/10.1108/S1876-0562(2005)0000005006 |

Author | Leo Egghe |

IV

LOTKAIAN CONCENTRATION THEORY

IV.1 INTRODUCTION

Concentration theory studies the degree of inequality in a set of positive numbers. It is not surprising that the historic roots of concentration theory lie in econometrics, where, early in the twentieth century, one felt the need to express degrees of income inequality in a social group, e.g. a country. In this way one expresses the "gap" between wealth and poverty. One of the first papers on this topic is Gini (1909), on whose measure we will report later.

The reader of this book will now easily understand that concentration theory plays an important role in informetrics as well. Indeed, as is clear from Chapter I, informetrics deals with the inequality in IPPs, i.e. in the production of the sources or, otherwise stated, the inequality between the number of items per source. As we have seen, Lotkaian informetrics expresses a large difference between these production numbers. Just to give the most obvious example: if we have Lotka's law (with exponent $\alpha = 2$, just to fix the ideas), $f(n) = \frac{C}{n^2}$, then $f(2) = \frac{f(1)}{4}$, $f(3) = \frac{f(1)}{9}$, $f(4) = \frac{f(1)}{16}$ and so on, where $f(n)$ denotes the number of sources with $n$ items. It is clear that, expressed per production class $n$, there is a large difference between the numbers $f(n)$ of sources in these classes. Zipf's law is also a power law, hence it also expresses a large difference, but now between the numbers $g(r)$, $r = 1, 2, 3, \ldots$, where $g(r)$ denotes the number of items in the source on rank $r$ (where sources are ranked in decreasing order of their productivity). It is clear that all examples of sources and items given in Chapter I can be the subject of a concentration study. The skewness of these examples was apparent and hence one should be able to measure it.
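
The Lotka example above can be checked numerically. The following is a minimal sketch, assuming a constant $C = 3600$ chosen only so that the class sizes are whole numbers; the function name `lotka_sizes` is hypothetical and not taken from the book.

```python
from fractions import Fraction


def lotka_sizes(c, max_n, alpha=2):
    """Return {n: f(n)} with f(n) = c / n**alpha for n = 1..max_n.

    The exponent alpha is assumed to be an integer so that exact
    fractions can be used.
    """
    return {n: Fraction(c) / n**alpha for n in range(1, max_n + 1)}


# Illustrative constant (an assumption): C = 3600 sources with one item.
f = lotka_sizes(c=3600, max_n=4)
for n, value in f.items():
    # Print n, f(n) and the ratio f(n) / f(1).
    print(n, value, value / f[1])
```

The ratios $f(n)/f(1)$ do not depend on the chosen constant, which is why the example in the text can be stated in terms of $f(1)$ alone.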

188 Power laws in the information production process: Lotkaian informetrics

Generalizing the above examples, we can say that we have a decreasing sequence of positive numbers $x_1, x_2, \ldots, x_N$, $N \in \mathbb{N}$, and we want to describe the degree of inequality between these numbers or, otherwise stated, the degree of concentration: a high concentration occurs where one has a few very large numbers $x_1, x_2, \ldots$ and many small numbers $\ldots, x_{N-1}, x_N$. It is clear that this must be formalized. We will use techniques developed in econometrics but we will also report on the "own" developments that have been carried out in informetrics itself. Under the "own" developments we can count the so-called 80/20-rule and the law of Price. The main part of this chapter, however, will be the study of the Lorenz curve, which was developed in econometrics around 1905 (cf. the historic reference Lorenz (1905)).

Let us briefly (and intuitively) describe these concepts here, before studying them more rigorously in the further sections. The simplest technique is the 80/20-rule, which states that the 20% most productive sources produce 80% of all items. Of course, this is just a simplification of reality: it is the task of informetricians, in each case, to determine the real share of the items in the most productive sources: the 20% most productive sources might produce 65% of all items, but this could just as well be 83.7%! Also, we do not have to consider the top 20% of sources: any percentage can be considered. So, generalizing, we can formulate the problem: for any $x \in ]0,1[$ determine $\theta \in ]0,1[$ such that $100x\%$ of the most productive sources produce $100\theta\%$ of all items. We can even ask to determine $\theta$ as a function of $x$. This "generalized 80/20-rule" could be called the determination of "normalized" percentiles since both $x$ and $\theta$ belong to the interval $[0,1]$, while in the calculation of percentiles one of these numbers is replaced by actual quantities (of items or sources). Since both $x$ and $\theta$ denote fractions, this technique is (sometimes) called an arithmetic way of calculating concentration (see Egghe and Rousseau (1990a)).
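
The generalized 80/20-rule can be illustrated on a small data set. The following is a minimal sketch, assuming a made-up list of production numbers and assuming that the top group is obtained by rounding $xN$ to a whole number of sources; the name `top_share` is hypothetical.

```python
def top_share(items_per_source, x):
    """Fraction theta of all items produced by the top 100x% of sources."""
    if not 0 < x < 1:
        raise ValueError("x must lie in the open interval ]0,1[")
    ranked = sorted(items_per_source, reverse=True)
    # Size of the top group; rounding to a whole source (and taking at
    # least one source) is an assumption of this sketch.
    k = max(1, round(x * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Made-up production numbers for 10 sources (100 items in total).
production = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
theta_20 = top_share(production, 0.2)  # share of the top 20% of sources
theta_50 = top_share(production, 0.5)  # share of the top 50% of sources
print(theta_20, theta_50)
```

In this invented example the top 20% of the sources hold 70% of the items, so the classical 80/20 statement is only an approximation here, in line with the remark above.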

In this sense we can call the law of Price a geometric way of calculating concentration. The historic formulation (see De Solla Price (1971, 1976) and, implicitly, De Solla Price (1963)) states that, if there are $T$ sources, the $\sqrt{T} = T^{1/2}$ most productive sources produce 50% (i.e. a fraction $\frac{1}{2}$) of all items. For evident reasons, this principle is also called Price's square root law. It is clear how to extend this principle: let $\theta \in ]0,1[$, then the $T^{\theta}$ most productive sources produce a fraction $\theta$ of all items. This is called Price's law of concentration and we will investigate in what cases in informetrics this is true. This principle can be generalized further by stating that, for $\varepsilon \in ]0,1[$, the top $T^{\varepsilon}$ sources produce a fraction $\theta$ of all the items, and we can ask for a relation between $\varepsilon$ and $\theta$.
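
The relation between $\varepsilon$ and $\theta$ can be explored empirically. The following is a minimal sketch, assuming a made-up data set of $T = 16$ sources and assuming that $T^{\varepsilon}$ is rounded to a whole number of sources; the name `price_share` is hypothetical.

```python
def price_share(items_per_source, epsilon=0.5):
    """Return (k, theta): the top k = T**epsilon sources and their item share."""
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must lie in the open interval ]0,1[")
    ranked = sorted(items_per_source, reverse=True)
    total_sources = len(ranked)
    # Rounding T**epsilon to a whole source is an assumption of this sketch.
    k = max(1, round(total_sources ** epsilon))
    return k, sum(ranked[:k]) / sum(ranked)


# Made-up production numbers for T = 16 sources (100 items in total).
data = [30, 14, 9, 7, 6, 5, 5, 4, 4, 3, 3, 3, 2, 2, 2, 1]
k, theta = price_share(data, epsilon=0.5)
print(k, theta)
```

For this invented data set the $\sqrt{16} = 4$ most productive sources hold 60% of the items rather than 50%, which shows why one has to investigate in which cases Price's law actually holds.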

Both general formulations, of the 80/20-rule (in terms of $x$ and $\theta$) and of the law of Price (in terms of $\varepsilon$ and $\theta$), involve two numbers. We could wonder whether we can construct a function $F$ such that, for any decreasing vector $X = (x_1, x_2, \ldots, x_N)$ with positive coordinates, the value $F(X) = F(x_1, \ldots, x_N)$ is a good measure of the concentration in $X$. It is clear that an "absolute" good value for $F(X)$ does not exist, but we can determine requirements for the value of $F(X)$ in comparison with the values $F(X')$ for other vectors $X'$ as above, i.e. to give relative value judgements. Let us formulate some "natural" requirements.

(i) $F(X)$ should be maximal for the most concentrated situation, namely for a vector $X$ of the type $X = (x, 0, \ldots, 0)$ where $x > 0$.

(ii) $F(X)$ should be minimal for the least concentrated situation, namely for a vector $X$ of the type $X = (x, x, \ldots, x)$ where $x > 0$.

In terms of wealth or poverty, (i) states that $X = (x, 0, \ldots, 0)$ must have the highest concentration value (given $F$), since one source (e.g. a person) has everything and the other sources have nothing. Condition (ii) states that if everybody has the same amount (e.g. of money), the concentration value should be minimal (and preferably zero).

(iii) $F(X)$ should be equal to $F(cX)$ where, for $X = (x_1, \ldots, x_N)$, the vector $cX$ is defined as $(cx_1, \ldots, cx_N)$, for all $c > 0$.
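
Requirements (i) to (iii) can be tested on a candidate measure. The following is a minimal sketch, assuming as candidate $F$ the Gini index in its mean-absolute-difference form, $F(X) = \sum_{i,j} |x_i - x_j| / (2N \sum_i x_i)$; this choice and the test vectors are assumptions made for illustration and need not coincide with the definitions used later in the chapter.

```python
import math


def gini(values):
    """Gini index via the mean absolute difference (assumed candidate for F)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty vector with a positive sum")
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * total)


most_concentrated = gini([10, 0, 0, 0])   # requirement (i)
least_concentrated = gini([5, 5, 5, 5])   # requirement (ii)

X = [8, 4, 2, 1]
c = 10
scaled = [c * x for x in X]               # the vector cX of requirement (iii)
scale_invariant = math.isclose(gini(X), gini(scaled))

print(most_concentrated, least_concentrated, scale_invariant)
```

With $N = 4$ this formula gives $(N-1)/N = 0.75$ for $(x, 0, 0, 0)$, its largest value for that $N$, and $0$ for the equal vector, while multiplying all coordinates by $c$ leaves the value unchanged.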

Condition (iii) is called the scale-invariance property and is required since a description of the concentration of income (i.e. of wealth and poverty) should be independent of the currency used (€, $, Yen, ...), all of which are interrelated via a scale factor. The next property is also very important:
