# LOTKAIAN CONCENTRATION THEORY

Pages 187-229 Publication Date 20 Jan 2005 DOI https://doi.org/10.1108/S1876-0562(2005)0000005006 Author Leo Egghe

## IV.1 INTRODUCTION
Concentration theory studies the degree of inequality in a set of positive numbers. It is not
surprising that the historic roots of concentration theory lie in econometrics where one (early
in the twentieth century) felt the need to express degrees of income inequality in a social
group, e.g. a country. Hereby one expresses the "gap" between richness and poverty. One of
the first papers on this topic is Gini (1909), on whose measure we will report later.
The reader of this book will now easily understand that concentration theory plays an
important role in informetrics as well. Indeed, as is clear from Chapter I, informetrics deals
with the inequality in IPPs, i.e. in production of the sources or, otherwise stated, the inequality
between the number of items per source. As we have seen, Lotkaian informetrics expresses a
large difference between these production numbers. Just to give the most obvious example: if
we have Lotka's law (with exponent $\alpha = 2$, just to fix the ideas), $f(n) = \frac{f(1)}{n^2}$, then
$f(2) = \frac{f(1)}{4}$, $f(3) = \frac{f(1)}{9}$, $f(4) = \frac{f(1)}{16}$ and so on, where $f(n)$ denotes the number of sources
with $n$ items. It is clear that, expressed per production class $n$, there is a large difference
between the number $f(n)$ of sources in these classes. Zipf's law is also a power law, hence it
also expresses a large difference but now between the numbers g(r), r = 1,2,3,..., where g(r)
denotes the number of items in the source on rank r (where sources are ranked in decreasing
order of their productivity). It is clear that all examples of sources and items, given in Chapter
I, can be the subject of a concentration study. The skewness of these examples was apparent
and hence one should be able to measure it.
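
The arithmetic of the Lotka example above can be checked with a minimal Python sketch. It assumes the exponent 2 used in the text and expresses each $f(n)$ as a multiple of $f(1)$. The function name `lotka_ratio` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def lotka_ratio(n, alpha=2):
    """Return f(n) / f(1) under Lotka's law f(n) = f(1) / n**alpha.

    alpha is assumed to be a non-negative integer so that the
    result stays an exact fraction (an assumption of this sketch).
    """
    return Fraction(1, n ** alpha)


# Print the first few production classes relative to f(1).
for n in range(1, 5):
    print(f"f({n}) = {lotka_ratio(n)} * f(1)")
```

Under these assumptions the fourth production class already holds only one sixteenth as many sources as the first, which is the skewness the text refers to.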
Generalizing the above examples, we can say that we have a decreasing sequence of positive
numbers $x_1, x_2, \ldots, x_N$, $N \in \mathbb{N}$, and we want to describe the degree of inequality between these
numbers, otherwise stated, the degree of concentration: a high concentration will be where
one has a few very large numbers $x_1, x_2, \ldots$ and many small numbers $\ldots, x_{N-1}, x_N$. It is clear
that this must be formalized. We will use techniques developed in econometrics but we will
also report on the "own" developments that have been carried out in informetrics itself. Among
the "own" developments we can count the so-called 80/20-rule and the law of Price. The main
part of this chapter, however, will be the study of the Lorenz curve which was developed in
econometrics around 1905 (cf. the historic reference Lorenz (1905)).
Let us briefly (and intuitively) describe these concepts here, before studying them more
rigorously in the further sections. The simplest technique is the 80/20-rule which states that
only 20% of the most productive sources produce 80% of all items. Of course, this is just a
simplification of reality: it is the task for informetricians, in each case, to determine the real
share of the items in the most productive sources: 20% of the most productive sources might
produce 65% of all items but this could as well be 83.7%! Also, we do not have to consider
20% of the most productive sources: any percentage can be considered. So, generalizing, we
can formulate the problem: for any $x \in\, ]0,1[$ determine $\theta \in\, ]0,1[$ such that $100x\%$ of the most
productive sources produce $100\theta\%$ of all items. We can even ask to determine $\theta$ as a
function of $x$. This "generalized 80/20-rule" could be called the determination of
"normalized" percentiles since both $x$ and $\theta$ belong to the interval $[0,1]$ while in the
calculation of percentiles, one of these numbers is replaced by actual quantities (of items or
sources). Since both $x$ and $\theta$ denote fractions this technique is (sometimes) called an
arithmetic way of calculating concentration (see Egghe and Rousseau (1990a)).
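
The generalized 80/20-rule described above can be sketched in Python as follows. The sketch is illustrative only: the function name `share_curve` and the sample data are hypothetical, and the fraction $x$ is restricted to the values $k/N$ (for $k$ top sources out of $N$) so that no rounding convention has to be assumed.

```python
from fractions import Fraction
from itertools import accumulate


def share_curve(productions):
    """Return the pairs (x, theta) for the k = 1..N most productive sources.

    x     = k / N, the fraction of sources considered
    theta = fraction of all items produced by those k sources
    productions is assumed to be a list of positive integers
    (items per source); it is ranked in decreasing order here.
    """
    ranked = sorted(productions, reverse=True)
    total = sum(ranked)
    n_sources = len(ranked)
    return [
        (Fraction(k, n_sources), Fraction(cumulative, total))
        for k, cumulative in enumerate(accumulate(ranked), start=1)
    ]


# Hypothetical data: 10 sources producing 100 items in total.
items_per_source = [40, 20, 10, 10, 5, 5, 4, 3, 2, 1]
curve = share_curve(items_per_source)

x, theta = curve[1]  # the two most productive sources
print(f"top {float(x):.0%} of sources produce {float(theta):.0%} of items")
# prints: top 20% of sources produce 60% of items
```

In this toy data set the 80/20-rule does not hold literally: the top 20% of the sources produce 60% of the items, which is the kind of deviation the text asks the informetrician to measure.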
In this sense we can call the law of Price a geometric way of calculating concentration. The
historic formulation (see De Solla Price (1971, 1976) and implicitly in De Solla Price (1963))
states that, if there are $T$ sources, the $\sqrt{T} = T^{1/2}$ most productive sources produce 50% (i.e. a
fraction $\frac{1}{2}$) of all items. For evident reasons, this principle is also called Price's square root
law. It is clear how to extend this principle: let $\theta \in\, ]0,1[$, then the $T^{\theta}$ most productive sources
produce a fraction $\theta$ of all items. This is called Price's law of concentration and we will
investigate in what cases in informetrics this is true. Also this principle could be generalized
stating that for $\varepsilon \in\, ]0,1[$ the top $T^{\varepsilon}$ sources produce a fraction $\theta$ of all the items and we can
ask for a relation between $\varepsilon$ and $\theta$.
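
The relation between $\varepsilon$ and $\theta$ can be explored numerically with a short Python sketch. It is a sketch under stated assumptions, not the chapter's own method: the function name `price_share` is hypothetical, the toy data are invented for illustration, and $T^{\varepsilon}$ is rounded to the nearest integer (with a minimum of one source), a convention the text does not specify.

```python
def price_share(productions, epsilon=0.5):
    """Return (n_top, theta) for the top T**epsilon sources.

    n_top = T**epsilon rounded to the nearest integer, at least 1
            (the rounding rule is an assumption of this sketch)
    theta = fraction of all items produced by those n_top sources
    """
    ranked = sorted(productions, reverse=True)
    n_top = max(1, round(len(ranked) ** epsilon))
    theta = sum(ranked[:n_top]) / sum(ranked)
    return n_top, theta


# Hypothetical data: T = 16 sources producing 100 items in total.
toy = [20, 15, 10, 5, 5, 5, 5, 5, 5, 5, 4, 4, 3, 3, 3, 3]

# Square root law: epsilon = 1/2, so the top sqrt(16) = 4 sources.
n_top, theta = price_share(toy, epsilon=0.5)
print(n_top, theta)
```

For this constructed example the four most productive sources hold 50 of the 100 items, so the square root law holds exactly. Calling `price_share` with other values of `epsilon` shows how $\theta$ varies for a given data set.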
Both general formulations of the 80/20-rule (in terms of $x$ and $\theta$) and of the law of Price (in
terms of $\varepsilon$ and $\theta$) involve two numbers. We could wonder if we can construct a function $F$
such that, for any decreasing vector $X = (x_1, x_2, \ldots, x_N)$ with positive coordinates, the value
$F(X) = F(x_1, \ldots, x_N)$ is a good measure of the concentration in $X$. It is clear that an "absolute"
good value for $F(X)$ does not exist but we can determine requirements for the value of $F(X)$ in
comparison with values $F(X')$ for other vectors $X'$ as above, i.e. to give relative value
judgements. Let us formulate some "natural" requirements.
(i) $F(X)$ should be maximal for the most concentrated situation, namely for a vector
$X$ of the type $X = (x, 0, \ldots, 0)$ where $x > 0$.
(ii) $F(X)$ should be minimal for the least concentrated situation, namely for a vector $X$
of the type $X = (x, x, \ldots, x)$ where $x > 0$.
In terms of wealth or poverty, (i) states that X = (x,0,...,0) must have the highest
concentration value (given F), since one source (e.g. person) has everything and the other
sources have nothing. Condition (ii) states that if everybody has the same amount (e.g.
money), the concentration value should be minimal (and preferably zero).
(iii) $F(X)$ should be equal to $F(cX)$ where, for $X = (x_1, \ldots, x_N)$, the vector $cX$ is
defined as $(cx_1, \ldots, cx_N)$, for all $c > 0$.
Condition (iii) is called the scale-invariant property and is required since describing the
concentration of income (i.e. describing wealth and poverty) should be independent of the
currency used (€, \$, Yen, ...), all of which are interrelated via a scale factor. The next property is
also very important: