THE STRUCTURE OF THE EURATOM‐THESAURUS

Published date: 01 April 1971
Pages: 267-272
DOI: https://doi.org/10.1108/eb026521
Author: EUGENIU TOMA
Subject Matter: Information & knowledge management, Library & information science
EUGENIU TOMA
Scientific Documentation Centre of the Academy of Sciences, Bucharest
An analysis of the rank-frequency distribution of the EURATOM-thesaurus was carried out. Zipf's law (a hyperbolic function) was not found to be suitable for this distribution, and an exponential law was used. The total entropy of the thesaurus calculated by means of this exponential function was found in good agreement with the actual entropy of the thesaurus. The exponential function may provide a criterion to 'revise' some zones of thesauri.
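The entropy comparison mentioned above can be illustrated with a minimal sketch. It assumes that "entropy" means the Shannon entropy of the relative term frequencies and that the exponential law has the form f(r) = a·exp(-b·r). Neither the exact entropy definition nor the fitted function is given in this excerpt, so both are assumptions, and the function names and parameters `a` and `b` are hypothetical.

```python
import math


def shannon_entropy(frequencies):
    """Shannon entropy (in bits) of the distribution defined by raw frequencies.

    Assumption: entropy is taken over relative term frequencies p_i = f_i / sum(f).
    """
    total = sum(frequencies)
    return -sum((f / total) * math.log2(f / total) for f in frequencies if f > 0)


def exponential_frequencies(a, b, n_terms):
    """Model frequencies for ranks 1..n_terms.

    Assumption: the exponential law is f(r) = a * exp(-b * r); the excerpt
    does not state the exact form or the fitted constants.
    """
    return [a * math.exp(-b * r) for r in range(1, n_terms + 1)]


def entropy_gap(actual_frequencies, a, b):
    """Difference between the model entropy and the entropy of the actual counts."""
    model = exponential_frequencies(a, b, len(actual_frequencies))
    return shannon_entropy(model) - shannon_entropy(actual_frequencies)
```

With real thesaurus counts, a small value of `entropy_gap` would correspond to the "good agreement" reported in the abstract. No ET data are reproduced here.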
TO CARRY OUT statistical investigations on the informational value of formalized languages, the EURATOM-thesaurus (ET) has proved to be very representative. This thesaurus is based on over half a million documents, its first edition consisting of 1,205 terms describing concepts, the frequencies of which are known.1 On the basis of Zipf's law one would expect a straight line for the plot of log f against log r, where f is the frequency of the term ranked r. In fact the plot of the term frequency f of the ET as a function of the rank r on log paper does not give a straight line, as would be obtained for natural language,4 but rather a part of an ellipse (Fig. 1).
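The straight-line test described above can be sketched as follows. Under Zipf's law the slope of log f against log r is constant, so the slope fitted over the lower ranks should match the slope fitted over the higher ranks. A curve that bends downward, as the ET plot is reported to do, gives a steeper slope at high ranks. The function names are hypothetical, and any data passed in are the user's own; no ET frequencies are reproduced here.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loglog_slopes(frequencies):
    """Return (slope over the first half of ranks, slope over the second half).

    Frequencies are ranked in descending order; rank 1 is the most frequent term.
    Roughly equal slopes are consistent with a hyperbolic (Zipf) law; a clearly
    steeper second slope indicates a curve that bends downward on log paper.
    """
    ranked = sorted(frequencies, reverse=True)
    points = [
        (math.log10(r), math.log10(f))
        for r, f in enumerate(ranked, start=1)
        if f > 0
    ]
    mid = len(points) // 2
    low_slope, _ = fit_line(*zip(*points[:mid]))
    high_slope, _ = fit_line(*zip(*points[mid:]))
    return low_slope, high_slope


# Synthetic illustrations only (not ET data):
zipf_like = [1000.0 / r for r in range(1, 101)]                  # f = C / r
exp_like = [1000.0 * math.exp(-0.05 * r) for r in range(1, 101)]  # f = a * exp(-b r)
```

Splitting the ranks into two halves is a deliberately crude curvature check. Fitting a quadratic in log r would be a finer alternative.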
It is to be noted that thesauri and controlled vocabularies tend to achieve