Clusters and maps of science journals based on bi‐connected graphs in Journal Citation Reports

Date: 01 August 2004
Published date: 01 August 2004
Pages: 371-427
DOI: https://doi.org/10.1108/00220410410548144
Author: Loet Leydesdorff
Subject Matter: Information & knowledge management, Library & information science
Loet Leydesdorff
Science and Technology Dynamics, University of Amsterdam, Amsterdam School of Communications Research (ASCoR), Amsterdam, The Netherlands
Keywords Journals, Statistical analysis, Maps, Cluster analysis
Abstract The aggregated journal-journal citation matrix derived from Journal Citation Reports
2001 can be decomposed into a unique subject classification using the graph-analytical algorithm of
bi-connected components. This technique was recently incorporated in software tools for social
network analysis. The matrix can be assessed in terms of its decomposability using articulation
points which indicate overlap between the components. The articulation points of this set did not
exhibit a next-order network of “general science” journals. However, the clusters differ in size and
in terms of the internal density of their relations. A full classification of the journals is provided in
the Appendix. The clusters can also be extracted and mapped for visualization.
1. Introduction
In an article in Science entitled “Networks of scientific papers,” Derek de Solla Price
(1965) reported that he had been able to work with the experimental version of the
Science Citation Index for 1961. In this paper, he noted the possibility of studying the
dynamics of journal relations by using the aggregates of their mutual citations.
However, it would take until 1975 before the Institute of Scientific Information began to
compile this data systematically on a yearly basis (Garfield, 1979).
Now, the Journal Citation Reports of the Science Citation Index covers more than
5,500 journals. The decomposition of this data into journal groupings is urgent, both
for improving the consistency of journal collections and for improving the baselines in
scientometric evaluations (Studer and Chubin, 1980; Moed et al., 1985; Glänzel and
Schubert, 2003). Journal sets can be considered as indicators of the intellectual
organisation of the sciences at the aggregated level (Leydesdorff, 1987). Furthermore,
they are reproduced from year to year with considerable stability, and hence they can
also be used as indicators of evolutionary change (Leydesdorff, 2002a). However, the
problem of how to delineate these sets consistently and over time has been a major
concern of scientometric analysis since its early days (Narin et al., 1972; Carpenter
and Narin, 1973; Narin, 1976; Leydesdorff, 2002b).
Received 10 December 2003; revised 2 February 2004; accepted 6 February 2004.
Journal of Documentation, Vol. 60 No. 4, 2004, pp. 371-427. © Emerald Group Publishing Limited, ISSN 0022-0418.

The data can be considered as a huge matrix of the aggregated citations among the
journals. Each cell ij of the matrix indicates how often journal i cites journal j during a
given year. Until recently, however, such a large matrix could not easily be analysed
with software because of computational limitations. It had to be broken down into
chunks, or specific algorithms had to be used in order to address local densities
contained in the matrix. The choice of one algorithm over another, or delineation in
terms of one subset or another, however, implies a selection, and therefore various
representations of this data could be entertained. A unique solution seemed impossible
given the limitations in hardware and software.
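The aggregated matrix described above can be assembled as in the following minimal sketch. It assumes the raw input is available as one (citing journal, cited journal) pair per citation; the Journal Citation Reports already deliver aggregated counts, so the function name `aggregate_citation_matrix` and the journal labels are hypothetical illustrations, not ISI's data format.

```python
from collections import Counter


def aggregate_citation_matrix(records):
    """Aggregate (citing_journal, cited_journal) pairs into a square matrix.

    Returns the sorted journal labels and a list-of-lists matrix in which
    matrix[i][j] counts how often journal i cites journal j.
    """
    counts = Counter(records)  # one count per distinct (citing, cited) pair
    journals = sorted({name for pair in counts for name in pair})
    index = {name: k for k, name in enumerate(journals)}
    matrix = [[0] * len(journals) for _ in journals]
    for (citing, cited), n in counts.items():
        matrix[index[citing]][index[cited]] += n
    return journals, matrix


# Hypothetical citation records for three journals labelled A, B and C.
records = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "A"), ("A", "A")]
journals, matrix = aggregate_citation_matrix(records)
total_citing = [sum(row) for row in matrix]       # row sums
total_cited = [sum(col) for col in zip(*matrix)]  # column sums
```

Self-citations such as ("A", "A") end up on the main diagonal. The row sums correspond to a journal's "total citing" and the column sums to its "total cited".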
At an early stage, ISI developed indices such as the impact factor of journals
(Garfield, 1972; Garfield and Sher, 1963) and the immediacy index (Price, 1970; Cozzens,
1985; Moed, 1989). The question of clustering the data was raised in the 1970s by Narin
et al. (1972) and by Carpenter and Narin (1973). These authors, however, focused on
mapping the hierarchy among the journals (Narin, 1976). A better understanding of
stratification in the journal set would enable libraries to rationalise their portfolios
given budget constraints (e.g. Hirst, 1978), and it might also provide us with a baseline
for comparisons in research evaluations (Studer and Chubin, 1980; Moed et al., 1985).
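For reference, the two-year impact factor, as conventionally defined by ISI, divides the citations a journal receives in a given year to the items it published in the two preceding years by the number of citable items it published in those two years. The following is a minimal sketch with hypothetical counts; the function and argument names are illustrative only.

```python
def impact_factor(citations_by_pub_year, citable_items_by_year, year):
    """Two-year impact factor of one journal for the census year `year`.

    citations_by_pub_year: citations received in `year`, keyed by the
        publication year of the cited items.
    citable_items_by_year: number of citable items the journal published,
        keyed by publication year.
    """
    window = (year - 1, year - 2)  # only the two preceding years count
    cited = sum(citations_by_pub_year.get(y, 0) for y in window)
    items = sum(citable_items_by_year.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return cited / items


# Hypothetical journal: citations received in 2001, by year of the cited item.
citations_2001 = {2000: 150, 1999: 90, 1998: 40}  # 1998 falls outside the window
items = {2000: 60, 1999: 60}
jif_2001 = impact_factor(citations_2001, items, 2001)  # (150 + 90) / (60 + 60)
```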
Given the computational limitations at that time, the approaches of the 1970s were
based either on using aggregate measures like “total cited” or “total citing” (e.g. divided
by the total number of publications in order to compute the citation/publication-ratios
of groups of authors or the impact factors of journals) or algorithms that were
essentially based on “single-linkage clustering.” Single-linkage clustering can be based
on sorting the links (that is, cell values) into a list. A list can be indexed by decreasing
frequencies using software for database management. Thus, the two-dimensional
problem of decomposition of the matrix could be reduced to the one-dimensional
problem of handling lists.
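The reduction of clustering to list handling can be sketched as follows. Links are sorted by decreasing citation count, and each link at or above a cut-off merges the clusters of its two journals. This is single-linkage clustering implemented as a union-find pass. It is a minimal illustration of the general technique under stated assumptions (links treated as undirected, a hypothetical cut-off `threshold`), not a reconstruction of the procedures used in the 1970s.

```python
def single_linkage(links, threshold):
    """Single-linkage clusters from weighted, undirected links.

    links: iterable of (journal_a, journal_b, citation_count) triples.
    threshold: hypothetical cut-off; weaker links are ignored.
    Returns the clusters as a sorted list of sorted journal lists.
    """
    links = list(links)
    parent = {node: node for a, b, _ in links for node in (a, b)}

    def find(node):
        # Follow parent pointers to the cluster representative (path halving).
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Walk the list in order of decreasing frequency, merging clusters.
    for a, b, count in sorted(links, key=lambda link: link[2], reverse=True):
        if count < threshold:
            break  # every remaining link is weaker still
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical links among five journals.
links = [("A", "B", 10), ("B", "C", 7), ("C", "D", 2), ("D", "E", 9)]
clusters = single_linkage(links, threshold=5)
```

With `threshold=5` the weak C-D link is ignored, so two clusters result. Lowering the cut-off to 2 chains all five journals into one cluster, which is the familiar chaining effect of single linkage.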
For example, co-citation analysis was developed within ISI to map the sciences
using single-linkage clustering (Small and Griffith, 1974). Mapping in more than a
single dimension, however, requires a two-dimensional visualisation technique (Small
and Sweeney, 1985; Small et al., 1985; cf. Leydesdorff, 1987; Small, 1999; Chen, 2003). A
multi-dimensional approach using the relations among the journals as a network of
links and the cited journals as the nodes became more widespread during the 1980s.
Doreian and Fararo (1985) used graph analytical techniques for block-modelling the
relations, but the focus of these studies was still heavily on discovering the relative
standing of journals within the hierarchy (Doreian, 1986).
Tijssen et al. (1987) used quasi-correspondence analysis to map the journals as
groups on the basis of their deviation from normalised expectations. Clusters of
journals could then be made visible. However, the delineations had to be pencilled into
the pictures and remained based on intuitive or expert-based classification. One
advantage of this method was that it allowed for the visualisation of both the cited and
the citing patterns in a single mapping. Leydesdorff (1986, 1987) used factor analysis
on selective parts of the aggregated journal-journal citation matrix. The factor analysis
provides us with clear delineations, but this technique remains limited in terms of the
number of variables that can be rotated in each run.
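Factor analysis of this kind usually operates on a correlation matrix among the variables. As a minimal sketch, the correlations can be computed as shown below. The setup is an assumption, since the text does not specify it: the cited journals are taken as the variables and the citing journals as the cases. The subsequent extraction and rotation of factors is not shown, and the function name is hypothetical.

```python
from math import sqrt


def column_correlations(matrix):
    """Pearson correlation matrix between the columns of a citation matrix.

    Columns with zero variance have undefined correlations; they are set to
    0.0 here, which is an arbitrary choice for this sketch.
    """
    n_rows = len(matrix)
    columns = [list(col) for col in zip(*matrix)]
    centred = []
    for col in columns:
        mean = sum(col) / n_rows
        centred.append([x - mean for x in col])
    norms = [sqrt(sum(x * x for x in col)) for col in centred]
    k = len(columns)
    result = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(centred[i], centred[j]))
                result[i][j] = dot / (norms[i] * norms[j])
    return result


# Hypothetical 3 x 3 matrix: rows are citing journals, columns cited journals.
r = column_correlations([[1, 2, 0], [2, 4, 1], [3, 6, 0]])
```

Here the first two cited journals receive proportional citation profiles and correlate perfectly, while the third is uncorrelated with the first.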
These various authors agreed in using the information contained in the matrix of
aggregated journal-journal relations, but they did not pursue the analysis with similar
interests. The data contain two types of structures (Burt, 1982), namely:
(1) A hierarchical structure that can be studied at the level of the database (e.g. in
terms of impact factors) or at the level of specific groups of journals (e.g. The
Lancet and The New England Journal of Medicine for the medical sciences, the
Journal of the American Chemical Society within chemistry).
(2) A grouping in terms of disciplines and specialties.
The different groupings cannot be expected to entertain hierarchical relations across
groups, but certain journals may relate to other groups as a layer of next-order (i.e.
multi-disciplinary) journals (e.g. Science, Nature, PNAS USA; cf. Glänzel and Schubert,
2003).
The standing of a journal in the hierarchy is a property of the journal in relation to
its competitors, but the grouping is a function at the network level. The factor (or
cluster) analysis enables us to identify eigenvectors of the network that can be
considered as orthogonal dimensions. These dimensions are independent. Ranking and
grouping are thus two very different operations. The groupings remain heavily
dependent on the specific delineations and sensitive to changes in the journal citation
structures over time (Leydesdorff and Cozzens, 1993). Thus, they cannot easily be
reproduced for different years, whereas the ranks can be expected to show more
stability at the top of the hierarchy, for example, in time-series and trend analysis.
Recent developments in social network analysis and the availability of larger
capacities in terms of computer memory management enable us to make further
progress in handling the data contained in the Journal Citation Reports both
hierarchically and in terms of their structure. First, the entire network can be read into
a computer program for the statistical analysis or the visualisation because the various
programs in the Windows environment can nowadays use all internal memory
available (Leydesdorff, 2004; Leydesdorff and Jin, in preparation). Second, the
algorithm for bi-connected components was recently incorporated in software tools for
social network analysis (Moody and White, 2003). This technique was developed in
order to find robust clusters in large data sets (Knaster and Kuratowski, 1921).
2. Bi-connected component analysis
A bi-connected component is a maximal connected subgraph in which, for every
triple of vertices a, v, and w, there exists a chain between v and w which does not
include the vertex a. In other words, the component remains connected after the
removal of any single vertex (Mrvar and Batagelj, n.d.); in a component of three or
more nodes, each node is therefore linked to at least two other nodes in this cluster.
This bi-connectedness stabilises the cluster against changes in the initial selection
when producing the database. Thus, the inclusion or exclusion of individual journals
by ISI would not directly affect the large bi-components contained in the network data.
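The definition above translates into the classical depth-first-search algorithm of Hopcroft and Tarjan. This algorithm finds all bi-connected components and articulation points in time linear in the number of links. What follows is a minimal, iterative sketch under stated assumptions: citation links are treated as undirected, self-citations are dropped, and no threshold on citation counts is applied. The function names and journal labels are hypothetical, and this is not the implementation in the network-analysis software on which the article relies.

```python
def undirected_graph(pairs):
    """Symmetrise citation pairs into an adjacency dict; self-citations are dropped."""
    adj = {}
    for a, b in pairs:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def biconnected_components(adj):
    """Bi-connected components and articulation points of an undirected graph.

    Iterative depth-first search: low[u] is the earliest discovery time that
    u's subtree can reach through a single back edge.
    Returns (list of node sets, set of articulation points).
    """
    disc, low = {}, {}
    edge_stack, components, cut_points = [], [], set()
    counter = 0
    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = counter
        counter += 1
        root_children = 0
        stack = [(root, None, iter(adj[root]))]
        while stack:
            u, parent, neighbours = stack[-1]
            descended = False
            for v in neighbours:
                if v not in disc:  # tree edge: go one level deeper
                    disc[v] = low[v] = counter
                    counter += 1
                    edge_stack.append((u, v))
                    stack.append((v, u, iter(adj[v])))
                    if u == root:
                        root_children += 1
                    descended = True
                    break
                if v != parent and disc[v] < disc[u]:  # back edge to an ancestor
                    edge_stack.append((u, v))
                    low[u] = min(low[u], disc[v])
            if descended:
                continue
            stack.pop()  # all neighbours of u have been explored
            if not stack:
                continue
            p = stack[-1][0]
            low[p] = min(low[p], low[u])
            if low[u] >= disc[p]:  # p separates u's subtree from the rest
                if p != root:
                    cut_points.add(p)
                component = set()
                while True:
                    edge = edge_stack.pop()
                    component.update(edge)
                    if edge == (p, u):
                        break
                components.append(component)
        if root_children > 1:  # the root is a cut-point only with 2+ children
            cut_points.add(root)
    return components, cut_points


# Example in the spirit of Figure 1: two triangles sharing journal C.
links = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("E", "C")]
components, cut_points = biconnected_components(undirected_graph(links))
```

C belongs to both components and is reported as the articulation point. An explicit stack is used instead of recursion because a recursive depth-first search over several thousand journals could exceed Python's default recursion limit.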
Between two bi-connected components of a network, there can be an overlap. A
vertex in the overlap is called an articulation point or cut-point (see Figure 1). A node in
a graph is an articulation point if removal of this node increases the number of
connected components of the graph (Scott, 1991)[1]. Articulation points belong to more than a
single bi-connected component and can therefore be considered as a next order in the
[Figure 1. Two bi-connected network components with an articulation point]