Who are the 100 largest scientific publishers by journal count? A webscraping approach
Date | 21 September 2022 |
Pages | 450-463 |
DOI | https://doi.org/10.1108/JD-04-2022-0083 |
Published date | 21 September 2022 |
Subject Matter | Library & information science,Records management & preservation,Document management,Classification & cataloguing,Information behaviour & retrieval,Collection building & management,Scholarly communications/publishing,Information & knowledge management,Information management & governance,Information management,Information & communications technology,Internet |
Author | Andreas Nishikawa-Pacher |
Andreas Nishikawa-Pacher
TU Wien Bibliothek, Vienna, Austria;
Vienna School of International Studies, Vienna, Austria and
Department of Legal and Constitutional History, University of Vienna,
Vienna, Austria
Abstract
Purpose – How can one obtain a list of the 100 largest scientific publishers sorted by journal count? Existing
databases are unhelpful, as each of them suffers from biased omissions and data quality flaws. This paper tries to fill
this gap with an alternative approach.
Design/methodology/approach – The content coverages of Scopus, Publons, DOAJ and SherpaRomeo were
first used to extract a preliminary list of publishers that supposedly possess at least 15 journals. Second, the
publishers’ websites were scraped to fetch their portfolios and, thus, their “true” journal counts.
Findings – The outcome is a list of the 100 largest publishers comprising 28,060 scholarly journals, with the
largest publishing 3,763 journals, and the smallest carrying 76 titles. The usual “oligopoly” of major publishing
companies leads the list, but it also contains 17 university presses from the Global South, and, surprisingly,
30 predatory publishers that together publish 4,517 journals.
Research limitations/implications – Additional data sources could be used to mitigate remaining biases; it
is difficult to disambiguate publisher names and their imprints; and the dataset carries a non-uniform
distribution, thus risking the omission of data points in the lower range.
Practical implications – The dataset can serve as a useful basis for comprehensive meta-scientific surveys
at the publisher level.
Originality/value – The catalogue can be deemed more inclusive and diverse than other ones because many
of the publishers would have been overlooked if one had drawn from merely one or two sources. The list is
freely accessible and invites regular updates. The approach used here (webscraping) has seldom been used in
meta-scientific surveys.
Keywords Bibliographic systems, Data collection, University presses, Journals, Online databases, Journal
publishers, Predatory publishers
Paper type Research paper
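The two-step approach described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the record layout, the sample data, the `class="journal"` markup and all function names are hypothetical, and reading "at least 15 journals" as "at least 15 in any single source" is an assumption. Step 1 builds a preliminary publisher list from database records; step 2 counts the distinct journal links on a publisher's portfolio page.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical (publisher, journal) records per data source; not real data.
SOURCE_RECORDS = {
    "source_a": [("Press X", f"Journal {i}") for i in range(20)],
    "source_b": [("Press X", f"Journal {i}") for i in range(5)]
    + [("Press Y", "Journal A")],
}

MIN_JOURNALS = 15  # threshold named in the abstract


def preliminary_publishers(records_by_source, threshold=MIN_JOURNALS):
    """Step 1: publishers reaching the threshold in at least one source (assumed rule)."""
    selected = set()
    for records in records_by_source.values():
        titles = defaultdict(set)
        for publisher, journal in records:
            titles[publisher].add(journal)
        selected.update(p for p, js in titles.items() if len(js) >= threshold)
    return sorted(selected)


class JournalLinkCounter(HTMLParser):
    """Step 2: collect distinct hrefs of <a class="journal"> tags (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and "journal" in classes and attr.get("href"):
            self.links.add(attr["href"])


def count_journals(html):
    """Return the number of distinct journal links found in a portfolio page."""
    parser = JournalLinkCounter()
    parser.feed(html)
    return len(parser.links)


# Stand-in for a fetched portfolio page; a real scraper would download it first.
SAMPLE_PAGE = """
<ul>
  <li><a class="journal" href="/j/alpha">Alpha</a></li>
  <li><a class="journal" href="/j/beta">Beta</a></li>
  <li><a class="journal" href="/j/alpha">Alpha (duplicate link)</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

if __name__ == "__main__":
    print(preliminary_publishers(SOURCE_RECORDS))
    print(count_journals(SAMPLE_PAGE))
```

In practice each publisher website uses its own markup, so the link filter in step 2 would have to be adapted per site; deduplicating by link target is one simple way to avoid counting the same journal twice.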
© Andreas Nishikawa-Pacher. Published by Emerald Publishing Limited. This article is published
under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute,
translate and create derivative works of this article (for both commercial and non-commercial purposes),
subject to full attribution to the original publication and authors. The full terms of this licence may be
seen at http://creativecommons.org/licences/by/4.0/legalcode
Corrigendum: It has come to the attention of the publisher that the article: Nishikawa-Pacher, A.
(2022), “Who are the 100 largest scientific publishers by journal count? A webscraping approach”,
Journal of Documentation, Vol. 78 No. 7, pp. 450-463. https://doi.org/10.1108/JD-04-2022-0083 mistakenly
labelled IOS Press as a predatory publisher in Table 2. Amendments have been made to Table 2 and
throughout the text to correct this issue. The authors sincerely apologise to IOS Press and the readers for
any inconvenience caused.
A preprint version of this paper appeared as “Who are the 100 Largest Scientific Publishers by
Journal Count? A Webscraping Approach” and has been posted on the SocArXiv repository.
Funding: The author acknowledges TU Wien Bibliothek for financial support through its Open
Access Funding Programme.
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/0022-0418.htm
Received 15 April 2022
Revised 29 July 2022
13 August 2022
Accepted 21 August 2022
Journal of Documentation
Vol. 78 No. 7, 2022
pp. 450-463
Emerald Publishing Limited
0022-0418
DOI 10.1108/JD-04-2022-0083