Who are the 100 largest scientific publishers by journal count? A webscraping approach

Date: 21 September 2022
Pages: 450-463
DOI: https://doi.org/10.1108/JD-04-2022-0083
Published date: 21 September 2022
Subject Matter: Library & information science, Records management & preservation, Document management, Classification & cataloguing, Information behaviour & retrieval, Collection building & management, Scholarly communications/publishing, Information & knowledge management, Information management & governance, Information management, Information & communications technology, Internet
Author: Andreas Nishikawa-Pacher
TU Wien Bibliothek, Vienna, Austria;
Vienna School of International Studies, Vienna, Austria and
Department of Legal and Constitutional History, University of Vienna,
Vienna, Austria
Abstract
Purpose: How can one obtain a list of the 100 largest scientific publishers sorted by journal count? Existing
databases are unhelpful, as each of them carries biased omissions and data quality flaws. This paper tries to fill
this gap with an alternative approach.
Design/methodology/approach: The content coverages of Scopus, Publons, DOAJ and SherpaRomeo were
first used to extract a preliminary list of publishers that supposedly possess at least 15 journals. Second, the
publishers' websites were scraped to fetch their portfolios and, thus, their "true" journal counts.
Findings: The outcome is a list of the 100 largest publishers comprising 28,060 scholarly journals, with the
largest publishing 3,763 journals and the smallest carrying 76 titles. The usual "oligopoly" of major publishing
companies leads the list, but it also contains 17 university presses from the Global South and, surprisingly,
30 predatory publishers that together publish 4,517 journals.
Research limitations/implications: Additional data sources could be used to mitigate remaining biases; it
is difficult to disambiguate publisher names and their imprints; and the dataset carries a non-uniform
distribution, thus risking the omission of data points in the lower range.
Practical implications: The dataset can serve as a useful basis for comprehensive meta-scientific surveys
at the publisher level.
Originality/value: The catalogue can be deemed more inclusive and diverse than other ones because many
of the publishers would have been overlooked if one had drawn from merely one or two sources. The list is
freely accessible and invites regular updates. The approach used here (webscraping) has seldom been used in
meta-scientific surveys.
Keywords: Bibliographic systems, Data collection, University presses, Journals, Online databases, Journal
publishers, Predatory publishers
Paper type: Research paper
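
The two-step procedure summarised in the abstract can be illustrated with a minimal Python sketch. It is not the author's code. The publisher names and journal counts are hypothetical placeholders, the function names are invented for illustration, and the rule that a publisher qualifies when it reaches 15 journals in at least one source is an assumption, since the abstract does not state how the four sources were combined. The scraping step is replaced by a hard-coded dictionary so that the example runs offline.

```python
from collections import defaultdict

# Hypothetical per-database journal counts (placeholder names and numbers,
# NOT data from the paper).
SOURCE_COUNTS = {
    "scopus": {"Publisher A": 120, "Publisher B": 9},
    "publons": {"Publisher A": 95, "Publisher C": 20},
    "doaj": {"Publisher B": 14, "Publisher D": 3},
    "sherpa_romeo": {"Publisher B": 16, "Publisher C": 11},
}

MIN_JOURNALS = 15  # threshold named in the abstract


def preliminary_candidates(source_counts, threshold=MIN_JOURNALS):
    """Return publishers reaching the threshold in at least one source.

    Assumption: "at least one source" is our reading; the abstract does
    not specify the combination rule.
    """
    best = defaultdict(int)
    for counts in source_counts.values():
        for publisher, n in counts.items():
            best[publisher] = max(best[publisher], n)
    return sorted(p for p, n in best.items() if n >= threshold)


def rank_by_scraped_count(scraped_counts, top_n=100):
    """Sort publishers by website-derived journal count, largest first."""
    ranked = sorted(scraped_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


candidates = preliminary_candidates(SOURCE_COUNTS)

# Stand-in for the scraping step: in practice each count would be fetched
# from the publisher's own website.
scraped = {"Publisher A": 130, "Publisher B": 18, "Publisher C": 25}
ranking = rank_by_scraped_count({p: scraped[p] for p in candidates})
```

Taking the maximum count across sources keeps a publisher that one database under-reports, which matches the abstract's point that relying on only one or two sources would overlook many publishers.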
© Andreas Nishikawa-Pacher. Published by Emerald Publishing Limited. This article is published
under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute,
translate and create derivative works of this article (for both commercial and non-commercial purposes),
subject to full attribution to the original publication and authors. The full terms of this licence may be
seen at http://creativecommons.org/licences/by/4.0/legalcode
Corrigendum: It has come to the attention of the publisher that the article Nishikawa-Pacher, A.
(2022), "Who are the 100 largest scientific publishers by journal count? A webscraping approach",
Journal of Documentation, Vol. 78 No. 7, pp. 450-463, https://doi.org/10.1108/JD-04-2022-0083, mistakenly
labelled IOS Press as a predatory publisher in Table 2. Amendments have been made to Table 2 and
throughout the text to correct this issue. The authors sincerely apologise to IOS Press and the readers for
any inconvenience caused.
A preprint version of this paper appeared as "Who are the 100 Largest Scientific Publishers by
Journal Count? A Webscraping Approach" and has been posted on the SocArXiv repository.
Funding: The author acknowledges TU Wien Bibliothek for financial support through its Open
Access Funding Programme.
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/0022-0418.htm
Received 15 April 2022
Revised 29 July 2022 and 13 August 2022
Accepted 21 August 2022
Journal of Documentation
Vol. 78 No. 7, 2022
pp. 450-463
Emerald Publishing Limited
0022-0418
DOI 10.1108/JD-04-2022-0083
