Setting our bibliographic references free: towards open citation data

Date: 09 March 2015
Published date: 09 March 2015
DOI: https://doi.org/10.1108/JD-12-2013-0166
Pages: 253-277
Authors: Silvio Peroni, Alexander Dutton, Tanya Gray, David Shotton
Subject matter: Library & information science, Records management & preservation, Document management
Silvio Peroni
Department of Computer Science and Engineering,
University of Bologna, Bologna, Italy
Alexander Dutton
IT Services, University of Oxford, Oxford, UK
Tanya Gray
Bodleian Libraries, University of Oxford, Oxford, UK, and
David Shotton
Oxford e-Research Centre, University of Oxford, Oxford, UK
Abstract
Purpose: Citation data needs to be recognised as a part of the Commons (those works that are freely
and legally available for sharing) and placed in an open repository. The paper aims to discuss this issue.
Design/methodology/approach: The Open Citation Corpus is a new open repository of scholarly
citation data, made available under a Creative Commons CC0 1.0 public domain dedication and
encoded as Open Linked Data using the SPAR Ontologies.
Findings: The Open Citation Corpus presently provides open access (OA) to reference lists from
204,637 articles from the OA Subset of PubMed Central, containing 6,325,178 individual references to
3,373,961 unique papers.
Originality/value: Scholars, publishers and institutions may freely build upon, enhance and reuse
the open citation data for any purpose, without restriction under copyright or database law.
Keywords: Semantic publishing, Open access, Citations, Open citation corpus, References,
SPAR ontologies
Paper type: Viewpoint
Journal of Documentation
Vol. 71 No. 2, 2015
pp. 253-277
© Emerald Group Publishing Limited
0022-0418
DOI 10.1108/JD-12-2013-0166
Received 18 December 2013
Revised 26 February 2014
Accepted 3 March 2014
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0022-0418.htm
This paper has been developed from the same textual source material from which was distilled a short
Comment piece entitled "Open Citations" recently published by David Shotton in Nature (Shotton,
2013). It thus has substantial textual elements in common with that publication. The authors gratefully
acknowledge the financial support of Jisc, which provided two small grants to David Shotton that, in
addition to enabling the creation of the OCC of which he is the Director, in part also made possible his
development of the SPAR ontologies in collaboration with Silvio Peroni, and of the CiTO Reference
Annotation Tools in collaboration with Tanya Gray. The software development of the first public
prototype of OCC was primarily undertaken by Alexander Dutton during the first Jisc grant. Work
currently in progress on revising the data model, infrastructure and ingest pipeline of the OCC was
initiated during the second Jisc grant, in collaboration with Richard Jones, Mark Macgillivray and
Martyn Whitwell of Cottage Labs, acting as development consultants, who are sincerely thanked for
their excellent work. Silvio Peroni would like to thank Angelo Di Iorio and Andrea Giovanni Nuzzolese,
who co-authored CiTalO, and Paolo Ciancarini and Fabio Vitali for their help and for many fruitful and
proactive discussions about citations, citation functions and citation metrics.
1. Introduction
We are living in the early part of the decade of open information. Following a spate of
recent reports and government policy statements (Boulton, 2012; Finch, 2012; American
Meteorological Society, 2013; Burwell et al., 2013; New South Wales Government, 2013;
Research Councils UK, 2013; Wellcome Trust, 2013), it can be fairly stated that the policy
debate on open access (OA) has been won. Interest is now focused on implementation of
the open agenda.
Over the past decade, several studies have demonstrated the importance and
benefits of releasing articles and data as OA material. Lawrence (2001), Harnad
and Brody (2004), Davis et al. (2008) and Swan (2009) gave empirical evidence of the
advantages of OA in terms of better visibility, findability and accessibility for research
articles. Following an initial study showing similar results (Piwowar et al., 2007), a new
larger study by Piwowar and Vision shows that making research data publicly
available can increase the citation rates of articles by between 9 and 30 per cent, depending
on the publication dates of the data sets (Piwowar and Vision, 2013).
But what of OA to the citation data, in other words to the reference lists within
scholarly papers that cite other bibliographic resources, from which citation rates can
be calculated? Heather Piwowar, a resident of Vancouver, Canada, never anticipated
the difficulties in collecting such citation data for that study (Piwowar and Vision,
2013). She needed to analyse citation counts for thousands of articles (she had 10K
PubMed IDs to look up), but three of the major sources of citation data, Thomson
Reuters Web of Science[1], Google Scholar[2] and Microsoft Academic Search[3], did
not support PubMed ID queries. Scopus[4], Elsevier's database of scholarly citations,
did, but because Piwowar lacked institutional access to that resource, and with direct
appeals to Scopus staff falling on deaf ears, she had a problem. She eventually obtained
access through a Research Worker agreement with Canada's National Science Library,
but, because she had recently worked in the USA, this required her first to obtain a
police clearance certificate and to have her fingerprints sent to the FBI.
A similar story can be told concerning Steven Greenberg's striking analysis of
citation distortion (Greenberg, 2009), revealing how hypotheses can be converted into
"facts" simply by repeated citation. His work involved the manual construction and
analysis of a citation network containing 242 papers, 675 citations, and 220,553 distinct
citation paths relevant to a particular hypothesis relating to Alzheimer's disease.
Had those citation data been readily accessible online, they would have been saved
considerable effort. These two examples demonstrate how actual research practice
suffers because access to citation data is currently so difficult.
In this OA decade, we think it is a scandal that reference lists from academic articles,
core elements of scholarly communication that permit the attribution of credit and
integrate our independent research endeavours, are not already freely available for use
by scholars. To rectify this, citation data now needs to be recognised as a part of the
Commons (those works that are freely and legally available for sharing) and placed
in an open repository, where they should be stored in appropriate machine-readable
formats so as to be easily reused by machines to assist people in producing novel
services. So there is work to be done.
In this paper, we first introduce the issues affecting the currently available sources
of citation data, and then describe our own contributions to this field which attempt to
improve the current situation: the Open Citations Corpus (OCC)[5], the Citation Typing
Ontology (CiTO)[6] (Peroni and Shotton, 2012), the CiTO Reference Annotation Tools[7]
and CiTalO[8]. OCC is an open repository for citation data, available under a Creative
Commons CC0 1.0 public domain dedication and encoded as Open Linked Data. CiTO is
an OWL2 DL ontology (Motik et al., 2012) that enables the assertion of citations in RDF,
and their machine-readable characterisation in terms of the reasons for such citations.
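As a rough illustration of what asserting a citation in RDF can look like, the sketch below builds N-Triples statements by hand. It is not taken from the OCC or from any of the authors' tools. The helper name `citation_triple` and the cited-paper IRI are hypothetical, and the property names `cites` and `extends` are assumed to be defined in the CiTO namespace, so they should be checked against the published ontology before use.

```python
# Minimal sketch (assumptions labelled): serialise citation assertions as
# N-Triples lines using properties from the CiTO namespace.

# CiTO namespace IRI as published under the SPAR ontologies.
CITO = "http://purl.org/spar/cito/"


def citation_triple(citing_iri: str, cited_iri: str, reason: str = "cites") -> str:
    """Return one N-Triples line stating that citing_iri cites cited_iri.

    `reason` is the local name of a CiTO property; "cites" is the generic
    citation link, while other names characterise why the citation was made.
    This hypothetical helper does not validate the name against the ontology.
    """
    return f"<{citing_iri}> <{CITO}{reason}> <{cited_iri}> ."


# The citing IRI is this paper's DOI; the cited IRI is a placeholder.
citing = "https://doi.org/10.1108/JD-12-2013-0166"
cited = "https://example.org/cited-paper"

generic = citation_triple(citing, cited)                  # plain citation
typed = citation_triple(citing, cited, reason="extends")  # typed citation

print(generic)
print(typed)
```

The first statement records only that a citation exists; the second additionally records a reason for it, which is the kind of machine-readable characterisation described above.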