A systematic literature review on Wikidata

Date: 01 July 2019
DOI: https://doi.org/10.1108/DTA-12-2018-0110
Pages: 250-268
Published date: 01 July 2019
Authors: Marçal Mora-Cantallops, Salvador Sánchez-Alonso, Elena García-Barriocanal
Subject matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Information & knowledge management, Information & communications technology, Internet
Marçal Mora-Cantallops, Salvador Sánchez-Alonso and Elena García-Barriocanal
Universidad de Alcalá de Henares, Alcalá de Henares, Spain
Abstract
Purpose: The purpose of this paper is to review the current status of research on Wikidata and, in particular, of articles that either describe applications of Wikidata or provide empirical evidence, in order to uncover the topics of interest, the fields that are benefiting from its applications and which researchers and institutions are leading the work.
Design/methodology/approach: A systematic literature review is conducted to identify and review how Wikidata is being addressed in academic research articles and which applications are proposed. A rigorous and systematic process is implemented, aiming not only to summarize existing studies and research on the topic, but also to include an element of analytical criticism and a perspective on gaps and future research.
Findings: Despite Wikidata's potential and the notable rise in research activity, the field is still in the early stages of study. Most research is published in conference proceedings, which highlights this immaturity, and provides little empirical evidence of real use cases. Only a few disciplines currently benefit from Wikidata's applications, and they do so with a significant gap between research and practice. Studies are dominated by European researchers, mirroring Wikidata's content distribution and limiting its worldwide applications.
Originality/value: The results collect and summarize existing Wikidata research articles published in the major international journals and conferences, delivering a meticulous summary of all the available empirical research on the topic that is representative of the state of the art at this time, complemented by a discussion of identified gaps and future work.
Keywords: Literature review, Survey, Applications, Empirical studies, Wikidata, Knowledge graphs
Paper type: Literature review
1. Introduction
Wikidata is an open, collaborative project started on October 30, 2012 by Wikimedia Deutschland and hosted and supported by the Wikimedia Foundation (Abián et al., 2018). Its popularity has increased continuously since its creation (Vrandečić, 2013; Vrandečić and Krötzsch, 2014). It has two main goals: first, to be the central storage for the structured data of all its Wikimedia sister projects (such as Wikipedia itself), avoiding duplicate and contradictory information while also facilitating multilingual capabilities and management; and second, to provide data to third-party projects and initiatives, allowing complex queries over the existing knowledge base. Wikidata stores not only facts but also their reference sources, which allows data validation and the creation of timelines (e.g. a country's population is a variable that can be referenced to a census and that changes over time). Labels, aliases and descriptions of entities in Wikidata are provided in more than 350 languages. The basic structure of Wikidata consists of items (which have a label, a description and any number of aliases, collectively known as terms), properties and values, linked in statements that closely resemble RDF triples. However, the model of Wikidata statements is slightly more complex, as statements can be enriched with qualifiers (which provide additional context for the claim) and references (which support the claim). As of October 2018, Wikidata had more than 60,000 registered authors (contributors with ten or more edits) who, together with anonymous users and automatic bots, had contributed to more than 53.5m data items (https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaWIKIDATA.htm). Furthermore, the number of authors and items has been increasing steadily since the project's conception (https://stats.wikimedia.org/wikispecial/EN/ChartsWikipediaWIKIDATA.htm).
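To make the statement model described above more concrete, the following Python (3.9+) sketch represents an item with its terms and statements, where each statement carries a main property-value pair plus optional qualifiers and references. It is a simplified illustration under our own assumptions, not Wikidata's actual JSON serialization or API: the class and field names (`Snak`, `Statement`, `Item`, `triples`) are hypothetical, loosely borrowing Wikidata terminology. The identifiers Q42 (Douglas Adams), P31 (instance of), Q5 (human), P1082 (population), P585 (point in time) and P248 (stated in) are real Wikidata IDs, whereas `Q_COUNTRY`, `Q_CENSUS_A`, `Q_CENSUS_B` and the population figures are hypothetical placeholders.

```python
from dataclasses import dataclass, field

# Simplified, hypothetical model of Wikidata's item/statement structure.
# NOT the official JSON format; class and field names are illustrative only.


@dataclass
class Snak:
    """A single property-value pair, e.g. Snak("P31", "Q5")."""
    prop: str
    value: str


@dataclass
class Statement:
    claim: Snak                                                 # main property-value pair
    qualifiers: list[Snak] = field(default_factory=list)        # extra context for the claim
    references: list[list[Snak]] = field(default_factory=list)  # each reference is a group of snaks


@dataclass
class Item:
    qid: str
    labels: dict[str, str] = field(default_factory=dict)         # language code -> label
    descriptions: dict[str, str] = field(default_factory=dict)   # language code -> description
    aliases: dict[str, list[str]] = field(default_factory=dict)  # language code -> aliases
    statements: list[Statement] = field(default_factory=list)

    def triples(self) -> list[tuple[str, str, str]]:
        """Project statements onto plain subject-property-value triples,
        discarding qualifiers and references."""
        return [(self.qid, s.claim.prop, s.claim.value) for s in self.statements]


# Real IDs: Q42 = Douglas Adams, P31 = instance of, Q5 = human.
adams = Item(
    qid="Q42",
    labels={"en": "Douglas Adams"},
    statements=[Statement(Snak("P31", "Q5"))],
)

# Hypothetical country with two population values (P1082), each qualified
# by a point in time (P585) and referenced via "stated in" (P248).
# Q_COUNTRY, Q_CENSUS_A, Q_CENSUS_B and all figures are placeholders.
country = Item(
    qid="Q_COUNTRY",
    statements=[
        Statement(Snak("P1082", "1000000"),
                  qualifiers=[Snak("P585", "2001")],
                  references=[[Snak("P248", "Q_CENSUS_A")]]),
        Statement(Snak("P1082", "1200000"),
                  qualifiers=[Snak("P585", "2011")],
                  references=[[Snak("P248", "Q_CENSUS_B")]]),
    ],
)

print(adams.triples())    # [('Q42', 'P31', 'Q5')]
print(country.triples())  # two triples with the same subject and property
```

Projecting the hypothetical country onto plain triples drops the P585 qualifiers and the P248 references, so the two population values can no longer be told apart by date or source; this extra layer is what makes Wikidata statements richer than bare RDF triples and what enables the timelines mentioned above.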
Data Technologies and Applications, Vol. 53 No. 3, 2019, pp. 250-268
© Emerald Publishing Limited 2514-9288
DOI 10.1108/DTA-12-2018-0110
Received 31 December 2018; revised 23 March 2019; accepted 9 May 2019
The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/2514-9288.htm
Erxleben et al. (2014, p. 51) point out that "the relevance of Wikidata for researchers in semantic technologies, linked open data, and Web science […] hardly needs to be argued for". In 2014, however, they found that Wikidata had hardly been used in the Semantic Web community, even though the relative success of projects such as DBpedia (Bizer et al., 2009) and Freebase (Bollacker et al., 2008) hinted at the potential of Wikidata. The situation has changed notably in the last few years: Freebase was shut down in 2015 and integrated into Wikidata (Pellissier Tanon et al., 2016), and by 2017 Wikidata had been found to be the most suitable source of information for person data (with twice as many instances as DBpedia) and for detailed information about countries, among other domains (Ringler and Paulheim, 2017). The question is: has Wikidata become relevant to researchers and practitioners too?
The purpose of this study is, thus, to review the current status of research on Wikidata. In particular, we concentrate on articles that either describe applications of Wikidata or study the project empirically, in order to uncover the topics of interest, assess the related research activity and identify which researchers and institutions are leading the work, as detailed in Section 2. Our methodology is described in Section 3 and our results are presented in Section 4. In Section 5, all four research questions are discussed. Finally, conclusions are presented in Section 6.
2. Research questions
The research questions addressed by this study are:
RQ1. How much research activity has there been since the introduction of Wikidata?
RQ2. What are the main topics covered by empirical studies on Wikidata?
RQ3. What Wikidata applications are proposed in the literature?
RQ4. Who is leading Wikidata research?
With respect to RQ1, we identified how many relevant papers were published per year, as well as the journal or conference that published them. To answer RQ2, we considered the scope of each study (whether it is based on empirical evidence or proposes an application) and the topics or disciplines involved. Applications are reviewed in more detail under RQ3: since one of the main goals of Wikidata is to support third-party projects and initiatives, it is relevant to survey the current range of existing applications. Finally, with respect to RQ4, we considered individual researchers and their affiliations.
3. Methodology
In order to identify and review how Wikidata is addressed in academic research articles and
what applications are proposed, a systematic literature review was conducted. We used a
rigorous and systematic process, aiming not only to summarize existing studies and
research on the topic, but also to include an element of analytical criticism and a perspective
on gaps and future research (Okoli, 2015). Systematic reviews employ carefully defined protocols to determine which studies are to be included and to analyze their contribution in as unbiased a way as possible (Kitchenham, 2004; Webster and Watson, 2002). This study follows the original systematic literature review guidelines proposed by Kitchenham (2004), combined with the software engineering guidelines of Budgen and Brereton (2006).
3.1 Search strategy
Four online academic research databases (ACM Digital Library, IEEE Xplore, Springer Link and Science Direct) were searched for relevant articles. This search was complemented by a search in ISI Web of Science and Google Scholar to add any articles that had not been found in the first four databases. ACM and IEEE were considered relevant due to their focus on