Data science from a library and information science perspective

Pages: 422-441
Published date: 03 September 2019
DOI: https://doi.org/10.1108/DTA-05-2019-0076
Author: Sirje Virkus, Emmanouel Garoufallou
Subject Matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Information & knowledge management, Information & communications technology, Internet
Sirje Virkus
School of Digital Technologies, Tallinn University, Tallinn, Estonia, and
Emmanouel Garoufallou
Department of Library Science and Information Systems,
Alexander Technological Educational Institute of Thessaloniki,
Thessaloniki, Greece and
Deltos Group, Thessaloniki, Greece
Abstract
Purpose: Data science is a relatively new field that has gained considerable attention in recent years. This
new field requires a wide range of knowledge and skills from different disciplines, including mathematics and
statistics, computer science and information science. The purpose of this paper is to present the results of a
study that explored the field of data science from the library and information science (LIS) perspective.
Design/methodology/approach: Research publications on data science were analyzed on the basis
of papers published in the Web of Science database. The following research questions were proposed: What
are the main tendencies in publication years, document types, countries of origin, source titles, authors of
publications, affiliations of the article authors and the most cited articles related to data science in the field of
LIS? What are the main themes discussed in the publications from the LIS perspective?
Findings: The largest contribution to data science comes from the computer science research community.
The contribution of the information science and library science community is quite small. However, the number
of articles has increased continuously since 2015. The main document types are journal articles, followed by
conference proceedings and editorial material. The top three journals that publish data science papers from the
LIS perspective are the Journal of the American Medical Informatics Association, the International Journal of
Information Management and the Journal of the Association for Information Science and Technology. The top
five publishing countries are the USA, China, England, Australia and India. The most cited article has received 112
citations. The analysis revealed that the data science field is quite interdisciplinary by nature: in addition to
LIS, the papers belonged to several other research areas. The reviewed articles fell into six broad
categories: data science education and training; knowledge and skills of the data professional; the role of
libraries and librarians in the data science movement; tools, techniques and applications of data science; data
science from the knowledge management perspective; and data science from the perspective of health sciences.
Research limitations/implications: This study analyzed only research papers in the Web of Science database
and therefore covers only a portion of the scientific papers published in the field of LIS. In addition, only
publications with the term “data science” in the Topic field of the Web of Science database were analyzed.
Therefore, several relevant studies are not discussed in this paper, either because they are not indexed in the
Web of Science database or because they relate to other keywords such as “e-science,” “e-research,”
“data service,” “data curation” or “research data management.”
Originality/value: The field of data science has not previously been explored through a bibliographic analysis
of publications from the LIS perspective. This paper helps to better understand the field of data science
and its prospects for information professionals.
Keywords: Data science, Data scientist, Skills, Business value, Data management, Information science,
Library science, Bibliographic analysis, Literature review, IoT
Paper type: Research paper
Data Technologies and Applications
Vol. 53 No. 4, 2019
© Emerald Publishing Limited
2514-9288
Received 8 May 2019
Revised 8 June 2019
Accepted 19 June 2019
1. Introduction
Data science is a relatively new term that has gained considerable attention in recent years.
A search for this phrase in Google now returns more than 76 million hits. The data science field
has emerged in response to the increased amount of data. Huge amounts of data have become
available to people at all levels of society through social networks, mobile devices and various
sensor devices (i.e. the Internet of Things). These new types of data, enormous in volume, varied
in form, and often complex, unstructured and volatile, are being generated at an accelerating
rate (Virkus et al., 2018). The majority of digital data is generated by consumers, in the form of
movie downloads, VoIP calls, e-mails and cell-phone location readings (Regaldo, 2013). Cukier
and Mayer-Schönberger (2013), defining this process as datafication, note that transforming all
things under the sun into a data format and thus quantifying them is at the heart of the current
world. Just as electricity changed industrial processes and domestic practices in the nineteenth
century, a data-driven paradigm is the core of twenty-first century processes and practices
(Schäfer and Van Es, 2017, p. 11). Yet only about 0.5 percent of data is ever analyzed (Regaldo,
2013). There is far more data than anyone can capture or analyze, and the concept of data
overload has therefore been suggested (Virkus et al., 2018).
At the same time, computers have become much more powerful as technology has
advanced, networking is ubiquitous, and algorithms have been developed that can connect
data sets to enable broader and deeper analyses than previously possible (Provost and
Fawcett, 2013, p. 51). This has led to the emergence of data science (Cervone, 2016). van der
Aalst (2016, p. 4) notes: “Data abundance combined with powerful data science techniques
has the potential to dramatically improve our lives by enabling new services and products,
while improving their efficiency and quality.” This presents an opportunity for better
decision making and strategy development (Aristodemou and Tietze, 2018, p. 37).
European library and information science (LIS) education has faced a number of challenges in
recent years, including the financial crisis, negative demographic trends in some countries,
emerging technologies, internationalization and globalization. For this reason, innovative ways
to survive and achieve educational goals are constantly needed (Virkus, 2015). Data science
offers an opportunity to bring new interdisciplinary perspectives to LIS professionals and to
LIS education, helping them address new societal needs, including e-science and research data
management. Accordingly, there is growing interest in data management and data science
among library and information professionals (Garoufallou et al., 2008; Antell et al., 2014).
The purpose of this paper is to present the results of a study that explored the field of
data science from the LIS perspective on the basis of papers published in the Web of Science
database. The paper is organized as follows: the second section describes the research
methodology adopted; the third section discusses the concepts of data science and data
scientists, the skills required of data scientists and data science-related activities; the
fourth section presents the results of the bibliographic analysis of data science from the
LIS perspective; and the fifth section presents conclusions.
2. Methodology
Research publications on data science were analyzed on the basis of papers
published in the Web of Science database. The Web of Science Core Collection provides access
to the world's leading citation databases, and its authoritative, multidisciplinary content
covers over 12,000 of the highest impact journals worldwide, including Open Access
journals, and over 150,000 conference proceedings across more than 250 disciplines, with
coverage back to 1900 (Virkus, 2016). Therefore, it seemed reasonable to start exploring this
emerging field on the basis of this database.
The following research questions were proposed:
RQ1. What are the main tendencies in publication years, document types, countries of
origin, source titles, authors of publications, affiliations of the article authors and
the most cited articles related to data science in the field of LIS?
RQ2. What are the main themes discussed in the publications from the LIS perspective?
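RQ1 essentially asks for frequency distributions over the retrieved records (publication years, document types, countries of origin and so on). The sketch below illustrates such a tally in Python under loud assumptions: the records are taken to be already exported from Web of Science into Python dictionaries, and the field names `year`, `doc_type` and `countries`, the helper `tally`, and the sample records are hypothetical placeholders, not the authors' data or procedure.

```python
from collections import Counter

# Hypothetical sample records: field names and values are placeholders,
# not results from the Web of Science search reported in this paper.
records = [
    {"year": 2017, "doc_type": "Article", "countries": ["USA", "China"]},
    {"year": 2018, "doc_type": "Article", "countries": ["USA"]},
    {"year": 2018, "doc_type": "Proceedings Paper", "countries": ["England"]},
    {"year": 2018, "doc_type": "Editorial Material", "countries": ["USA"]},
]


def tally(records, field):
    """Count how often each value of `field` occurs across records.

    List-valued fields (e.g. several author countries on one paper) are
    counted once per distinct value per record (whole counting).
    """
    counts = Counter()
    for record in records:
        value = record[field]
        if isinstance(value, list):
            # Deduplicate within a record so one paper credits a country once
            counts.update(set(value))
        else:
            counts[value] += 1
    return counts


for field in ("year", "doc_type", "countries"):
    # most_common() lists values from most to least frequent
    print(field, tally(records, field).most_common())
```

Counting each country once per paper corresponds to whole counting; a fractional-counting scheme, which splits one paper's credit among its contributing countries, would be an equally defensible design choice.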
Searches were carried out in the database by topic in April 2019 using the term “data
science.” The search strategy retrieved 80 publications. The following categories were
explored: the years in which the documents were published; the document types of the