Guest editorial
Date | 07 August 2017 |
Pages | 622-625 |
Published date | 07 August 2017 |
DOI | https://doi.org/10.1108/EL-08-2017-0187 |
Author | Xinning Su, Chengzhi Zhang, Daqing He |
Subject Matter | Information & knowledge management, Information & communications technology, Internet |
Guest editorial: managing bigger online data
Introduction
We are living in an era of massive data, whose volume, variety and velocity have not been
seen before. Harvard Business Review reported that about 2.5 exabytes of data were created
every day in 2012, and that this number would double every 40 months (McAfee and
Brynjolfsson, 2012). Internet Live Stats (www.internetlivestats.com) states that the internet
has over 1 billion websites at the time of writing, people send 500 million tweets each day,
and Google receives over 3.5 billion searches in just one day. Furthermore, Facebook has 1.94
billion global monthly active users, and 1.74 billion are mobile users (www.statista.com).
Information about one topic is available in a wide range of media and formats. For example,
AlphaGo’s matches against two top human professional Go players, Lee Sedol and Ke Jie,
in 2016 and 2017, were widely reported, discussed and presented in newspapers, on TV
programs, social platforms and various other online sites in many different languages and
media. A complete understanding of the social impact of this event probably needs to
consider information from all these sources.
Data are produced very quickly. For example, Facebook users produce more than 300
terabytes of log data every day and Taobao’s 370 million members generate more than 20
terabytes of transaction data every day (Li and Cheng, 2012). At the same time, people
increasingly demand immediate access to data. Users expect search engines to return
relevant results within a few seconds, drivers want live traffic information updated on their
devices while they are driving, and tweets are immediately pushed to subscribed followers
for consumption. All of these demands are still relatively new and are often part of the
so-called “big data” challenges.
Big data poses challenges for academics as well. Researchers worldwide collect and
generate massive volumes of data stored in various forms of databases, and they continually
produce large numbers of scholarly documents – including formal publications such as
articles, books and technical reports, as well as informal documents such as tutorials,
proposals, lab notes or course materials. To demonstrate the scale of academic contributions,
PubMed has over 20 million medical-related articles with 10 million unique names and 70
million name mentions. In addition, an important shift in academia is the change of research
paradigms in a wide range of disciplines. Data-intensive and collaborative research has
increasingly become the norm in many disciplines (Hey et al., 2009). In such a paradigm, data
are viewed as a critical component of the “infrastructure of science”, which is important in
forming “the basis for good scientific decisions” (Tenopir et al., 2011).
However, as pointed out by Borgman (2015) and many other researchers, the big data
problem in academic disciplines does not always mean that the amount of data has to be at
the petabyte or zettabyte level. There is a long-tail distribution of research teams and the data
they work with in their scholarly activities. A large number of scholars work with only small
amounts of data. Yet, in this data-intensive research paradigm, they are actually facing
even more big data-like problems. This is because, as their research matures, their data collection,
analysis methods, and storage and preservation facilities may not be able to cope with
data that are larger and more diverse than before, or with the increase in data itself. This
“data exceeding current processing capacities” situation can be more accurately called the
“bigger data” problem.
The Electronic Library
Vol. 35 No. 4, 2017
pp. 622-625
© Emerald Publishing Limited
0264-0473
DOI 10.1108/EL-08-2017-0187