Long-term preservation of big data: prospects of current storage technologies in digital libraries
Pages | 539-555 |
DOI | https://doi.org/10.1108/LHT-06-2017-0117 |
Date | 17 September 2018 |
Published date | 17 September 2018 |
Author | Wasim Ahmad Bhat |
Department of Computer Sciences, University of Kashmir, Srinagar, India
Abstract
Purpose – The purpose of this paper is to investigate the prospects of current storage technologies for
long-term preservation of big data in digital libraries.
Design/methodology/approach – The study employs a systematic and critical review of the relevant
literature to explore the prospects of current storage technologies for long-term preservation of big data in
digital libraries. Online computer databases were searched to identify the relevant literature published
between 2000 and 2016. Specific inclusion and exclusion criteria were formulated and applied in two
distinct rounds to determine the most relevant papers.
Findings – The study concludes that current storage technologies are not viable for long-term
preservation of big data in digital libraries. They can neither fulfil all the storage demands nor alleviate the
financial expenditures of digital libraries. The study also points out that migrating to emerging storage
technologies in digital libraries is a viable long-term solution.
Research limitations/implications – The study suggests that continuous innovation and research efforts
in current storage technologies are required to lessen the impact of storage shortage on digital libraries, and to
allow emerging storage technologies to advance further and take over. At the same time, more aggressive
research and development efforts are required by academics and industry to further advance the emerging
storage technologies for their timely and swift adoption by digital libraries.
Practical implications – The study reveals that digital libraries, besides incurring significant financial
expenditures, will suffer from potential loss of information due to storage shortage for long-term preservation
of big data, if they employ current storage technologies. Therefore, policy makers and practitioners
should meticulously choose storage technologies for long-term preservation of big data in digital libraries.
Originality/value – This type of holistic study that investigates the prospects of magnetic drive technology,
solid-state drive technology, and data-reduction techniques for long-term preservation of big data in digital
libraries has not been conducted in the field previously, and so provides a novel contribution. The study arms
academics, practitioners, policy makers, and industry with a deep understanding of the problem, technical
details to choose storage technologies meticulously, greater insight to frame sustainable policies,
and opportunities to address various research problems.
Keywords Digital libraries, Digital preservation, Big data, Data reduction, Flash technology,
Magnetic storage
Paper type Research paper
1. Introduction
Long-term preservation of digital information strives to protect valuable information for
access by present and future generations. Digital libraries include long-term digital
preservation as one of their core functions. Digital initiatives across the globe have led to
digitisation of millions of manuscripts, periodicals, and other resources including audio and
video. Various governments are supporting the shift to digital records and preservation
(Adu et al., 2016; Adu and Ngulube, 2016). With the emergence of big data, digital libraries
have become an increasingly significant area of research. Big data characteristics
significantly differ from those of traditional digital data in many ways. Big data has myriad
and manifold sources, and holds immense potential to deliver new knowledge and greater
insights (Foster, 2016). Digital libraries generally receive big data from researchers. Big data
submitted by researchers is the result of the research work conducted by them across a wide
variety of subjects (Han, 2015), which is collected at atomic, molecular, geological, and
astronomical levels. Digital libraries preserve the submitted research data for the long term so as
to allow its continuous mining (Seadle, 2016), and varied future analysis (He and
Nahar, 2016) by researchers across the globe (Beyene, 2017). Unfortunately for digital
libraries, big data is huge in volume (Bhat and Quadri, 2015), and thus requires very
large-capacity digital libraries for its long-term preservation.
Library Hi Tech, Vol. 36 No. 3, 2018, pp. 539-555
© Emerald Publishing Limited 0737-8831
DOI 10.1108/LHT-06-2017-0117
Received 24 June 2017
Revised 16 September 2017, 18 November 2017, 23 November 2017, 25 November 2017
Accepted 25 November 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0737-8831.htm
Digital libraries that can fulfil all the storage demands of long-term preservation of big
data in a cost-effective way rely on the potential of storage technologies to do so.
These storage technologies must offer huge capacity with minimal storage cost/bit and should
incur lower infrastructure, maintenance, and operational costs. This implies that as big data
grows they also need to progress both technologically and economically to cope with the
challenge. Unfortunately, in this era of big data, the rate of production of data has significantly
outpaced the growth of storage capacity of current storage technologies. As an example,
in 2013, the available storage capacity could hold just 33 per cent of the digital universe, and
by 2020, it will be able to store less than 15 per cent (Turner et al., 2014). Undoubtedly, there is
a growing gap between the volume of big data created and the extent of available storage
capacity to preserve it. In fact, by 2020, a minimum storage shortage of over six
zettabytes (ZBs) is anticipated by Seagate, as shown in Figure 1.
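The coverage percentages quoted above can be turned into a shortfall by simple arithmetic. The following is a minimal sketch in Python, not a calculation taken from the study: the function name storage_shortfall and the 10 ZB data volume are hypothetical placeholders chosen only for illustration, and only the 33 and 15 per cent coverage figures come from the text.

```python
def storage_shortfall(data_volume_zb: float, coverage_fraction: float) -> float:
    """Return the volume of data (in ZB) that available capacity cannot hold.

    data_volume_zb    -- total data produced, in zettabytes (hypothetical input)
    coverage_fraction -- share of that data the available capacity can store
    """
    if data_volume_zb < 0:
        raise ValueError("data_volume_zb must be non-negative")
    if not 0.0 <= coverage_fraction <= 1.0:
        raise ValueError("coverage_fraction must lie in [0, 1]")
    return data_volume_zb * (1.0 - coverage_fraction)


# ASSUMPTION: 10 ZB is a placeholder volume, not a figure from the paper.
# The same placeholder is used for both years only to isolate the effect
# of the falling coverage fraction.
HYPOTHETICAL_VOLUME_ZB = 10.0

# Coverage fractions quoted in the text (Turner et al., 2014).
for year, coverage in (("2013", 0.33), ("2020", 0.15)):
    shortfall = storage_shortfall(HYPOTHETICAL_VOLUME_ZB, coverage)
    print(f"{year}: {shortfall:.2f} ZB of every "
          f"{HYPOTHETICAL_VOLUME_ZB:.0f} ZB produced cannot be stored")
```

Because the text states that capacity in 2020 will hold less than 15 per cent, the value computed for 2020 should be read as a lower bound on the uncovered share of whatever volume is supplied.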
While the generation of big data is encouraged for the value it promises, it significantly
challenges current storage technologies to cope with the preservation of huge volumes of
the data. Unfortunately, digital libraries are not exonerated from the implications of this
storage shortage. In fact, it significantly challenges the potential of digital libraries to fulfil
their core task of long-term preservation of big data, and thus questions the prospects of
current storage technologies in digital libraries. Therefore, academics and practitioners alike
are confronted by one important question, which is:
RQ1. What are the prospects of current storage technologies for long-term preservation
of big data in digital libraries?
Answering this question can arm policy makers, practitioners, academics, and industry with
a deep understanding of the problem, and the knowledge to choose appropriate storage
technologies, make the right decisions, and frame sustainable policies for long-term preservation
of big data in digital libraries. This paper attempts to answer this question by employing a
systematic and critical review of the relevant literature to identify the challenges,
limitations, and advances of current storage technologies and investigates their prospects in
digital libraries.
The rest of this paper is organised as follows. Section 2 introduces the research
methodology adopted by the study. Section 3 investigates the technological limitations of
Figure 1. Seagate’s projected gap between storage supply and demand
[Chart: zettabytes (0 to 10) by calendar year, CY00 to CY20. Demand series: enterprise cloud, personal cloud, compute SSD, and client HDD, with a zettabyte demand trend line. Supply series: base case HDD industry output and base case SSD industry output.]