Digital preservation at Big Data scales: proposing a step-change in preservation system architectures

Published date: 17 September 2018
Pages: 524-538
DOI: https://doi.org/10.1108/LHT-06-2017-0122
Authors: David Maynard Gerrard, James Edward Mooney, Dave Thompson
David Maynard Gerrard
Cambridge University Library, University of Cambridge, Cambridge, UK
James Edward Mooney
The Bodleian Library, The University of Oxford, Oxford, UK, and
Dave Thompson
The National Archives, London, UK
Abstract
Purpose: The purpose of this paper is to consider how digital preservation system architectures will
support business analysis of large-scale collections of preserved resources, and the use of Big Data analyses
by future researchers.
Design/methodology/approach: This paper reviews the architecture of existing systems, then discusses
experimental surveys of large digital collections using existing digital preservation tools at Big Data scales.
Finally, it introduces the design of a proposed new architecture to work with Big Data volumes of preserved
digital resources, also based upon experience of managing a collection of 30 million digital images.
Findings: Modern visualisation tools enable business analyses based on file-related metadata, but most
currently available systems need more of this functionality out-of-the-box. Scalability of preservation
architecture to Big Data volumes depends upon the ability to run preservation processes in parallel,
so indexes that enable effective sub-division of collections are vital. Not all processes scale easily, and those
that do not scale easily require complex management.
Practical implications: The complexities caused by scaling up to Big Data volumes can be seen as being
at odds with preservation, where simplicity matters. However, the sustainability of preservation systems
relates directly to their usefulness, and maintaining usefulness will increasingly depend upon being able to
process digital resources at Big Data volumes. An effective balance between these conflicting situations must
be struck.
Originality/value: Preservation systems are at a step-change as they move to Big Data scale architectures
and respond to more technical research processes. This paper is a timely illustration of the state of play at this
pivotal moment.
Keywords: Big Data, Architecture, Digital preservation, Image processing, Business analytics, Digitization
Paper type: Conceptual paper
1. Introduction
Collections of preserved digital resources are becoming vital sources of Big Data for
researchers. This paper describes how the architectures of digital preservation systems
might evolve over the coming years to cope with new requirements driven by the use of Big
Data in research. The need for preservation systems to support management decisions
regarding very large sets of data is also covered. This paper does not discuss the long-term
preservation of Big Data sets themselves, although that is also a very important, emerging topic.
Requirements related to the use of large-scale collections of preserved digital
resources have emerged from research into digital preservation at two internationally
important university libraries: Cambridge University Library (CUL) and Bodleian
Libraries, Oxford University (Bodleian). Furthermore, the architectural direction we
Library Hi Tech
Vol. 36 No. 3, 2018
pp. 524-538
© Emerald Publishing Limited
0737-8831
DOI 10.1108/LHT-06-2017-0122
Received 30 June 2017
Revised 8 December 2017
Accepted 21 December 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0737-8831.htm
The authors would like to thank the Polonsky Foundation for funding this research, and the other
Polonsky Digital Preservation Fellows, Edith Halvarsson, Somaya Langley, Sarah Mason and Lee
Pretlove, for their support.
