Towards automated pre-ingest workflow for bridging information systems and digital preservation services

Published date: 18 November 2019
DOI: https://doi.org/10.1108/RMJ-05-2018-0011
Pages: 289-304
Author: Parvaneh Westerlund, Ingemar Andersson, Tero Päivärinta, Jörgen Nilsson
Subject Matter: Information & knowledge management, Information management & governance
Parvaneh Westerlund and Ingemar Andersson
Luleå University of Technology, Luleå, Sweden
Tero Päivärinta
Luleå University of Technology, Luleå, Sweden and M3S, University of Oulu,
Oulu, Finland, and
Jörgen Nilsson
Luleå University of Technology, Luleå, Sweden
Abstract
Purpose: This paper aims to automate pre-ingest workflow for preserving digital content, such as records,
through middleware that integrates potentially many information systems with potentially several
alternative digital preservation services.
Design/methodology/approach: A design research approach resulted in a design for model- and
component-based software for such a workflow. A proof-of-concept prototype was implemented and
demonstrated in the context of a European research project, ForgetIT.
Findings: The study identifies design issues of automated pre-ingest for digital preservation while using
middleware as a design choice for this purpose. The resulting model and solution suggest functionalities and
interaction patterns based on open interface protocols between the source systems of digital content, middleware
and digital preservation services. The resulting workflow automates the tasks of fetching digital objects from the
source system with metadata extraction, preservation preparation and transfer to a selected preservation service.
The proof-of-concept verified that the suggested model for pre-ingest workflow and the suggested component
architecture were technologically implementable. Future research and development need to include new
solutions to support context-aware preservation management with increased support for configuring submission
agreements as a basis for dynamic automation of pre-ingest and more automated error handling.
Originality/value: The paper addresses design issues for middleware as a design choice to support
automated pre-ingest in digital preservation. The suggested middleware architecture supports many-to-many
relationships between the source information systems and digital preservation services through open
interface protocols, thus enabling dynamic digital preservation solutions for records management.
Keywords: Workflow, Middleware, Long-term digital preservation
Paper type: Technical paper
This work was partially funded by the European Commission in the context of the FP7 ICT project
ForgetIT (under grant no: 600826).
Received 1 May 2018
Revised 1 May 2018
27 July 2018
27 November 2018
Accepted 12 December 2018
Records Management Journal
Vol. 29 No. 3, 2019
pp. 289-304
© Emerald Publishing Limited
0956-5698
DOI 10.1108/RMJ-05-2018-0011
Introduction
Pre-ingest is the preparatory stage for transferring digital records from information systems
(ISs) to one or more long-term digital preservation systems (DPS) (Kärberg, 2015). During
this stage, contents are prepared to comply with the requirements of the ingest function of
an archival system (CCSDS, 2012) that receives the preserved content in a DPS. The pre-
ingest phase is crucial because it affects all subsequent preservation activities (CCSDS,
2012). However, preparing digital content for submission to a long-term preservation
repository requires both time and effort. Content producers are often reluctant to make
investments to meet detailed preservation and submission guidelines, while incomplete
information and insufficient metadata documentation are causing excessive costs on the
side of archives (Rosenthal et al., 2005). A manual approach to pre-ingest is not a suitable
strategy for preservation of digital material in the long term (Ross, 2012). Producers of
digital records and preservation organizations need to co-operate for long-term digital
preservation (DP), aided by tools that automatically capture metadata and support the
appraisal process (Hedstrom et al., 2008). Although the literature raised the issue of
pre-ingest automation some time ago, projects to develop pre-ingest tools and elements of
solutions with varying degrees of automation did not start to emerge until the mid-2010s
(Päivärinta et al., 2015; Kärberg, 2016; Lehtonen et al., 2017).
Lehtonen et al. (2017), in the context of establishing a national DP service in Finland,
addressed the need for developing modular, flexibly modifiable and easy-to-integrate
pre-ingest workflows to receive digital content from several producer organizations and their
potentially many ISs. Moreover, the employed DPSs will change over time as well (Afrasiabi
Rad et al., 2014). That is, a DP solution should help to configure, automate and manage DP
workflows in a context of potentially many-to-many integrations needed to bridge ISs
producing digital content and long-term DPSs. Päivärinta et al. (2015) suggested an overall
conceptual model for a supporting middleware for such a context and identified the design
problem of supporting automation of information transfer from ISs to DPSs and vice versa.
The objective of this research is to develop and demonstrate a workflow in such middleware
for automating pre-ingest tasks in the context of (potentially) many-to-many interactions
between ISs and DPSs. This paper presents the three main functionalities designed for the
suggested pre-ingest solution: selecting digital objects from a source system in harmony
with automated fetching mechanisms for preserved materials, preservation preparation
with automated metadata extraction and creation and transfer of submission information
packages (SIPs; for standard definitions, see the terminology of the Open Archival
Information System (OAIS) model (CCSDS, 2012)). The future challenges for research and
development of automated pre-ingest solutions are addressed based on the experimental
tests conducted in the context of the ForgetIT project (Gallo et al., 2018). The key contributions
include a proof-of-concept for a solution for the above-mentioned three functionalities with
identified design challenges, software components and their interaction patterns.
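To make the three functionalities concrete, the following is a minimal Python sketch of a pre-ingest pipeline that fetches objects from a source system, prepares them with extracted metadata and transfers the resulting SIP to a selected preservation service. It is an illustration under stated assumptions, not the architecture developed in the paper: every class and function name (SourceSystem, PreservationService, InMemorySource, InMemoryArchive, prepare, pre_ingest) is hypothetical, and the metadata extraction is reduced to a byte size and a SHA-256 checksum.

```python
import hashlib
from dataclasses import dataclass
from typing import Protocol


@dataclass
class DigitalObject:
    """Content fetched from a source information system (hypothetical type)."""
    object_id: str
    content: bytes
    source_metadata: dict


@dataclass
class SIP:
    """Submission information package (OAIS term): content plus metadata."""
    object_id: str
    content: bytes
    metadata: dict


class SourceSystem(Protocol):
    """Assumed interface that any source IS adapter would implement."""
    def fetch(self, object_id: str) -> DigitalObject: ...


class PreservationService(Protocol):
    """Assumed interface that any DPS adapter would implement."""
    def submit(self, sip: SIP) -> str: ...


class InMemorySource:
    """Toy source system backed by a dict, for illustration only."""
    def __init__(self, records: dict):
        self._records = records

    def fetch(self, object_id: str) -> DigitalObject:
        content, metadata = self._records[object_id]
        return DigitalObject(object_id, content, dict(metadata))


class InMemoryArchive:
    """Toy preservation service that stores SIPs in a list."""
    def __init__(self, name: str):
        self.name = name
        self.received = []

    def submit(self, sip: SIP) -> str:
        self.received.append(sip)
        return f"{self.name}:{sip.object_id}"


def prepare(obj: DigitalObject) -> SIP:
    """Preservation preparation: add extracted technical metadata."""
    metadata = dict(obj.source_metadata)
    metadata["size_bytes"] = len(obj.content)
    metadata["sha256"] = hashlib.sha256(obj.content).hexdigest()
    return SIP(obj.object_id, obj.content, metadata)


def pre_ingest(source: SourceSystem, object_ids: list,
               archive: PreservationService) -> list:
    """Fetch, prepare and transfer each selected object; return receipts."""
    receipts = []
    for object_id in object_ids:
        obj = source.fetch(object_id)          # 1. fetch from the source system
        sip = prepare(obj)                     # 2. prepare with metadata
        receipts.append(archive.submit(sip))   # 3. transfer to the chosen DPS
    return receipts


source = InMemorySource({"rec-1": (b"hello", {"title": "Memo"})})
archive = InMemoryArchive("dps-a")
receipts = pre_ingest(source, ["rec-1"], archive)
```

Because pre_ingest depends only on the two assumed interfaces, any source adapter can be paired with any preservation service adapter, which is one simple way to picture the many-to-many relationship between ISs and DPSs.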
In the remainder of the paper, we first outline the related work on pre-ingest solutions,
after which we introduce the major design issues and challenges given in the project that
formed the basis for our work. Thereafter, the paper reports the design of the pre-ingest
architecture and workflow, describing the interaction patterns between its software
components together with the results of preliminary tests. The paper concludes with a discussion of
the contributions and future research.
Related work
With an increasing amount, size and complexity of digital content, it is not feasible to
manually deal with the preparation of material for DP, considering the cost of staff as well
as the complexity of manual processes (Ross, 2012; Kärberg, 2016; Lehtonen et al., 2017).
Therefore, there is a need for automating the pre-ingest processes while involving archiving
staff only when a human decision is necessary (Hedges et al., 2009).