Exploring digital preservation requirements: a case study from the National Geoscience Data Centre (NGDC)

Publication date: 17 July 2017
Author: Jaana Pinnick
Subject: Information & knowledge management, Information management & governance
Department of Informatics Directorate, British Geological Survey,
Keyworth, UK
Purpose: The aim of this paper was to explore digital preservation requirements within the wider National Geoscience Data Centre (NGDC) organisational framework, in preparation for developing a preservation policy and integrating associated preservation workflows throughout the existing research data management processes. This case study is based on MSc dissertation research undertaken at Northumbria University.
Design/methodology/approach: This mixed methods case study used quantitative and qualitative data to explore the preservation requirements, and triangulation to strengthen the validity of the design. Corporate and wider scientific priorities were identified through the literature and a stakeholder survey. Organisational preparedness was investigated through staff interviews.
Findings: Stakeholders expect data to be reliable, reusable and available in preferred formats. To ensure digital continuity, the creation of high-quality metadata is critical, and data depositors need data management training to achieve this. Recommendations include completing a risk assessment and creating a digital asset register and a technology watch to mitigate risks.
Research limitations/implications: The main constraint of this study is the limited generalisability of its results. As the NGDC is a unique organisation, it may not be possible to generalise the organisational findings, although those relating to research data management may be transferable.
Originality/value: This research examines the specific nature of geoscience data retention requirements and looks at existing NGDC procedures in terms of enhancing digital continuity, providing new knowledge on the preservation requirements for a number of national datasets.
Keywords: Digital preservation, Digital repository, Data centre, Digital continuity, Geoscience data management
Paper type: Case study
Introduction and background
This paper explores the requirements of the National Geoscience Data Centre (NGDC) for ensuring that the long-term preservation and usability of its digital data are supported and aligned with corporate aims. It examines the specific characteristics of geoscience and geospatial data and assesses how well the existing data management procedures support digital continuity in the current challenging funding climate.
The NGDC is the designated repository for geoscience research data funded by Natural Environment Research Council (NERC) grants, and the guardian of many commercially funded datasets. Hosted by the British Geological Survey (BGS), it is one of the NERC Environmental Data Centres and is responsible for ensuring the availability of the data. BGS corporate budgets and staffing levels decreased during 2010-2015, whilst the volume of digital data more than doubled. Data management needs to take account of the existing organisational framework under Research Councils UK and the Department for Business, Energy and Industrial Strategy.

The MSc research was supported by the British Geological Survey.
Received 13 April 2017
Accepted 13 April 2017
Records Management Journal
Vol. 27 No. 2, 2017
© Emerald Publishing Limited
DOI 10.1108/RMJ-04-2017-0009
As a public sector organisation BGS is committed, on behalf of NERC as the legal entity,
to look after certain geoscience data in its care in perpetuity (Bowie, 2010) and to make most
of it openly available to a wide range of stakeholders, who in turn use the data to develop
products and services as well as to inform their decision-making. This requires the
organisation to monitor the on-going condition of digital data and to take appropriate actions
in collaboration with its stakeholders to ensure the usability, trustworthiness and future
interoperability of those data. These attributes can only be achieved if data remain both
accessible and understandable for future users.
The geoscience data held at the NGDC span a wide range of data types, including but not limited to borehole, bedrock, hydrogeology, geochemistry, seismic, marine geoscience, oil and gas, airborne geophysical and geohazards data. They have been collected and accumulated over long periods of time and are used by the industry, manufacturing, construction and transport sectors, as well as by the public sector, academia and researchers, to build UK infrastructure, develop insurance and other data products, innovate, build risk models, answer science questions (e.g. in climate change research) and support many geoscience applications. Many stakeholders (40 per cent) have been using the NGDC datasets for over ten years. Because numerous proprietary software packages (such as Vulcan and MicroStation) have been used over the years, file formats were not restricted in the past, and contextual metadata are occasionally incomplete, older digital data are not always easily accessible to current users unless preservation actions are taken at appropriate times. Past decisions made, or not made, by data creators and guardians at the ingestion phase have a direct impact on data quality today.
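The format risk described above is what the recommended digital asset register and technology watch are meant to surface. As a minimal illustration only, the Python sketch below builds a register of the files under one directory and flags extensions that appear on a hand-maintained watch list. It assumes that holdings are ordinary files on disk and that the file extension is an adequate proxy for format. In practice extensions alone can be misleading. The watch-list entry used in the example (`.dgn`, MicroStation's design file extension) is an illustrative assumption, not a finding of this study, and all names (`AssetRecord`, `build_register`, `summarise`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class AssetRecord:
    """One row of a hypothetical digital asset register (field names are illustrative)."""
    path: str
    extension: str
    size_bytes: int
    at_risk: bool


def build_register(root: Path, watch_list: set[str]) -> list[AssetRecord]:
    """List every file under root and flag extensions that appear on the watch list."""
    records = []
    for p in sorted(root.rglob("*")):
        if p.is_file():
            ext = p.suffix.lower()  # normalise case so ".DGN" and ".dgn" match
            records.append(AssetRecord(
                path=p.relative_to(root).as_posix(),
                extension=ext,
                size_bytes=p.stat().st_size,
                at_risk=ext in watch_list,
            ))
    return records


def summarise(records: list[AssetRecord]) -> Counter:
    """Count files per extension so that format exposure can be reviewed at a glance."""
    return Counter(r.extension for r in records)
```

For example, `build_register(Path("holdings"), {".dgn"})` returns one record per file, and `summarise` on that list gives a per-format count that a technology watch could review when a format's support status changes. Keeping the watch list as a plain parameter reflects the assumption that format risk is judged by staff, not encoded in the tool.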
An additional strategic driver for building a digital preservation programme is the plan to apply for trusted repository status under the Data Seal of Approval (DSA) and the International Council for Science World Data System certification (RDA, 2016). This requires the NGDC to provide evidence of its long-term preservation capability as a data repository, including having a continuity plan in place to ensure the ongoing preservation of data holdings, ensuring the integrity and authenticity of the data, and managing long-term preservation in a planned and documented way. Although much work has already been done, not all of it is documented consistently. The data centre currently holds over 275 TB of data and, although it has considered digital preservation before, it has no formal workflows for incorporating preservation actions into data management processes outside the Oracle relational database management systems. Maintaining the value and usability of the data means that introducing these workflows has to be a priority going forward.
This exploratory case study began with a literature review to place the findings in the context of the long-term preservation of digital geoscience data at a digital repository. The fieldwork phase used a mixed methods approach employing both quantitative and qualitative data, in a sequential design intended to provide a comprehensive analysis of the case "by combining information from complementary kinds of data or sources" (Denscombe, 2008). The iterative approach used ethnographic methods, acknowledging the role of the researcher as part of the organisation studied (O'Reilly, 2012), and constant comparative analysis to create categories from raw data (Pickard, 2013).
