Provenance description of metadata application profiles for long-term maintenance of metadata schemas

DOI: https://doi.org/10.1108/JD-03-2017-0042
Published date: 08 January 2018
Pages: 36-61
Authors: Chunqiu Li, Shigeo Sugimoto
Subject Matter: Library & information science, Records management & preservation, Document management, Classification & cataloguing, Information behaviour & retrieval, Collection building & management, Scholarly communications/publishing, Information & knowledge management, Information management & governance, Information management, Information & communications technology, Internet
Chunqiu Li
Graduate School of Library, Information and Media Studies,
University of Tsukuba, Tsukuba, Japan, and
Shigeo Sugimoto
Faculty of Library, Information and Media Science,
University of Tsukuba, Tsukuba, Japan
Abstract
Purpose: Provenance information is crucial for consistent maintenance of metadata schemas over time.
The purpose of this paper is to propose a provenance model named DSP-PROV to keep track of structural
changes of metadata schemas.
Design/methodology/approach: The DSP-PROV model is developed by applying the general
provenance description standard PROV of the World Wide Web Consortium to the Dublin Core Application
Profile. The Metadata Application Profile of the Digital Public Library of America is selected as a case study to apply
the DSP-PROV model. Finally, this paper evaluates the proposed model by comparing the formal
provenance description in DSP-PROV with a semi-formal change log description in English.
Findings: Formal provenance description in the DSP-PROV model has advantages over semi-formal
provenance description in English for keeping metadata schemas consistent over time.
Research limitations/implications: The DSP-PROV model is applicable to keeping track of the structural
changes of metadata schemas over time. Provenance description of other features of metadata schemas, such as
vocabulary and encoding syntax, is not covered.
Originality/value: This study proposes a simple model for provenance description of structural features of
metadata schemas based on a few standards widely accepted on the Web and shows the advantage of the
proposed model over conventional semi-formal provenance description.
Keywords: RDF, Description set profile, Dublin core application profile, Metadata application profile,
Metadata maintenance, Metadata schema, PROV, Provenance description
Paper type: Research paper
1. Introduction
There are well-known standards and services for the long-term use of digital objects. Examples include the
Open Archival Information System (OAIS) reference model (Consultative Committee for Space
Data System (CCSDS), 2012), adopted as ISO standard 14721:2012; the Preservation Metadata:
Implementation Strategies (PREMIS) metadata standard[1]; the PRONOM registry service[2]; and
the Heritrix crawler[3] for Web archiving. Metadata plays an important role in these standards
and services in ensuring the longevity of digital objects (Dempsey and Heery, 1998; Chilvers, 2002;
Poole, 2016). The longevity of metadata should be ensured, as well as the longevity of digital
objects, for future use. An essential requirement of metadata longevity is keeping metadata
interpretable by both machines and humans over time. Long-term maintenance of
metadata schemas and metadata vocabularies is a significant issue for keeping metadata
interpretable over time. This paper discusses the research problem of long-term maintenance
of metadata schemas. There is still no well-recognized standard for the longevity of metadata
schemas, although there are de facto and international standards designed for interoperable
metadata such as Dublin Core Application Profiles (DCAPs) and ISO/IEC 11179[4]
for metadata registries. The consistent maintenance of metadata schemas and change
tracking of the structural features of metadata are both required for metadata longevity.
Journal of Documentation, Vol. 74 No. 1, 2018, pp. 36-61
© Emerald Publishing Limited, 0022-0418
DOI 10.1108/JD-03-2017-0042
Received 27 March 2017
Revised 9 September 2017
Accepted 17 September 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0022-0418.htm
This study is supported in part by JSPS Kaken Grant-in-Aid for Scientific Research (A)
(Grant No. 16H01754). The authors express sincere thanks to Professors Atsuyuki Morishima,
Tetsuo Sakaguchi, Mitsuharu Nagamori, Mr Tsunagu Honma, Dr Tetsuya Mihara, and all the
members of the laboratory for their useful comments and help with this study.
The word provenance derives from the French verb provenir, meaning "to come from".
Provenance provides the source, history, and derivation of an object. In general, provenance
describes how an object came to its current state since its origination. According to OAIS
and PREMIS, which are well-known standards for the preservation of digital objects, provenance
is crucial evidence for the authenticity of digital objects (CCSDS, 2012; PREMIS Editorial
Committee, 2015). Provenance has been used in a wide range of domains for identifying data
trustworthiness, tracking ownership and/or authorship of works, auditing errors,
reproducing research data, and so forth. Furthermore, provenance research is within the
scope of various fields which need long-term maintenance of resources, such as research
data and webpages. However, provenance research in the metadata community, especially on
metadata schema maintenance, is quite limited.
In conventional services, metadata has mostly been organized as a database, so
maintenance of the metadata is likely to be recognized as maintenance of
the database. In such an environment, the schemas of the metadata are documented as a part of the
database schema. Those schema documents are maintained primarily for human readers.
We consider this to be the main reason for the lack of research on long-term maintenance of
metadata schemas. However, in today's state-of-the-art Web environment, the so-called Linked
Open Data (LOD) environment, we need metadata schema maintenance technologies drastically
different from those used in the conventional database-centric environment. This is because both
metadata and their schemas can be encoded in XML and transferred from one site to another as
first-class objects. Sugimoto et al. (2016) presented differences between the conventional and LOD
environments for metadata schema maintenance and discussed facets of long-term maintenance
of metadata schemas in the LOD environment. Long-term maintenance of metadata schemas in
the LOD environment needs to use technologies that fit LOD, but these are not yet well
developed.
We have learned the importance of provenance description of metadata schemas from
Preservation Description Information (PDI) of OAIS. Among the five categories in PDI, which
are reference, provenance, context, fixity, and access rights, the provenance category is directly
related to events which may cause changes in the preserved objects. It is crucial for long-term
maintenance of metadata to keep track of changes in their metadata schema as a digital object
which should be readable by machines as well as humans. We consider that provenance
description of metadata schemas in a description scheme standardized for LOD such as
Resource Description Framework (RDF) is crucial for the longevity of metadata.
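As a rough illustration of what a machine-readable provenance description of a schema revision might look like in RDF, the following minimal sketch emits N-Triples using terms from the W3C PROV-O vocabulary (prov:Entity, prov:Activity, prov:used, prov:wasGeneratedBy, prov:wasRevisionOf, prov:endedAtTime). It is not the DSP-PROV model proposed in this paper. The example.org namespace, the resource names, and the timestamp are all hypothetical and chosen only for illustration.

```python
# Minimal sketch: generic PROV-O statements about two versions of a metadata
# schema, serialized as N-Triples with the standard library only.
# All resource names below are hypothetical, not taken from the paper.

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
EX = "http://example.org/schema/"  # hypothetical namespace


def iri(value):
    """Wrap an IRI string in N-Triples angle brackets."""
    return f"<{value}>"


def typed_literal(value, datatype):
    """Format a typed literal in N-Triples syntax."""
    return f'"{value}"^^<{datatype}>'


def revision_triples(old_version, new_version, activity, ended_at):
    """Return N-Triples lines stating that new_version revises old_version."""
    old, new, act = EX + old_version, EX + new_version, EX + activity
    statements = [
        (iri(old), iri(RDF_TYPE), iri(PROV + "Entity")),
        (iri(new), iri(RDF_TYPE), iri(PROV + "Entity")),
        (iri(act), iri(RDF_TYPE), iri(PROV + "Activity")),
        (iri(act), iri(PROV + "used"), iri(old)),
        (iri(new), iri(PROV + "wasGeneratedBy"), iri(act)),
        (iri(new), iri(PROV + "wasRevisionOf"), iri(old)),
        (iri(act), iri(PROV + "endedAtTime"),
         typed_literal(ended_at, XSD_DATETIME)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in statements]


lines = revision_triples(
    "profile-v1", "profile-v2", "revision-1", "2017-01-01T00:00:00Z"
)
print("\n".join(lines))
```

Because each statement is an ordinary RDF triple, the revision history can be stored, exchanged, and queried alongside the schema itself, which a free-text change log does not readily allow.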
In this study, we aim at proposing a model to formally describe provenance of metadata
application profiles (MAPs) for automated tracking of their change history and consistent
maintenance of metadata over time.
We analyzed the existing provenance description models and vocabularies (Li and Sugimoto,
2014) and learned that some models are general and can be tuned to specific domains, for
example, the PROV data model and the Open Provenance Model, whereas others are designed for specific applications,
for instance, the BBC Provenance Ontology. The existing models do not cover description of
structural features of metadata. In other words, those models lack classes and properties defined
for describing changes in MAPs. Therefore, in this paper, we analyze requirements to describe
the revision history of MAPs and define a provenance description model for MAPs.
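To make the notion of structural change between revisions more concrete, the sketch below compares two versions of a profile and lists the properties that were added, removed, or modified. The dictionary representation, the function name diff_profiles, and the sample property constraints are assumptions made for illustration only. They do not reflect the Description Set Profile syntax or the model defined in this paper.

```python
# Minimal sketch: detect structural changes between two versions of an
# application profile. Each version is modeled (hypothetically) as a mapping
# from property name to its constraints.


def diff_profiles(old, new):
    """Return a list of (change_type, property) pairs between two versions."""
    changes = []
    # Properties present only in the old version were removed.
    for prop in sorted(old.keys() - new.keys()):
        changes.append(("removed", prop))
    # Properties present only in the new version were added.
    for prop in sorted(new.keys() - old.keys()):
        changes.append(("added", prop))
    # Properties in both versions whose constraints differ were modified.
    for prop in sorted(old.keys() & new.keys()):
        if old[prop] != new[prop]:
            changes.append(("modified", prop))
    return changes


# Hypothetical sample data: "max": None stands for "unbounded".
v1 = {
    "dcterms:title": {"min": 1, "max": 1},
    "dcterms:creator": {"min": 0, "max": None},
}
v2 = {
    "dcterms:title": {"min": 1, "max": None},
    "dcterms:date": {"min": 0, "max": 1},
}

changes = diff_profiles(v1, v2)
for change_type, prop in changes:
    print(change_type, prop)
```

Each detected change could then be recorded as a formal provenance statement rather than as a sentence in a change log.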
In the long term, changes in metadata schemas may cause inconsistencies and incorrect
interpretation of metadata. Hence, provenance description that describes revision history of