veraPDF: building an open source, industry supported PDF/A validator for cultural heritage institutions
DOI | https://doi.org/10.1108/DLP-08-2016-0031 |
Pages | 156-165 |
Published date | 08 May 2017 |
Date | 08 May 2017 |
Author | Carl Wilson, Rebecca McGuinness, Joachim Jung |
Subject Matter | Library & information science, Librarianship/library management, Library technology, Records management & preservation, Information repositories |
Carl Wilson, Rebecca McGuinness and Joachim Jung
Open Preservation Foundation, Wetherby, UK
Abstract
Purpose – This paper describes the development of the veraPDF validator. The objective of veraPDF is to
build an industry supported, open source validator for all parts and conformance levels of the PDF/A
specification for archival PDF documents. The project is led by the Open Preservation Foundation and the
PDF Association and is funded by the EU PREFORMA project.
Design/methodology/approach – veraPDF is designed to meet the needs of the digital preservation
community and the PDF industry alike. The technology is subject to the review of and acceptance by the PDF
Association’s PDF Validation Technical Working Group, including many participants of the relevant ISO
working groups. Cultural heritage institutions are collecting ever-increasing volumes of digital information,
which they have a mandate to preserve for the long term. However, in many cases, they need to ensure their
content has been produced to the specifications of a standard file format, as well as any acceptance criteria
stated in their institutional policy.
Findings – With increasing knowledge and experience of processes and policies, cultural heritage
institutions are influencing the production and development of digital preservation software. The product
development funded by the PREFORMA project shows how such cooperation can benefit the community as a
whole.
Originality/value – This paper describes the value of an open source approach to developing a PDF/A
validator for cultural heritage organisations.
Keywords Open source, Standards, Validation, Digital preservation, PDF/A, File formats
Paper type Case study
Introduction
Cultural heritage institutions are collecting ever-increasing volumes of digital cultural and
scientific content, which they are mandated to preserve for long-term accessibility. Digital
content is received in specific file formats, including text documents, images, sound, and
video. It is produced and rendered by software from many different vendors. A file format is
a standard method for encoding data in a computer file. The technical description of a file
format is called a file format specification.
The specification represents a contract between the different software that can be used to
produce, redact or render a particular file. As long as all of the software applications
understand, interpret and implement the specification in the same way, they can be used
interchangeably on any file.
Unfortunately, file specifications come in many different forms. Some are proprietary,
meaning they are defined and controlled by a commercial entity. Some format specifications
are not even published or documented at all. Even the existence of a canonical,
well-documented format specification does not guarantee interoperability. Many format
specifications are long and complex, making consistent interpretation a problem. This
problem affects both organisations that produce and those that preserve digital content. In
Received 3 August 2016
Accepted 24 August 2016
Digital Library Perspectives
Vol. 33 No. 2, 2017
pp. 156-165
© Emerald Publishing Limited
2059-5816
DOI 10.1108/DLP-08-2016-0031
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/2059-5816.htm