veraPDF: building an open source, industry supported PDF/A validator for cultural heritage institutions

DOI: https://doi.org/10.1108/DLP-08-2016-0031
Pages: 156-165
Published date: 08 May 2017
Authors: Carl Wilson, Rebecca McGuinness, Joachim Jung
Subject Matter: Library & information science, Librarianship/library management, Library technology, Records management & preservation, Information repositories
Carl Wilson, Rebecca McGuinness and Joachim Jung
Open Preservation Foundation, Wetherby, UK
Abstract
Purpose: This paper describes the development of the veraPDF validator. The objective of veraPDF is to
build an industry supported, open source validator for all parts and conformance levels of the PDF/A
specification for archival PDF documents. The project is led by the Open Preservation Foundation and the
PDF Association and is funded by the EU PREFORMA project.
Design/methodology/approach: veraPDF is designed to meet the needs of the digital preservation
community and the PDF industry alike. The technology is subject to the review of and acceptance by the PDF
Association's PDF Validation Technical Working Group, including many participants of the relevant ISO
working groups. Cultural heritage institutions are collecting ever-increasing volumes of digital information,
which they have a mandate to preserve for the long term. However, in many cases, they need to ensure their
content has been produced to the specifications of a standard file format, as well as any acceptance criteria
stated in their institutional policy.
Findings: With increasing knowledge and experience of processes and policies, cultural heritage
institutions are influencing the production and development of digital preservation software. The product
development funded by the PREFORMA project shows how such cooperation can benefit the community as a
whole.
Originality/value: This paper describes the value of an open source approach to developing a PDF/A
validator for cultural heritage organisations.
Keywords: Open source, Standards, Validation, Digital preservation, PDF/A, File formats
Paper type: Case study
Introduction
Cultural heritage institutions are collecting ever-increasing volumes of digital cultural and
scientific content, which they are mandated to preserve for long-term accessibility. Digital
content is received in specific file formats, including text documents, images, sound, and
video. It is produced and rendered by software from many different vendors. A file format is
a standard method for encoding data in a computer file. The technical description of a file
format is called a file format specification.
The specification represents a contract between the different software that can be used to
produce, redact or render a particular file. As long as all of the software applications
understand, interpret and implement the specification in the same way, they can be used
interchangeably on any file.
Unfortunately, file specifications come in many different forms. Some are proprietary,
meaning they are defined and controlled by a commercial entity. Some format specifications
are not even published or documented at all. Even the existence of a canonical,
well-documented format specification does not guarantee interoperability. Many format
specifications are long and complex, making consistent interpretation a problem. This
problem affects both organisations that produce and those that preserve digital content. In
Received 3 August 2016
Accepted 24 August 2016
Digital Library Perspectives
Vol. 33 No. 2, 2017
pp. 156-165
© Emerald Publishing Limited
2059-5816
DOI 10.1108/DLP-08-2016-0031
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/2059-5816.htm
