Transforming scholarship in the archives through handwritten text recognition. Transkribus as a case study

DOI: https://doi.org/10.1108/JD-07-2018-0114
Date: 09 September 2019
Published date: 09 September 2019
Pages: 954-976
Authors: Guenter Muehlberger, Louise Seaward, Melissa Terras, Sofia Ares Oliveira, Vicente Bosch, Maximilian Bryan, Sebastian Colutto, Hervé Déjean, Markus Diem, Stefan Fiel, Basilis Gatos, Albert Greinoecker, Tobias Grüning, Guenter Hackl, Vili Haukkovaara, Gerhard Heyer, Lauri Hirvonen, Tobias Hodel, Matti Jokinen, Philip Kahle, Mario Kallio, Frederic Kaplan, Florian Kleber, Roger Labahn, Eva Maria Lang, Sören Laube, Gundram Leifert, Georgios Louloudis, Rory McNicholl, Jean-Luc Meunier, Johannes Michael, Elena Mühlbauer, Nathanael Philipp, Ioannis Pratikakis, Joan Puigcerver Pérez, Hannelore Putz, George Retsinas, Verónica Romero, Robert Sablatnig, Joan Andreu Sánchez, Philip Schofield, Giorgos Sfikas, Christian Sieber, Nikolaos Stamatopoulos, Tobias Strauß, Tamara Terbul, Alejandro Héctor Toselli, Berthold Ulreich, Mauricio Villegas, Enrique Vidal, Johanna Walcher, Max Weidemann, Herbert Wurster, Konstantinos Zagoris
Subject Matter: Library & information science, Records management & preservation, Document management, Classification & cataloguing, Information behaviour & retrieval, Collection building & management, Scholarly communications/publishing, Information & knowledge management, Information management & governance, Information management, Information & communications technology, Internet
(Information about the authors can be found at the end of this article.)
Abstract
Purpose: This paper provides an overview of the current use of handwritten text recognition (HTR) on
archival manuscript material, as provided by the EU H2020-funded Transkribus platform. It explains HTR,
demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship,
and evidences this turning point in the advanced use of digitised heritage content. The paper aims to
discuss these issues.
Journal of Documentation
Vol. 75 No. 5, 2019
pp. 954-976
Emerald Publishing Limited
0022-0418
DOI 10.1108/JD-07-2018-0114
Received 18 July 2018
Revised 20 December 2018
Accepted 27 December 2018
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0022-0418.htm
© Guenter Muehlberger, Louise Seaward, Melissa Terras, Sofia Ares Oliveira, Vicente Bosch, Maximilian
Bryan, Sebastian Colutto, Hervé Déjean, Markus Diem, Stefan Fiel, Basilis Gatos, Albert Greinoecker,
Tobias Grüning, Guenter Hackl, Vili Haukkovaara, Gerhard Heyer, Lauri Hirvonen, Tobias Hodel,
Matti Jokinen, Philip Kahle, Mario Kallio, Frederic Kaplan, Florian Kleber, Roger Labahn, Eva Maria Lang,
Sören Laube, Gundram Leifert, Georgios Louloudis, Rory McNicholl, Jean-Luc Meunier, Johannes Michael,
Elena Mühlbauer, Nathanael Philipp, Ioannis Pratikakis, Joan Puigcerver Pérez, Hannelore Putz,
George Retsinas, Verónica Romero, Robert Sablatnig, Joan Andreu Sánchez, Philip Schofield, Giorgos
Sfikas, Christian Sieber, Nikolaos Stamatopoulos, Tobias Strauß, Tamara Terbul, Alejandro Héctor Toselli,
Berthold Ulreich, Mauricio Villegas, Enrique Vidal, Johanna Walcher, Max Weidemann, Herbert Wurster
and Konstantinos Zagoris. Published by Emerald Publishing Limited. This article is published under the
Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create
derivative works of this article (for both commercial & non-commercial purposes), subject to full attribution
to the original publication and authors. The full terms of this licence may be seen at
http://creativecommons.org/licences/by/4.0/legalcode
Design/methodology/approach: This paper adopts a case study approach, using the development and
delivery of the one openly available HTR platform for manuscript material.
Findings: Transkribus has demonstrated that HTR is now a useable technology that can be employed in
conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are
demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the
platform. However, funding and resourcing issues are identified.
Research limitations/implications: The paper presents results from projects: further user studies could
be undertaken involving interviews, surveys, etc.
Practical implications: Only HTR provided via Transkribus is covered: however, this is the only publicly
available platform for HTR on individual collections of historical documents at time of writing and it
represents the current state-of-the-art in this field.
Social implications: The increased access to information contained within historical texts has the
potential to be transformational for both institutions and individuals.
Originality/value: This is the first published overview of how HTR is used by a wide archival studies
community, reporting and showcasing current application of handwriting technology in the cultural heritage sector.
Keywords: User studies, Library, Archives, Transcription, Neural networks, Digital humanities,
Digital library infrastructure, Handwritten text recognition, HTR, Transcribing
Paper type: Research paper
Introduction
Archives are increasingly investing in the digitisation of their manuscript collections but until
recently the textual content of the resulting digital images has only been available to those
who have the time to study and transcribe individual passages. The use of computers to
process and search images of historical papers using handwritten text recognition (HTR) has
the potential to transform access to our written past for the use of researchers, institutions and
the general public. This paper reports on the Recognition and Enrichment of Archival
Documents (READ) European Union Horizon 2020 project which is developing advanced text
recognition technology on the basis of artificial neural networks and resulting in a publicly
available infrastructure: the Transkribus platform. Users of Transkribus (whether institutional
or individual) are able to extract data from handwritten and printed texts via HTR, while
simultaneously contributing to the improvement of the same technology thanks to machine
learning principles. The automated recognition of a wide variety of historical texts has
significant implications for the accessibility of the written records of global cultural heritage.
This paper uses the Transkribus platform as a case study, focusing on the development,
application and impact of HTR technology. It demonstrates that HTR has the capacity to
make a significant contribution to the archival mission by making it easier for anyone to
read, transcribe, process and mine historical documents. It shows that the technology fits
neatly into the archival workflow, making direct use of growing repositories of digitised
images of historical texts. By providing examples of institutions and researchers who are
generating new resources with Transkribus, the paper shows how HTR can extend the
existing research infrastructure of the archives, libraries and humanities domain. Looking to
the future, this paper argues that this form of machine learning has the potential to change
the nature and scope of historical research. Finally, it suggests that a cooperative approach
from the archives, library and humanities community is the best way to support and sustain
the benefits of the technology offered through Transkribus.
Handwritten text recognition: an overview
HTR is an active research area in the computational sciences, dating back to the mid-
twentieth century (Dimond, 1957). HTR was originally closely aligned to the development of
optical character recognition (OCR) technology, where scanned images of printed text are
converted into machine-encoded text, generally by comparing individual characters with
existing templates (Govindan and Shivaprasad, 1990; Schantz, 1982; Ul-Hasan et al., 2016).
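The template comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited works or from Transkribus: the tiny 3x5 glyph bitmaps, the names `TEMPLATES`, `similarity` and `classify`, and the choice of pixel agreement as the matching score.

```python
# Hypothetical 3x5 binary glyph templates (1 = ink, 0 = background).
# Real OCR systems use far larger bitmaps and more robust features.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def similarity(glyph, template):
    """Return the fraction of pixels on which two equally sized bitmaps agree."""
    total = 0
    same = 0
    for glyph_row, template_row in zip(glyph, template):
        for glyph_pixel, template_pixel in zip(glyph_row, template_row):
            total += 1
            same += glyph_pixel == template_pixel
    return same / total


def classify(glyph, templates=TEMPLATES):
    """Return (label, score) for the template that best matches the glyph."""
    scored = ((label, similarity(glyph, tpl)) for label, tpl in templates.items())
    return max(scored, key=lambda pair: pair[1])


# A 'T' with one flipped pixel, standing in for scanning noise.
noisy_t = ["111", "010", "010", "110", "010"]
label, score = classify(noisy_t)
print(label, round(score, 3))
```

A scheme of this kind presumes that each character can be isolated and that its shape varies little between occurrences, which is plausible for print but not for handwriting.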
HTR developed into a research area in its own right due to the variability of different hands,
and the computational complexity of the task (Bertolami and Bunke, 2008; Kichuk, 2015;