Development and evaluation of an automatic text annotation system for supporting digital humanities research

DOI: https://doi.org/10.1108/LHT-10-2017-0219
Date: 16 September 2019
Published date: 16 September 2019
Pages: 436-455
Author: Chih-Ming Chen, Yung-Ting Chen, Chen-Yu Liu
Subject Matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Information user studies, Metadata, Information & knowledge management, Information & communications technology, Internet
Chih-Ming Chen, Yung-Ting Chen and Chen-Yu Liu
National Chengchi University, Taipei, Taiwan
Abstract
Purpose: An automatic text annotation system (ATAS) that collects resources from different databases
through Linked Data (LD) to automatically annotate ancient texts was developed in this study to support
digital humanities research. It allows humanists to refer to resources from diverse databases when
interpreting ancient texts and provides a friendly text annotation reader for humanists interpreting
ancient texts through reading. The paper aims to discuss whether the ATAS helps support digital
humanities research.
Design/methodology/approach: Based on a quasi-experimental design, the ATAS developed in this
study and the MARKUS semi-ATAS were compared to determine whether significant differences existed in
reading effectiveness and technology acceptance when supporting humanists in interpreting ancient texts
from the Ming dynasty's collections. Additionally, lag sequential analysis was used to analyze users'
operation behaviors on the ATAS. A semi-structured in-depth interview was also applied to understand
users' opinions and perceptions of using the ATAS to interpret ancient texts through reading.
Findings: The experimental results reveal that the ATAS has higher reading effectiveness than the MARKUS
semi-ATAS, but the difference does not reach statistical significance. The technology acceptance of the
ATAS is significantly higher than that of the MARKUS semi-ATAS. In particular, the function comparison of
the two systems shows that the ATAS presents higher perceived ease of use than the MARKUS semi-ATAS on the
functions of term search, connection to source websites and adding annotation. Furthermore, the reading
interface of the ATAS is simple and understandable and is more suitable for reading than that of the MARKUS
semi-ATAS. Among all the considered LD sources, Moedict, which is an online Chinese dictionary, was
confirmed as the most helpful one.
Research limitations/implications: This study adopted the Jieba Chinese parser to perform the word
segmentation process based on a parser lexicon for the Chinese ancient texts of the Ming dynasty's
collections. The accuracy of word segmentation by a lexicon-based Chinese parser is limited because it
ignores the grammar and semantics of ancient texts. Moreover, the original parser lexicon used in the Jieba
Chinese parser contains only modern words, which reduces the accuracy of word segmentation for Chinese
ancient texts. These two limitations, which hinder the Jieba Chinese parser from correctly segmenting
Chinese ancient texts, will significantly affect the effectiveness of using the ATAS to support digital
humanities research. This study thus proposed a practicable scheme of adding new terms to the parser
lexicon based on humanists' self-judgment to improve the accuracy of word segmentation of the Jieba
Chinese parser.
Practical implications: Although some digital humanities platforms have been successfully developed to
support digital humanities research, most of them still do not provide a friendly digital reading
environment to support humanists in interpreting texts. For this reason, this study developed an
ATAS that can automatically retrieve LD sources from different databases on the Internet to supply rich
annotation information on reading texts to help humanists interpret texts. This study breaks new ground
for digital humanities research.
Originality/value: This study proposed a novel ATAS that can automatically annotate an ancient text with
useful information, based on LD sources from different databases, to increase the readability of the text,
thus helping humanists obtain a deeper and broader understanding of the ancient text. Currently, no tool of
this kind has been developed for humanists to support digital humanities research.
Keywords: Digital humanities, User behaviour, Linked Data, Automatic segmentation of Chinese word,
Automatic text annotation system, Reading interface design
Paper type: Research paper
Library Hi Tech
Vol. 37 No. 3, 2019
pp. 436-455
© Emerald Publishing Limited
0737-8831
DOI 10.1108/LHT-10-2017-0219
Received 31 October 2017
Revised 28 May 2018
10 August 2018
7 September 2018
Accepted 6 October 2018
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0737-8831.htm
The authors would like to thank the Research Center for Chinese Cultural Subjectivity of National
Chengchi University for financially supporting this research under Contract No. 107H21.
1. Introduction
Since the initiation of the Digital Archives Program in 2002, many academic institutes in
Taiwan have digitized their important archives. Although a large amount of data has been
accumulated in the past decade, most of these digital archive databases are independent and
cannot be integrated for use. Besides, most humanists' conception of digital humanities
remains at the stage of digital archives or the digitization of historical data,
rather than thoroughly utilizing such resources for deeper research. Rosenzweig (2003)
indicated that researchers face not a lack of data but the question of how to deal with
excessive data; therefore, how to make such data yield meaning is the problem for digital
humanities. Moreover, text reading environments that support digital humanities research
are currently scarce. For example, the Taiwan History Digital Library (THDL) (Hsiang et al.,
2009) (http://thdl.ntu.edu.tw/index.html) covers more than a hundred thousand full-text
records from the Tan-Hsin Archives, the Ming and Qing Archives of Taiwan Administration,
and Ancient Contracts, but the digital library emphasizes the development of data analysis
tools and lacks a friendly data interpretation reader for humanists. Most humanists
therefore simply use the database for data search, which reduces its benefit in supporting
digital humanities research. Another platform, the CBETA Research Platform (CBETA-RP)
(http://cbeta-rp.dila.edu.tw/), provides an online reader for Chinese Buddhist texts, with
complete contents. It currently also provides researchers with name references; however,
these are merely cross-references among internal data, and the integration with
cross-platform resources is insufficient (Tu et al., 2012).
A digital humanities reading environment that integrates cross-platform resources and
provides a friendly reader and digital tools is needed to effectively assist humanists in
digital humanities research. Scheinfeldt (2010) pointed out that digital humanities scholars
and scientists are similar in that both depend heavily on tools; a new digital tool could
solve past humanities research problems. Monte and Serafin (2017)
indicated that the first and most salient theme to emerge in digital humanities research is
the requirement for digital reading and research tools. To effectively support digital
humanities research, Chen and Tsay (2017) proposed a novel collaborative annotation
system (CAS) with four types of multimedia annotation (text, picture, voice and video),
which can be embedded in any HTML web page to enable users to collaboratively add and
manage annotations and which provides a shared mechanism for discussing shared annotations
among multiple users. By applying the CAS in a mashup on static HTML web pages, their study
discussed the potential applications of the CAS in digital humanities. However, the CAS is a
manual annotation system, and the quality of user-contributed annotations may not be high
enough to support digital humanities research. Moreover, the MARKUS semi-automatic text
annotation system is an online text reading and research tool developed by Ho and Hilde (2014)
for supporting digital humanities research. A user can upload texts and select the required
annotation types in MARKUS, which then annotates the terms in the text and lets the user
search Wikipedia, the China Biographical Database (CBDB), the Temporal Gazetteer (TGAZ) and
ZDict to help interpret the text content online.
However, because MARKUS lacks automatic word segmentation, its annotation function is
limited to predefined terms in a text, including personal names, place names, temporal
references and bureaucratic offices, which likely reduces its effectiveness in supporting
humanists in interpreting the text. As a result,
an automatic text annotation system (ATAS) for supporting digital humanities research
was developed in this study to collect resources from different databases through LD and
automatically annotate texts, so that users can refer in real time to resources from
different databases when interpreting texts. Besides, a friendly text annotation reader is
provided for humanists interpreting the data through reading. This study aims to confirm whether the
