A hierarchical topic analysis tool to facilitate digital humanities research

DOI: https://doi.org/10.1108/AJIM-11-2021-0325
Published date: 29 April 2022
Pages: 1-19
Subject Matter: Library & information science, Information behaviour & retrieval, Information & knowledge management, Information management & governance, Information management
Author: Chih-Ming Chen, Szu-Yu Ho, Chung Chang
Chih-Ming Chen and Szu-Yu Ho
Graduate Institute of Library, Information and Archival Studies,
National Chengchi University, Taipei City, Taiwan, and
Chung Chang
Research Center for Chinese Cultural Subjectivity, National Chengchi University,
Taipei City, Taiwan
Abstract
Purpose: This study aims to develop a hierarchical topic analysis tool (HTAT) based on hierarchical Latent
Dirichlet allocation (hLDA) to support digital humanities research that requires topic exploration on the
Digital Humanities Platform for Mr. Lo Chia-Lun's Writings (DHP-LCLW). HTAT assists humanities scholars in
distant reading by analyzing hierarchical text topics: it classifies time-stamped texts into multiple
historical eras, conducts hierarchical topic modeling (HTM) on the texts from each era and presents the
results through visualization. A comparative network diagram is also provided to help humanities scholars
compare differences in the topics they wish to explore and track how the concept of a topic changes over
time from a particular perspective. In addition, HTAT lets humanities scholars view the source texts, so it
has high potential to improve the effectiveness of topic exploration by integrating the topic exploration
functions of both distant reading and close reading.
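The era-partitioning step described above can be illustrated with a minimal sketch. It assumes each text carries a year as its time stamp; the era boundaries, the labels and the helper names `assign_era` and `group_by_era` are hypothetical and are not taken from DHP-LCLW. Each resulting group would then be passed to a hierarchical topic model separately.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical era boundaries (first year of each era) and labels.
# The eras actually used in the study are not specified in this section.
ERA_STARTS = [1900, 1920, 1940]
ERA_LABELS = ["era_1", "era_2", "era_3"]


def assign_era(year):
    """Return the label of the era whose start year is the latest one <= year."""
    idx = bisect_right(ERA_STARTS, year) - 1
    if idx < 0:
        raise ValueError(f"year {year} precedes the first era")
    return ERA_LABELS[idx]


def group_by_era(docs):
    """Group (year, text) pairs into {era_label: [text, ...]}, keeping input order."""
    groups = defaultdict(list)
    for year, text in docs:
        groups[assign_era(year)].append(text)
    return dict(groups)


corpus = [(1905, "text a"), (1941, "text b"), (1910, "text c")]
print(group_by_era(corpus))
```

Keeping the boundaries in a sorted list makes the lookup a binary search, and changing the era division only requires editing the two lists.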
Design/methodology/approach: This study adopts a counterbalanced experimental design to examine
whether there are significant differences in the effectiveness of topic inquiry, the number of relevant topics
inquired and the time spent on them when research participants alternately conducted text exploration
using DHP-LCLW with HTAT or DHP-LCLW with the Single-layer Topic Analysis Tool (SLTAT). A technology
acceptance questionnaire and semi-structured interviews were also used to understand the research
participants' perceptions of and feelings toward using the two tools to assist topic inquiry.
Findings: The experimental results show that DHP-LCLW with HTAT, in comparison with DHP-LCLW with
SLTAT, better helped the research participants grasp the topic context of the texts from the two
particular perspectives assigned by this study within a short period. In addition, the interview results
revealed that DHP-LCLW with HTAT, in comparison with SLTAT, provided topic terms that better met the
research participants' expectations and needs, and effectively guided them to the corresponding
texts for close reading. The analysis of the technology acceptance and interview data shows that the
research participants have a strongly positive attitude toward using DHP-LCLW with HTAT to assist topic
inquiry.
Research limitations/implications: The Jieba Chinese word segmentation system was used in the Mr. Lo
Chia-Lun's Writings Database in this study to perform word segmentation on Mr. Lo Chia-Lun's writing texts for
topic modeling based on hLDA. Since the Jieba word segmentation system is a lexicon-based word segmentation
system, it cannot reliably identify new words that have not yet been collected in the lexicon. In this case, the
correctness of word segmentation on the target texts will affect the results of hLDA topic modeling and the
effectiveness of HTAT in assisting humanities scholars with topic inquiry.
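The dependence on the lexicon can be made concrete with a toy segmenter. The sketch below uses forward maximum matching, a deliberately simplified lexicon-based method chosen for illustration; it is not Jieba's actual algorithm, and the function name, lexicon and sample strings are all hypothetical.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment text greedily, preferring the longest lexicon entry at each position.

    Characters that start no lexicon entry fall back to single-character tokens,
    which is how an out-of-lexicon word ends up split apart.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon that lacks the word "主題" (topic).
lexicon = {"數位", "人文", "研究"}
print(forward_max_match("數位人文研究", lexicon))            # every word is in the lexicon
print(forward_max_match("主題研究", lexicon))                # "主題" is split into two characters
print(forward_max_match("主題研究", lexicon | {"主題"}))     # adding the word fixes the split
```

In this toy setting, a word missing from the lexicon is broken into single characters, so a topic model downstream would see fragments instead of the intended term. Extending the lexicon with domain vocabulary is the usual remedy for this kind of error.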
Practical implications: An HTAT was developed in this study to support digital humanities research.
With HTAT, DHP-LCLW provides humanities scholars with topic clues from different hierarchical perspectives
for textual exploration, and with temporal and comparative network diagrams that help them track the
evolution of the topics of specific perspectives over time, so that they gain a more comprehensive
understanding of the overall context of the texts.
Originality/value: In recent years, topic analysis technology that can automatically extract key topic
information from a large number of texts has developed rapidly, but the topics generated by traditional
topic analysis models such as LDA (Latent Dirichlet allocation) make it difficult for users to understand the
differences in the topics of texts at different hierarchical levels. Thus, this study proposes HTAT, which uses
hLDA to build a hierarchical topic tree without the need to define the number of topics in advance,
enabling humanities scholars to quickly grasp the concept of textual topics and use different
hierarchical perspectives for further textual exploration. It also combines temporal division with a
comparative network diagram to assist humanities scholars in exploring topics and their changes in
different eras, which helps them discover more useful research clues or findings.
Keywords: Digital humanities, Topic analysis, Hierarchical topic modelling, Text mining, Information
visualization, Digital humanities research platform
Paper type: Research paper
The authors would like to thank the Research Center for Chinese Cultural Subjectivity of National
Chengchi University for financially supporting this research under Contract No. 109H21.
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/2050-3806.htm
Received 18 December 2021
Revised 2 January 2022, 16 March 2022
Accepted 9 April 2022
Aslib Journal of Information Management, Vol. 75 No. 1, 2023, pp. 1-19
© Emerald Publishing Limited, 2050-3806
DOI 10.1108/AJIM-11-2021-0325
1. Introduction
With the rapid development of information and communication technology (ICT), digitalization
for the development of digital humanities has become a popular trend. This trend has gradually
changed traditional paper-based research methods in the humanities, paving a digital
path that facilitates new ways of thinking in humanities studies. In traditional humanities studies,
scholars analyze texts mainly by reading them manually and then summarizing
and organizing them. Although this process is time- and labor-consuming, it can still be accomplished
gradually through care and patience. However, it is very difficult to master a large number of
texts with a continuous temporal context and to observe and analyze the relationships between
textual topics or events from multiple perspectives.
Digital humanities is an emerging interdisciplinary research field in which information
technology and the humanities disciplines are integrated to support research (Drucker, 2013). It
involves the systematic use of digital databases and tools in the humanities disciplines and
makes new forms of teaching and research possible. Gao et al. (2021) indicated that digital
humanities research emphasizes applying digital thinking and information technology to explore
humanities resources, so the construction of information resources is essential. Driven by new
information technologies, humanities resource storage and management methods have changed,
from databases with scanned images and corresponding metadata for information retrieval
systems to digital humanities databases with scanned images, corresponding metadata and
full text for digital humanities research platforms. In particular, full text can be intelligently analyzed
by natural language processing or text mining technologies so that humanities scholars can
more easily interpret the textual clues hidden in it. Liang et al. (2020) explored the task
design and assignment of full-text generation on mass Chinese historical archives (CHAs) by
crowdsourcing, paying particular attention to how best to divide full-text generation tasks
into smaller ones assigned to crowdsourced volunteers and how to improve the digitization of mass
CHAs and the data-oriented processing of the digital humanities. These developments have
promoted digital humanities infrastructures, and more and
more digital humanities research platforms are being established and used around the world (Gao
et al., 2021).
Among the studies on digital humanities, Correll et al. (2014) pointed out that links to close
reading are an important means of supporting hypotheses formed through distant reading. Jänicke
et al. (2017) also noted that humanities scholars must be able to link directly
to the source text when using digital humanities tools for distant reading. Since the accuracy of
currently developed digital humanities tools is still not 100%, these tools should
assist humanities scholars in their research rather than replace them. Therefore, digital
humanities analysis tools should be able to provide distant reading functions at the
abstract level and to link distant reading to close reading, so that
researchers can more easily interpret the texts that they want to analyze (Moretti et al., 2016).
In recent years, text mining technology, which can automatically extract key information
from a large number of texts, has been rapidly developed, and topic modeling has been widely