A method for automatic analysis Table of Contents in Chinese books

Published date: 21 September 2015
Pages: 424-438
DOI: https://doi.org/10.1108/LHT-05-2015-0043
Authors: Jing Chen, Quan Lu
Subject matter: Library & information science, Librarianship/library management, Library technology
Jing Chen
School of Information Management, Huazhong Normal University, Wuhan, China, and
Quan Lu
School of Information Management, Wuhan University, Wuhan, China
Abstract
Purpose: The purpose of this paper is to propose a novel method for automatically analyzing the Table of Contents (TOC) of Chinese books, based on hierarchy organization rules obtained through investigation.
Design/methodology/approach: This paper first analyzed the main literature in this field. It then generated hierarchy organization rules for Chinese book TOCs and proposed a method that parses TOCs automatically based on these rules. A prototype system implementing the method was also developed. The method was evaluated by processing a corpus on the prototype system, and the results were assessed by calculating precision and recall.
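The abstract does not state the hierarchy organization rules themselves. Purely as an illustration of what rule-based TOC parsing can look like, the sketch below assumes that the level of each TOC line can be inferred from common Chinese numbering markers (第…章 for chapters, 第…节 for sections, 一、 and （一） for lower levels, plus decimal numbering such as 1.2.3), that a trailing number is the page number, and that entries can then be nested with a stack. The patterns, level values, and the names `TocEntry`, `parse_toc_line` and `build_tree` are hypothetical and are not the rules or implementation derived in the paper.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Chinese numerals commonly used in TOC numbering (illustrative, not exhaustive).
CN_NUM = "[一二三四五六七八九十百零〇]+"

# Hypothetical rules, checked in order: (pattern, hierarchy level).
# These are assumptions for illustration, NOT the rules derived in the paper.
LEVEL_RULES = [
    (re.compile(rf"^第(?:{CN_NUM}|\d+)章"), 1),  # 第一章 / 第1章 -> chapter
    (re.compile(rf"^第(?:{CN_NUM}|\d+)节"), 2),  # 第一节 -> section
    (re.compile(rf"^{CN_NUM}、"), 3),            # 一、 -> subsection
    (re.compile(rf"^[（(]{CN_NUM}[）)]"), 4),    # （一） -> sub-subsection
]
# Decimal numbering such as "1.2" or "1.2.3": level = number of components.
DECIMAL_RE = re.compile(r"^(\d+(?:\.\d+)+)\s")
# Trailing page number, optionally preceded by spaces or dot leaders.
PAGE_RE = re.compile(r"[\s.…·]*(\d+)\s*$")


@dataclass
class TocEntry:
    level: int
    title: str
    page: Optional[int]
    children: List["TocEntry"] = field(default_factory=list)


def parse_toc_line(line: str) -> Optional[TocEntry]:
    """Assign a hierarchy level and page number to one TOC line, or return None."""
    text = line.strip()
    page = None
    m = PAGE_RE.search(text)
    if m and m.start() > 0:  # a line that is only a number is not split
        page = int(m.group(1))
        text = text[: m.start()].strip()
    for pattern, level in LEVEL_RULES:
        if pattern.match(text):
            return TocEntry(level, text, page)
    m = DECIMAL_RE.match(text)
    if m:
        return TocEntry(m.group(1).count(".") + 1, text, page)
    return None  # no rule fired; a real system would need a fallback


def build_tree(lines: List[str]) -> List[TocEntry]:
    """Nest parsed entries by level using a stack of open ancestors."""
    roots: List[TocEntry] = []
    stack: List[TocEntry] = []
    for line in lines:
        entry = parse_toc_line(line)
        if entry is None:
            continue
        # Pop until the top of the stack is a strict ancestor (lower level number).
        while stack and stack[-1].level >= entry.level:
            stack.pop()
        (stack[-1].children if stack else roots).append(entry)
        stack.append(entry)
    return roots


toc = ["第一章 绪论 1", "第一节 研究背景……3", "一、问题的提出 5", "第二章 相关研究 12"]
for chapter in build_tree(toc):
    print(chapter.level, chapter.title, chapter.page, len(chapter.children))
# 1 第一章 绪论 1 1
# 1 第二章 相关研究 12 0
```

Checking the rules in a fixed order keeps the sketch deterministic; a production system would also need to handle unnumbered entries (e.g. 前言), which this sketch simply skips.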
Findings: The experimental results demonstrate the advantages of the method: it has wide applicability, and it achieves a recall of 95.34 percent and a precision of 94.44 percent.
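Precision is the share of the system's output that is correct, TP / (TP + FP), and recall is the share of the ground truth that the system recovers, TP / (TP + FN). The abstract does not say exactly which units were counted; the sketch below assumes the counts are TOC entries, and the function name `precision_recall` and the counts in the usage line are hypothetical, not the paper's data.

```python
from typing import Tuple


def precision_recall(correct: int, extracted: int, expected: int) -> Tuple[float, float]:
    """Return (precision, recall).

    correct:   entries the system analyzed correctly (true positives)
    extracted: all entries the system output (true + false positives)
    expected:  all entries in the ground truth (true positives + false negatives)
    """
    if extracted == 0 or expected == 0:
        raise ValueError("extracted and expected must be positive")
    precision = correct / extracted
    recall = correct / expected
    return precision, recall


# Hypothetical counts for illustration only (not from the paper's corpus).
p, r = precision_recall(correct=90, extracted=100, expected=120)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=90.00% recall=75.00%
```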
Practical implications: The results can help Chinese libraries deal with electronic texts in four ways. First, the method can complement or enhance current digitization and optical character recognition methods and reduce the financial and labor costs of Chinese libraries. Second, it can help libraries retain information on index terms as well as on chapters, sections and subsections in Chinese book databases, which makes retrieval easy and allows any desired portion to be extracted on user demand. Third, it helps enrich library services and thereby enhances the user experience in Chinese libraries. Fourth, it improves the specifications and policies for digitizing Chinese books.
Originality/value: The paper provides insight into the hierarchical organization of TOCs in Chinese books, and the method based on these rules has wider applicability than other methods. This method for automatic analysis of Chinese book TOCs can also serve as a reference for automatic analysis of English book TOCs.
Keywords: Digital libraries, Automatic analysis technologies, Chinese books, Document analysis, Hierarchical organization categories, Table of Contents
Paper type: Research paper
Introduction
In recent years, Chinese digital book resources and their applications have been booming. Table of Contents (TOC) analysis has drawn attention because a TOC is a collection of references to the different components of a document and naturally reflects the logical structure of the entire document (Gao et al., 2010). In China's largest comprehensive dictionary, the Cihai (Xia and Chen, 2009), the function of a TOC is described as listing the structure, titles and page numbers of books and periodicals. Lu (1971), in his dictionary of library science, states that a TOC is an
Library Hi Tech, Vol. 33 No. 3, 2015, pp. 424-438
© Emerald Group Publishing Limited, ISSN 0737-8831
Received 17 February 2015; revised 1 May 2015; accepted 15 July 2015
This research was supported by the National Natural Science Foundation of China (project name: Research on Automatic Indexing for Book Hierarchical Topics, Project No. 71303089) and by the Independent Research Project of Huazhong Normal University (project name: Research on Automatic Analysis Table of Contents in Chinese Books, Project No. CCNU14A05050).