Domain-specific readability measures to improve information retrieval in the Persian language

Date: 04 June 2018
Published date: 04 June 2018
DOI: https://doi.org/10.1108/EL-01-2017-0007
Pages: 430-444
Author: Sholeh Arastoopoor
Subject Matter: Information & knowledge management, Information & communications technology, Internet
Domain-specic readability
measures to improve information
retrieval in the Persian language
Sholeh Arastoopoor
Department of Information Science and Knowledge Studies,
Ferdowsi University of Mashhad, Mashhad, Iran
Abstract
Purpose: The degree to which a text is considered readable depends on the capability of the reader. This
assumption puts different information retrieval systems at the risk of retrieving unreadable or hard-to-be-read
yet relevant documents for their users. This paper aims to examine the potential use of concept-based
readability measures along with classic measures for re-ranking search results in information retrieval
systems, specifically in the Persian language.
Design/methodology/approach: Flesch–Dayani as a classic readability measure along with document
scope (DS) and document cohesion (DC) as domain-specific measures have been applied for scoring the
retrieved documents from Google (181 documents) and the RICeST database (215 documents) in the field of
computer science and information technology (IT). The re-ranked result has been compared with the ranking
of potential users regarding their readability.
Findings: The results show that there is a difference among subcategories of the computer science and IT
field according to their readability and understandability. This study also shows that it is possible to develop
a hybrid score based on DS and DC measures and, among all four applied scores in re-ranking the documents,
the re-ranked list of documents based on the DSDC score shows correlation with re-ranking of the participants
in both groups.
Practical implications: The findings of this study would foster a new option in re-ranking search
results based on their difficulty for experts and non-experts in different fields.
Originality/value: The findings and the two-mode re-ranking model proposed in this paper along with
its primary focus on domain-specific readability in the Persian language would help Web search engines and
online databases in further refining the search results in pursuit of retrieving useful texts for users with
differing expertise.
Keywords: Information retrieval, Document cohesion, Document scope, Flesch–Dayani formula,
Persian, Re-ranking search results, Readability scores
Paper type: Research paper
Introduction
The degree to which a text is considered readable depends directly on the capabilities of the
reader (Badgett, 2010). If the text is too simple or too difficult, it actually delivers no new
information to its audience. This assumption puts different information retrieval systems at
the risk of retrieving unreadable or hard-to-be-read yet relevant documents for their users. In
other words, users with different specialties in different subject areas may not have similar
reading capabilities; for instance, a single text in the field of medicine might be categorized
into different readability levels by a physician or a first-year medical student. For a long
time, researchers have made efforts to find a way to estimate the readability and suitability
of a text for different groups of audiences. As a result, different viewpoints regarding the
meaning of readability have emerged. While Harris and Hodges (1981) took readability as
Received 9 January 2017
Revised 6 April 2017
4 July 2017
Accepted 21 August 2017
The Electronic Library
Vol. 36 No. 3, 2018
pp. 430-444
© Emerald Publishing Limited
0264-0473
DOI 10.1108/EL-01-2017-0007
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm
