Domain-specific readability measures to improve information retrieval in the Persian language
Date | 04 June 2018 |
Published date | 04 June 2018 |
DOI | https://doi.org/10.1108/EL-01-2017-0007 |
Pages | 430-444 |
Author | Sholeh Arastoopoor |
Subject Matter | Information & knowledge management, Information & communications technology, Internet |
Sholeh Arastoopoor
Department of Information Science and Knowledge Studies,
Ferdowsi University of Mashhad, Mashhad, Iran
Abstract
Purpose – The degree to which a text is considered readable depends on the capability of the reader. This
assumption puts information retrieval systems at risk of retrieving documents that are relevant yet unreadable
or hard to read for their users. This paper aims to examine the potential use of concept-based
readability measures along with classic measures for re-ranking search results in information retrieval
systems, specifically in the Persian language.
Design/methodology/approach – Flesch–Dayani as a classic readability measure along with document
scope (DS) and document cohesion (DC) as domain-specific measures have been applied for scoring the
retrieved documents from Google (181 documents) and the RICeST database (215 documents) in the field of
computer science and information technology (IT). The re-ranked result has been compared with the ranking
of potential users regarding their readability.
Findings – The results show that there is a difference among subcategories of the computer science and IT
field according to their readability and understandability. This study also shows that it is possible to develop
a hybrid score based on DS and DC measures and, among all four applied scores in re-ranking the documents,
the re-ranked list of documents based on the DSDC score shows correlation with re-ranking of the participants
in both groups.
Practical implications – The findings of this study would foster a new option in re-ranking search
results based on their difficulty for experts and non-experts in different fields.
Originality/value – The findings and the two-mode re-ranking model proposed in this paper along with
its primary focus on domain-specific readability in the Persian language would help Web search engines and
online databases in further refining the search results in pursuit of retrieving useful texts for users with
differing expertise.
Keywords Information retrieval, Document cohesion, Document scope, Flesch–Dayani formula,
Persian, Re-ranking search results, Readability scores
Paper type Research paper
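The comparison described in the abstract, in which documents are re-ranked by a readability score and the resulting list is checked against the participants' ranking, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the document ids, the score values, the convention that a higher score means an easier text, and the use of Spearman's rank correlation (this excerpt does not name the coefficient the study used, nor does it give the DS, DC or Flesch–Dayani formulas, so none are reproduced here).

```python
from typing import Dict, List


def rank_by_score(scores: Dict[str, float], descending: bool = True) -> List[str]:
    """Return document ids ordered by score; ties are broken by id for determinism."""
    return sorted(
        scores,
        key=lambda doc: (-scores[doc] if descending else scores[doc], doc),
    )


def spearman_rho(ranking_a: List[str], ranking_b: List[str]) -> float:
    """Spearman rank correlation between two tie-free rankings of the same items."""
    if len(ranking_a) != len(ranking_b) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain exactly the same items")
    n = len(ranking_a)
    if n < 2 or len(set(ranking_a)) != n:
        raise ValueError("need at least two distinct items")
    position_b = {doc: i for i, doc in enumerate(ranking_b)}
    # Sum of squared differences between each item's position in the two lists.
    d_squared = sum((i - position_b[doc]) ** 2 for i, doc in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores for four retrieved documents (not values from the paper).
system_scores = {"doc1": 0.82, "doc2": 0.35, "doc3": 0.61, "doc4": 0.10}
# Hypothetical participant ordering, easiest text first.
participant_ranking = ["doc1", "doc3", "doc4", "doc2"]

system_ranking = rank_by_score(system_scores)
rho = spearman_rho(system_ranking, participant_ranking)
print(system_ranking, round(rho, 2))
```

A value of `rho` close to 1 would indicate that the score-based list orders the documents much as the participants did; a value near 0 would indicate no monotonic agreement.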
Introduction
The degree to which a text is considered readable depends directly on the capabilities of the
reader (Badgett, 2010). If the text is too simple or too difficult, it actually delivers no new
information to its audience. This assumption puts information retrieval systems at
risk of retrieving documents that are relevant yet unreadable or hard to read for their users. In
other words, users with different specialties in different subject areas may not have similar
reading capabilities; for instance, a single text in the field of medicine might be assigned
to different readability levels by a physician and by a first-year medical student. For a long
time, researchers have made efforts to find a way to estimate the readability and suitability
of a text for different groups of audiences. As a result, different viewpoints regarding the
meaning of readability have emerged. While Harris and Hodges (1981) took readability as
Received 9 January 2017
Revised 6 April 2017
4 July 2017
Accepted 21 August 2017
The Electronic Library
Vol. 36 No. 3, 2018
pp. 430-444
© Emerald Publishing Limited
0264-0473
DOI 10.1108/EL-01-2017-0007
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm