Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems

Published: 20 July 2015
DOI: https://doi.org/10.1108/AJIM-12-2014-0171
Pages: 408-421
Subject matter: Library & information science; Information behaviour & retrieval
Sri Devi Ravana, Masumeh Sadat Taheri and Prabha Rajagopal
Department of Information System, University of Malaya,
Kuala Lumpur, Malaysia
Abstract
Purpose – The purpose of this paper is to propose a method that yields more accurate results when comparing the performance of paired information retrieval (IR) systems, with reference to the current method, which is based on the mean effectiveness scores of the systems across a set of identified topics/queries.
Design/methodology/approach – In the proposed approach, instead of the classic method of using a set of topic scores, document-level scores are used as the evaluation unit. These document scores are the defined document weights, which play the role of the systems' mean average precision (MAP) scores as the significance test statistic. The experiments were conducted using the TREC-9 Web track collection.
Findings – The p-values generated through two types of significance tests, namely the Student's t-test and the Mann-Whitney test, show that using document-level scores as the evaluation unit makes the difference between IR systems more significant than utilizing topic scores.
Originality/value – Utilizing a suitable test collection is a primary prerequisite for the comparative evaluation of IR systems. However, in addition to reusable test collections, accurate statistical testing is a necessity for these evaluations. The findings of this study will assist IR researchers in evaluating their retrieval systems and algorithms more accurately.
Keywords Information retrieval, Document-based evaluation, Information retrieval evaluation,
Pairwise comparison, Significance test
Paper type Research paper
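
As a rough illustration of the pairwise comparison described in the abstract, the sketch below (Python, with placeholder score vectors that are not from the paper) runs both significance tests on a pair of systems' effectiveness scores. Whether the scores are per-topic (the classic unit) or per-document (the proposed unit) changes only the input vectors; the document-weight formula itself is defined in the body of the paper.

# A hedged sketch, not the authors' implementation: paired significance
# testing of two IR systems' scores, as in the comparison the paper studies.
from scipy import stats

# Placeholder effectiveness scores for systems A and B over the same
# evaluation units (topics in the classic setup, documents in the proposed one).
scores_a = [0.42, 0.55, 0.31, 0.60, 0.48]
scores_b = [0.38, 0.49, 0.35, 0.52, 0.40]

t_stat, t_p = stats.ttest_rel(scores_a, scores_b)     # Student's paired t-test
u_stat, u_p = stats.mannwhitneyu(scores_a, scores_b)  # Mann-Whitney U test

print(f"t-test p = {t_p:.4f}, Mann-Whitney p = {u_p:.4f}")
# The paper's finding: with document-level scores as the unit, these p-values
# tend to be smaller (i.e. more significant) than with topic-level scores.

The testing procedure is identical in both setups; only the evaluation unit changes, which is what makes the comparison of the resulting p-values meaningful.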
1. Introduction
To date, an overwhelmingly large number of evaluation approaches appraise the accuracy and effectiveness of an information retrieval (IR) system using the Cranfield paradigm, which is a system-based evaluation method. An overview of a system-based experiment using a test collection is depicted in Figure 1. It is worth noting that in large-scale IR evaluation experimentation, researchers use system-based evaluation approaches instead of user-based evaluation methods because of the large number of human participants and retrieval systems that would be required. Such requirements make user-based evaluation time consuming and costly (Moghadasi et al., 2013). Besides, it would require a controlled environment and a very carefully designed experiment (Voorhees, 2002). The system-based evaluation method deploys a test collection, which includes a document corpus, a batch of predefined users' information requests, known as topics or queries, and the relevance judgments indicating which documents are relevant to which topics (Carterette et al., 2006; Moghadasi et al., 2013).
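
To make the system-based setup concrete, the following minimal sketch (illustrative function names and toy data, not taken from the paper) computes average precision per topic from a system's ranked results and the collection's relevance judgments, then averages over topics into MAP, the statistic that the classic topic-level comparison relies on.

from typing import Dict, List

def average_precision(ranked_docs: List[str], relevant: set) -> float:
    """AP for one topic: mean precision at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(run: Dict[str, List[str]],
                           qrels: Dict[str, set]) -> float:
    """MAP: AP averaged over all topics in the run."""
    aps = [average_precision(run[t], qrels.get(t, set())) for t in run]
    return sum(aps) / len(aps)

# Toy test collection: two topics, a tiny corpus, judged relevant documents.
qrels = {"t1": {"d1", "d3"}, "t2": {"d2"}}
run = {"t1": ["d1", "d2", "d3"], "t2": ["d3", "d2", "d1"]}
print(mean_average_precision(run, qrels))  # AP(t1)=0.833, AP(t2)=0.5 -> 0.667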
Aslib Journal of Information Management
Vol. 67 No. 4, 2015, pp. 408-421
© Emerald Group Publishing Limited, 2050-3806
DOI 10.1108/AJIM-12-2014-0171
Received 10 December 2014; Revised 13 May 2015; Accepted 18 May 2015
This research was supported by the UMRG Program RP028E-14AET and by the Exploratory Research Grant Scheme (ERGS) ER027-2013A.