The relative effects of knowledge, interest and confidence in assessing relevance

Ian Ruthven, Mark Baillie and David Elsweiler
Department of Computer and Information Sciences, University of Strathclyde, Glasgow, UK

Journal of Documentation, Vol. 63 No. 4, 2007, pp. 482-504
Published 31 July 2007
DOI: https://doi.org/10.1108/00220410710758986
Subject matter: Information & knowledge management; Library & information science
Received 13 January 2006; revised 13 June 2006; accepted 24 June 2006
Abstract
Purpose – The purpose of this paper is to examine how different aspects of an assessor’s context, in
particular their knowledge of a search topic, their interest in the search topic and their confidence in
assessing relevance for a topic, affect the relevance judgements made and the assessor’s ability to
predict which documents they will assess as being relevant.
Design/methodology/approach – The study was conducted as part of the Text REtrieval
Conference (TREC) HARD track. Using a specially constructed questionnaire, information was sought
on TREC assessors’ personal context and, using the TREC assessments gathered, the responses to the
questionnaire questions were correlated with the final relevance decisions.
Findings – This study found that each of the three factors (interest, knowledge and confidence) had
an effect on how many documents were assessed as relevant and on the balance between how many
documents were marked as marginally or highly relevant. These factors are also shown to affect an
assessor’s ability to predict what information they will finally mark as being relevant.
Research limitations/implications – The major limitation is that the research is conducted within
the TREC initiative. This means that we can report on results but cannot report on discussions with
the assessors. The research implications are numerous but mainly concern the effect of personal
context on the outcomes of a user study.
Practical implications – One major consequence is that we should take more account of how we
construct search tasks for interactive information retrieval (IIR) evaluation to create tasks that are
interesting and relevant to experimental subjects.
Originality/value – Examining different search variables within one study to compare the relative
effects of these variables on the search outcomes.
Keywords Information retrieval, Information searches, Retrieval performance evaluation,
Search output, Cognition, Information operations
Paper type Research paper
1. Introduction
Understanding how people assess relevance has been one of the core research areas in
information retrieval (IR) since its inception as an academic discipline. If we
understand more about the relevance decisions people make when searching, then we
can construct interactive systems that facilitate making these decisions, or systems
that make better predictions regarding user search behaviour. We can also better
understand how to evaluate the effectiveness of search systems for individual searchers
by understanding what searchers intend by an assessment of relevance.
In this paper we examine how an assessor’s interest in a search topic, their
knowledge about a search topic and their confidence in assessing relevance can change
the way in which the assessor assesses relevance. We do this with specific reference to
an experimental study we carried out as part of our participation in the 2005 HARD
track of the Text REtrieval Conference (TREC) (Allan, 2005; Voorhees and Buckland,
2006). What our study shows is that an assessor’s personal context, that is, their
knowledge of and attitudes to a search task, affects how they assess relevance. Our
results indicate that assessors with high knowledge regarding a search topic, high
interest in the search topic or high confidence in assessing documents will assess more
documents as being relevant to a search than assessors with lower topical knowledge,
interest or confidence. We also present results on how these personal factors affect an
assessor’s ability to predict what information they might find useful. Finally, we
consider which of these factors it is useful to know about an assessor and discuss the
implications of our results for the design and evaluation of interactive information
retrieval systems.
The remainder of the paper is structured as follows: in section 3 we give a short
introduction to the HARD track as background to our research, in section 4 we present
the details of our study and in section 5 we present our findings. Prior to this, in section
2, we describe previous work on relevance assessment behaviour.
2. Related work
Relevance is the core concept in IR (Borlund, 2003; Mizzaro, 1997; Ruthven, 2005).
Researchers have investigated how people assess the relevance of documents either for
the purpose of understanding human search behaviour or for the purpose of improving
algorithms, such as relevance feedback algorithms, that utilise human assessments of
relevance. Assessing relevance would appear to be a simple process: either a document
is relevant or not to a searcher’s task or information need. However, as Katter (1968,
p. 1) noted very early on, a “recurring finding from studies involving relevance
judgments is that the inter- and intra-judge reliability of relevance judgments is not
very high”. That is, the same person may judge the same document relevant or
non-relevant at different points in time, even within a single search session. Different
searchers may also disagree on the relevance of a document to a search request.
Relevance would then seem to be a very ephemeral concept upon which to base a
research discipline. However, the more we examine how people assess relevance the
more we can understand why this inconsistency in judging relevance occurs and how it
reflects the subjective and contextual nature of the assessment process.
Inconsistency in relevance judgements can occur for a number of reasons, for
example:
• The measurement of relevance is affected by how we measure relevance. To
investigate relevance, we need to be able to measure it in some way and a
standard practice is to ask searchers to estimate the relevance of a document
using some substitute representation, e.g. ordinal scales, relevance categories or,
most commonly, a simple binary relevant/not relevant decision (Mizzaro,
1999). How we ask searchers to record the relevance of individual items can
affect our understanding of the searchers’ thought processes and evaluation
measures that use relevance. For example, Eisenberg and Hu (1987)
demonstrated that binary relevance decisions, asserting that a document is
relevant or non-relevant, can distort measures such as recall and precision
because the point at which individual searchers distinguish between relevant
and non-relevant is not consistent. That is, searchers may mark the same