Journal of Documentation, Vol. 73 No. 3, 2017, pp. 509-527
DOI: https://doi.org/10.1108/JD-08-2016-0099
Published: 8 May 2017
Received 2 August 2016; Revised 14 December 2016; Accepted 15 December 2016
A framework for designing
retrieval effectiveness studies of
library information systems using
human relevance assessments
Christiane Behnert and Dirk Lewandowski
Department of Information, Hamburg University of Applied Sciences,
Hamburg, Germany
Abstract
Purpose – The purpose of this paper is to demonstrate how to apply traditional information retrieval (IR)
evaluation methods based on standards from the Text REtrieval Conference and web search evaluation
to all types of modern library information systems (LISs) including online public access catalogues,
discovery systems, and digital libraries that provide web search features to gather information from
heterogeneous sources.
Design/methodology/approach – The authors apply conventional procedures from IR evaluation to the
LIS context considering the specific characteristics of modern library materials.
Findings – The authors introduce a framework consisting of five parts: search queries, search results,
assessors, testing, and data analysis. The authors show how to deal with comparability problems resulting
from diverse document types, e.g., electronic articles vs printed monographs, and what issues need to be
considered for retrieval tests in the library context.
Practical implications – The framework can be used as a guideline for conducting retrieval effectiveness
studies in the library context.
Originality/value – Although a considerable amount of research has been done on IR evaluation, and
standards for conducting retrieval effectiveness studies do exist, to the authors' knowledge this is the first
attempt to provide a systematic framework for evaluating the retrieval effectiveness of twenty-first-century
LISs. The authors demonstrate which issues must be considered and what decisions must be made by
researchers prior to a retrieval test.
Keywords Evaluation, Information retrieval, Library systems, Digital libraries, Information systems,
Online catalogues, Discovery systems, Relevance assessments, Search results
Paper type Research paper

The framework presented in this paper has been developed as part of the research project LibRank – New Approaches to Relevance Ranking in Library Information Systems, funded by the German Research Foundation (DFG – Deutsche Forschungsgemeinschaft) from March 2014 until February 2016.
Introduction
When it comes to information seeking, users expect modern library information systems
(LISs) to "look and function more like search engines" (Connaway and Dickey, 2010, p. 5).
Typical user information behaviour is characterized by a strong preference for the first
results in the ranked search results list, short queries of only two to three terms, and users
relying on default search settings. This can be observed in both web search (Barry and
Lardner, 2011; Jansen and Spink, 2006; Pan et al., 2007; Zhang et al., 2009) and library search
(Antelman et al., 2006; Asher et al., 2013; Hennies and Dressler, 2006).
When libraries began integrating search engine technology into their catalogues, they
moved away from the Boolean searching paradigm ("exact match" approach) towards
ranking search results by relevance ("best match" approach) and a combined search
interface allowing immediate access to multiple, heterogeneous information sources
(Lewandowski, 2010). These so-called discovery systems are designed with the user in mind.
They also seek to provide Google-like advanced search and retrieval functionality that goes
beyond metasearch systems that were introduced as a first step to enable unified
search on multiple sources, but do not cover the whole spectrum of library resources
(Sadeh, 2015, p. 214). For example, discovery systems provide simple keyword search and
ranked results lists based on relevance criteria as well as user-friendly search features like
auto-completion and spell-checking for search queries (Chickering and Yang, 2014).
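
To make the contrast between the two paradigms concrete, the following sketch compares a Boolean exact-match query with a simple best-match ranking. It is a toy illustration under invented data and a plain term-frequency scoring heuristic, not code from the paper or from any discovery product.

# Toy illustration: Boolean "exact match" vs relevance-ranked "best match".
# Documents, query, and the scoring heuristic are invented for illustration.
from collections import Counter

documents = {
    "doc1": "introduction to information retrieval and search engines",
    "doc2": "library catalogues and boolean exact match search",
    "doc3": "relevance ranking in web search engines",
}
query = "relevance ranking search"

def exact_match(query, docs):
    # Boolean AND: return only documents containing every query term, unranked.
    terms = query.lower().split()
    return [doc_id for doc_id, text in docs.items()
            if all(term in text.lower().split() for term in terms)]

def best_match(query, docs):
    # Best match: score every document by query-term occurrences and rank all of them.
    terms = query.lower().split()
    scores = {doc_id: sum(Counter(text.lower().split())[t] for t in terms)
              for doc_id, text in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(exact_match(query, documents))  # only the documents matching all query terms
print(best_match(query, documents))   # every document, ordered by score

In the exact-match case, a document missing a single query term is excluded entirely, whereas the best-match ranking returns all documents ordered by estimated relevance, which is the behaviour users know from web search engines.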
However, whether these features contribute successfully to satisfying the information needs
of users in a library context has not yet been extensively researched. Most of the studies that
investigate discovery systems focus on the process of evaluating and selecting discovery
software, determining whether certain criteria are fulfilled, or which features are available in
the evaluated tool (Moore and Greene, 2012). This common procedure was also noted by
Hofmann and Yang (2012), who studied 260 academic library catalogues in the USA and
Canada regarding the implementation of discovery layers or features. They found that the
proportion of libraries offering discovery tools had increased within two years from 16 to
29 per cent (Hofmann and Yang, 2012).
Despite the fact that discovery systems are being implemented by libraries, one must ask
how well they perform in terms of search and retrieval. These issues have been evaluated to
a lesser extent. Studies on the retrieval effectiveness of a particular information system
typically involve assessing the relevance of documents for a search query or information
need. Relevance assessments are commonly used for information retrieval (IR) evaluation,
particularly within the Text REtrieval Conference (TREC), whereas relevance as a
"fundamental concern" for OPAC evaluation had already been ascertained decades ago
(O'Brien, 1990). Although there are some studies testing the retrieval effectiveness of
OPACs or discovery systems, there are no consistent evaluation criteria comparable to
those established within TREC.
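
As a concrete illustration of what such a study measures, the following sketch computes precision at rank k from a ranked result list and binary human relevance assessments; the result list, record identifiers, and judgments are hypothetical and only serve to show the principle.

# Illustrative sketch: turning human relevance assessments into an effectiveness score.
def precision_at_k(ranked_results, judgments, k):
    # Share of the top-k results judged relevant (judgment == 1).
    # Unjudged results are treated as non-relevant here; studies may handle them differently.
    top_k = ranked_results[:k]
    relevant = sum(1 for doc_id in top_k if judgments.get(doc_id, 0) == 1)
    return relevant / k

# Hypothetical ranked list for one query and the corresponding binary assessments.
ranked = ["rec17", "rec03", "rec42", "rec08", "rec21"]
assessments = {"rec17": 1, "rec03": 0, "rec42": 1, "rec08": 0, "rec21": 1}
print(precision_at_k(ranked, assessments, k=5))  # 0.6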
In this paper, we introduce a framework for designing retrieval effectiveness studies of
modern LISs[1] that is based on the standard methods for IR evaluation. The framework
specifically focusses on the different data types from multiple, heterogeneous sources
which need to be integrated into a single results list, ranging from short surrogates to full
texts of monographs.
The rest of this paper is structured as follows. First, we provide an overview of
conventional procedures and their application to web search followed by evaluation
studies of LISs. After describing our methods, we present the five-part framework and
discuss decisions that need to be made with regard to search queries, search results,
assessors, testing procedures, and data analysis. We conclude with practical implications
and acknowledge the limitations of the framework.
Literature review
In this section, we review the literature on standard methods of IR evaluation and their
application to web search evaluation; on approaches taken to evaluate LISs; and on
empirical findings from studies concerning these systems.
TREC and other standards
In the late 1950s, Cleverdon (1960) and his colleagues laid the foundation for systematic IR
evaluation based on a formal framework. They conducted a pioneering study on "factors
determining the performance of indexing systems" in an experimental environment
(Cleverdon and Keen, 1966; Cleverdon et al., 1966a, b). In what became known as the Cranfield
experiments, an inverse relationship was observed between recall and precision, which
remain the standard concepts behind performance measurement for IR evaluation today.
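
For reference, the set-based textbook definitions behind these two measures can be written as follows (the notation is ours): with R the set of documents relevant to a query and A the set of documents retrieved by the system,

\[ \text{Precision} = \frac{|R \cap A|}{|A|}, \qquad \text{Recall} = \frac{|R \cap A|}{|R|} . \]

Retrieving more documents tends to raise recall while admitting more non-relevant results, which lowers precision; hence the inverse relationship observed at Cranfield.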
The retrieval tests within TREC[2] started in 1992 and followed the Cranfield paradigm,
i.e., using test collections including a set of documents and search queries to evaluate