A library's information retrieval system (In)effectiveness: case study

DOI: https://doi.org/10.1108/LHT-07-2015-0071
Pages: 369-386
Published: 21 September 2015
Authors: Robert Marijan, Robert Leskovar
Subject: Library & information science; Librarianship/library management; Library technology
A library's information retrieval system (In)effectiveness: case study
Robert Marijan
Delo Newspaper Corporation, Ljubljana, Slovenia, and
Robert Leskovar
Faculty of Organizational Sciences, University of Maribor, Maribor, Slovenia
Abstract
Purpose – The purpose of this paper is to evaluate the effectiveness of the information
retrieval component of a daily newspaper publisher's integrated library system (ILS) in
comparison with open source alternatives, and to observe the impact of the scale of
metadata, generated daily by library administrators, on retrieved result sets.
Design/methodology/approach – In Experiment 1, the authors compared the result sets of
the information retrieval system (IRS) component of the publisher's current ILS and the
result sets of the proposed ones with a human-assessed relevance judgment set. In
Experiment 2, the authors compared the performance of the proposed IRS components with
the publisher's current production IRS, using result sets of the current IRS classified
as relevant. Both experiments were conducted using standard information retrieval (IR)
evaluation methods: precision, recall, precision at k, F-measure, mean average precision
and 11-point interpolated average precision.
Findings – Results showed that: first, in Experiment 1, the publisher's current
production ILS ranked last of all participating IRSs when compared to a relevance
document set classified by the senior library administrator; and second, in Experiment 2,
the tested IR components' request handlers that used only automatically generated
metadata performed slightly better than request handlers that used all of the metadata
fields. Therefore, regarding the effectiveness of IR, the daily human effort of
generating the publisher's current set of metadata attributes is unjustified.
Research limitations/implications – The experiments' collections contained
Slovene-language documents, with a large number of variations in the forms of nouns,
verbs and adjectives. The results could differ for collections in languages with
different grammatical properties.
Practical implications – The authors have confirmed, using standard IR methods, that the
IR component used in the publisher's current ILS could be adequately replaced with an
open source component. Based on the research, the publisher could incorporate the
suggested open source IR components in practice. The authors also describe methods that
libraries can use to evaluate the IR effectiveness of their ILSs.
Originality/value – The paper provides a framework for the evaluation of an ILS's IR
effectiveness for libraries. Based on the evaluation results, libraries could replace
their IR components if their current information system setup allows it.
Keywords Information retrieval, Precision, Open source software, Library systems, Recall,
Apache Solr
Paper type Research paper
Introduction
Innovation in assembly is one of the key Web 2.0 principles. The principle refers to an
abundance of commodity components (or pre-existing foundations) that one can use to
create value by assembling them in novel or effective ways (O'Reilly, 2005; Miller,
2006). Library 2.0 is "a subset of library services designed to meet user needs caused by
the direct and peripheral effects of Web 2.0" (Habib, 2006, p. 9). Openness of Library 2.0
extends to the software and hardware that libraries use, including integrated library
systems (ILS) (Casey and Savastinuk, 2006). Miller (2006) foresees the end of closed,
proprietary, monolithic library software systems and emphasizes the need to specify
and build modular systems from which libraries can select the best components for a
given task. The preference for modifiable and open systems is also noted by
Casey and Savastinuk (2006) and Nesta and Mi (2011).

Library Hi Tech, Vol. 33 No. 3, 2015, pp. 369-386
© Emerald Group Publishing Limited, 0737-8831
Received 12 February 2015; revised 9 July 2015; accepted 19 July 2015
Modularity is "a special form of design which intentionally creates a high degree of
independence or 'loose coupling' between component designs by standardizing
component interface specifications" (Sanchez and Mahoney, 1996). If libraries used the
concepts of "loose coupling" (Weick, 1976) in the design and implementation of their
software systems, they could, when any element "misfires or decays or
deteriorates", replace that element with a new one without affecting the operation
of other elements. A more recent concept, service-oriented architecture, is "an
architecture for building business applications as a set of loosely coupled distributed
components linked together to deliver a well-defined level of service. These services
communicate with each other, and the communication involves data exchange or
service coordination" (Wang and Dawes, 2013). In this research, we explore the concept
of "loose coupling" by evaluating the effectiveness of the information retrieval (IR)
component of a daily newspaper publisher's ILS in comparison with open source
alternatives, and observe the impact of the scale of metadata attributes, generated daily
by library administrators, on retrieved result sets.
We conducted two experiments. In Experiment 1, we compared the result sets of the
information retrieval system (IRS) component of the current ILS with the result sets of
the proposed ones, using standard IR methods. Furthermore, we divided the current
archive metadata attribute set into two groups: first, the "all-fields" (AF) group
containing all metadata attributes; and second, the "computed-fields" (CF) group
containing only automatically generated metadata attributes. In Experiment 2, we
compared the performance of various Apache Solr cores relative to the current production
IRS. The base core group (AF) configuration included all of the fields (human-generated
attributes and attributes automatically generated upon transfer from the publisher's
editorial systems to the ILS). The second core group included only the automatically
generated CF attributes.
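
The standard IR evaluation measures named above can be sketched in a few lines of Python. This is an illustrative rendering of the textbook definitions; the function names and toy data are ours, not taken from the paper:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall."""
    p, r = precision(retrieved, relevant), recall(retrieved, relevant)
    return 2 * p * r / (p + r) if p + r else 0.0

def precision_at_k(ranked, relevant, k):
    """Precision computed over only the top-k ranked results."""
    return precision(ranked[:k], relevant)

def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant
    document appears; zero contribution for relevant docs never retrieved."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0
```

Mean average precision is then simply the mean of `average_precision` over all test queries, and 11-point interpolated average precision records, at each recall level 0.0, 0.1, ..., 1.0, the highest precision achieved at that recall level or beyond.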
IR
While Van Rijsbergen (1979) presented the clear difference between data retrieval and
IR, Manning et al. (2009, p. 1) defined IR as "finding material (usually documents) of
an unstructured nature (usually text) that satisfies an information need from within
large collections (usually stored on computers)".
Herrera-Viedma (2001) defined the main activity of an IRS as "the gathering of the
pertinent archived documents that best satisfy the user queries". This author parsed
the process of gathering into three components: (1) a database, which stores the
documents and the representation of their information contents (index terms); (2) a
query subsystem, which allows users to formulate their queries by means of a query
language; and (3) an evaluation subsystem, which evaluates the documents for a user
query, obtaining a Retrieval Status Value (RSV) for each document.
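
These three components can be illustrated with a deliberately minimal in-memory sketch. The class and its term-overlap RSV are our own toy construction, not code from the paper or from any of the systems it evaluates:

```python
class ToyIRS:
    """Toy IRS with Herrera-Viedma's three components: a database of
    index terms, a query subsystem, and an evaluation subsystem that
    assigns each document a Retrieval Status Value (RSV)."""

    def __init__(self):
        # (1) database: document id -> set of index terms
        self.index = {}

    def add_document(self, doc_id, text):
        # naive indexing: lowercase whitespace tokens as index terms
        self.index[doc_id] = set(text.lower().split())

    def parse_query(self, query):
        # (2) query subsystem: turn the query string into index terms
        return set(query.lower().split())

    def search(self, query):
        # (3) evaluation subsystem: RSV = count of query terms present
        # in the document; return matching doc ids, best RSV first
        terms = self.parse_query(query)
        scored = [(len(terms & doc_terms), doc_id)
                  for doc_id, doc_terms in self.index.items()]
        return [doc_id for rsv, doc_id in sorted(scored, reverse=True)
                if rsv > 0]
```

A production IRS such as Apache Solr replaces the term-overlap RSV with a proper ranking function and an inverted index, but the division of labor among the three subsystems is the same.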
Pirkola (2001) presented a morphological classification of languages from the
standpoint of IR. He summarized morphology as "a field of linguistics which studies
word structure and formation", and split morphology into inflectional morphology and
derivational morphology (Pirkola, 2001, p. 331, citing Karlsson, 1998; Bybee, 1985;
Matthews, 1991). Pirkola (2001) defined inflection as the use of morphological methods