Improving the visibility of library resources via mapping library subject headings to Wikipedia articles

Date: 19 March 2018
Published date: 19 March 2018
DOI: https://doi.org/10.1108/LHT-04-2017-0066
Pages: 57-74
Authors: Arash Joorabchi, Abdulhussain E. Mahdi
Subject matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Information user studies, Metadata, Information & knowledge management, Information & communications technology, Internet
Arash Joorabchi and Abdulhussain E. Mahdi
Department of Electronic & Computer Engineering, University of Limerick,
Limerick, Ireland
Abstract
Purpose: Linking libraries and Wikipedia can significantly improve the quality of services provided by
these two major silos of knowledge. Such linkage would enrich the quality of Wikipedia articles and, at the
same time, increase the visibility of library resources. To this end, the purpose of this paper is to describe the
design and development of a software system for automatic mapping of FAST subject headings, used to
index library materials, to their corresponding articles in Wikipedia.
Design/methodology/approach: The proposed system works by first detecting all the candidate
Wikipedia concepts (articles) occurring in the titles of the books and other library materials that are indexed
with a given FAST subject heading. This is followed by training and deploying a machine learning (ML)
algorithm designed to automatically identify those concepts that correspond to the FAST heading. Specifically,
the ML algorithm used is a binary classifier which classifies the candidate concepts into either the
"corresponding" or the "non-corresponding" category. The classifier is trained to learn the characteristics of
those candidates which have the highest probability of belonging to the "corresponding" category, based on a
set of 14 positional, statistical, and semantic features.
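The candidate-detection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes candidates are found by matching word n-grams of each record title against a set of lower-cased Wikipedia article titles, and it computes only three toy features. The function names (`ngrams`, `detect_candidates`), the tokenization, and the feature names are hypothetical stand-ins for the paper's 14 positional, statistical, and semantic features.

```python
import re
from collections import Counter


def ngrams(tokens, max_n):
    """Yield (start_index, phrase) for every word n-gram of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, " ".join(tokens[i:i + n])


def detect_candidates(record_titles, wiki_titles, max_n=3):
    """Find Wikipedia titles occurring in record titles and compute toy features.

    record_titles: titles of library records indexed with one FAST heading.
    wiki_titles:   set of lower-cased Wikipedia article titles (hypothetical lookup).
    """
    title_count = Counter()  # number of record titles containing each candidate
    first_pos = {}           # earliest relative position seen (0.0 = first word)
    for title in record_titles:
        tokens = re.findall(r"[a-z0-9]+", title.lower())
        seen_in_title = set()
        for i, phrase in ngrams(tokens, max_n):
            if phrase not in wiki_titles:
                continue
            if phrase not in seen_in_title:  # count each title at most once
                title_count[phrase] += 1
                seen_in_title.add(phrase)
            rel_pos = i / len(tokens)
            first_pos[phrase] = min(first_pos.get(phrase, 1.0), rel_pos)
    n = len(record_titles)
    return {
        cand: {
            "title_frequency": title_count[cand] / n,  # statistical (toy)
            "first_position": first_pos[cand],         # positional (toy)
            "length_in_words": len(cand.split()),      # surface feature (toy)
        }
        for cand in title_count
    }


# Invented example records and Wikipedia titles, for illustration only.
titles = [
    "A history of Irish folk music",
    "Folk music of Ireland",
    "Irish traditional music",
]
wiki = {"folk music", "music", "ireland", "irish traditional music"}
for cand, feats in sorted(detect_candidates(titles, wiki).items()):
    print(cand, feats)
```

In a complete system, each candidate's feature dictionary would then be passed to a trained binary classifier that labels it "corresponding" or "non-corresponding"; that classifier is not shown here.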
Findings: The authors have assessed the performance of the developed system using the standard information
retrieval measures of precision, recall, and F-score on a data set containing 170 FAST subject headings
manually mapped to their corresponding Wikipedia articles. The evaluation results show that the developed
system is capable of achieving F-scores as high as 0.65 and 0.99 in the "corresponding" and "non-corresponding"
categories, respectively.
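Because the F-scores above are reported separately for each category, it may help to see how per-class precision, recall, and F-score are computed. The following is a minimal sketch using the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), F-score = their harmonic mean). The function name `per_class_prf` and the eight labels are invented for illustration and do not come from the paper's data set.

```python
def per_class_prf(gold, predicted, target):
    """Precision, recall and F1 for one category, treating `target` as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


C, N = "corresponding", "non-corresponding"
# Invented labels for eight candidate concepts (not the paper's data).
gold      = [C, N, N, N, C, N, N, N]
predicted = [C, N, C, N, N, N, N, N]

for label in (C, N):
    p, r, f = per_class_prf(gold, predicted, label)
    print(f"{label}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

In this toy example, the same two errors lower the scores of the small "corresponding" class (F = 0.50) much more than those of the large "non-corresponding" class (F = 0.83), which illustrates how per-class scores can diverge on imbalanced data.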
Research limitations/implications: The size of the data set used to evaluate the performance of the
system is rather small. However, the authors believe that the developed data set is large enough to
demonstrate the feasibility and scalability of the proposed approach.
Practical implications: The sheer size of the English Wikipedia makes the manual mapping of Wikipedia
articles to library subject headings a very labor-intensive and time-consuming task. The proposed system
therefore aims to reduce the cost of such mapping and integration.
Social implications: The proposed mapping paves the way for connecting libraries and Wikipedia as two
major silos of knowledge, and enables the bi-directional movement of users between the two.
Originality/value: To the best of the authors' knowledge, the current work is the first attempt at automatic
mapping of Wikipedia to a library-controlled vocabulary.
Keywords: FAST subject headings, Controlled vocabularies, Wikipedia, Data integration, Library catalogues,
Semantic mapping
Paper type: Research paper
1. Introduction
Library websites and online catalogues are experiencing a decline in their number of
visitors. This, in turn, could translate into a decrease in the number of students and other
information seekers who use library resources. According to De Rosa (2005), less than
1 percent of online information searches start from library websites, whereas the large majority
(84 percent) start from search engines such as Google.

Library Hi Tech, Vol. 36 No. 1, 2018, pp. 57-74
© Emerald Publishing Limited, ISSN 0737-8831
DOI 10.1108/LHT-04-2017-0066
Received 6 April 2017; revised 3 August 2017; accepted 8 August 2017
This work was supported by the OCLC/ALISE Library and Information Science Research Grant
Program (LISRGP) 2016.

This widespread, low-effort information-seeking behavior is known to library and
information science scholars as the principle of least effort (Chang, 2016). According to this
principle, the main concern underlying the majority of information-seeking behaviors is the
desire to reduce the time and effort spent, as formulated by Zipf (1949).
As a result, Google-Wikipedia is becoming a prevalent online information-seeking route.
In this new trend, the information seeker submits an informational query (i.e. a query on
a particular topic, subject, or concept) to Google and follows one of the search results to a
relevant article on Wikipedia. Safran (2012) showed that Wikipedia articles appear on page 1
of Google search results for 60 percent of informational queries, and in 66 percent of
such cases, Wikipedia articles appear in the top visibility positions (1-3) of the results page, where
the majority of clicks occur. A more recent case study by McMahon et al. (2017) on the
relationship between Wikipedia and Google demonstrates an extensive and mutually
beneficial interdependence between the two. In this study, Wikipedia links were silently
removed from the search results presented to the participants in order to examine the effect.
The authors report that the quality of Google search results degrades considerably for many queries when
links to Wikipedia content are excluded; the study also highlights Google's important role in
providing readership to Wikipedia.
Wikipedia has become the largest free online encyclopedia. At the time of writing, the English
Wikipedia contains over five million articles. Wikipedia articles are written and edited by a
large community of volunteer contributors, editors, and administrators. Wikipedia plays an
important role in addressing public information needs. For example, the results of a nationwide
survey conducted in the USA in 2007 showed that 36 percent of American internet users
look for information on Wikipedia, and that Wikipedia attracted six times more traffic than the
next closest website in the "educational and reference" category, outperforming websites
such as Google Scholar and Google Books by a large margin (Rainie and Tancer, 2007).
This nationwide survey was repeated in 2010 and showed that the proportion of American
internet users who turn to Wikipedia for information had risen from 36 to 53 percent, and that
Wikipedia is most popular with the 18-29 age group (Zickuhr and Rainie, 2011).
In the context described above, linking Wikipedia articles to the records of related library
materials would enable information seekers to readily acquire lists of library resources which
provide in-depth knowledge on their subject of interest. In this paradigm, each Wikipedia article
would be linked to the records of related materials in a global union catalogue of libraries around
the world, i.e. WorldCat.org. This, in turn, would provide bibliographic metadata on the
materials of interest and direct information seekers to their local libraries, where they can access
those materials. The availability of this new Wikipedia-library information-seeking paradigm would,
in turn, improve the visibility of library resources that are currently overlooked to a
large extent by information consumers with lower information literacy skills.
Based on the above, mapping Wikipedia articles to their corresponding library subject
headings (e.g. LCSH, FAST) could play an important role in Wikipedia-library
integration. In practice, such mapping would enable the bi-directional movement of users
between libraries and Wikipedia as two major silos of knowledge. However, the sheer size of
the English Wikipedia (over five million articles) makes the manual mapping of Wikipedia articles to
library subject headings a very labor-intensive and time-consuming task. Therefore, our aim
is to reduce the cost of such mapping and integration. To this end, in this paper, we describe
the design and development of a new software system for automatic mapping of Wikipedia
articles to their corresponding FAST subject headings. There has been substantial research
on automating the subject indexing of library records and
electronic documents with traditional library-controlled vocabularies and classification
systems. Golub (2006) and Yi (2007) reviewed earlier works in this field carried out by
library organizations such as the Library of Congress and the Online Computer Library Center
(OCLC); and Wang (2009) and Khoo et al. (2015) reviewed more recent works in the context of