Mining user queries with information extraction methods and linked data

Published date: 10 September 2018
DOI: https://doi.org/10.1108/JD-09-2017-0133
Pages: 936-950
Authors: Anne Chardonnens, Ettore Rizza, Mathias Coeckelbergs, Seth van Hooland
Subject matter: Library & information science, Records management & preservation, Document management, Classification & cataloguing, Information behaviour & retrieval, Collection building & management, Scholarly communications/publishing, Information & knowledge management, Information management & governance, Information management, Information & communications technology, Internet
Anne Chardonnens
State Archives of Belgium/CegeSoma, Brussels, Belgium and
Information and Communication Science Department,
Université libre de Bruxelles (ULB), Brussels, Belgium, and
Ettore Rizza, Mathias Coeckelbergs and Seth van Hooland
Information and Communication Science Department,
Université libre de Bruxelles (ULB), Brussels, Belgium
Abstract
Purpose: Advanced use of web analytics tools makes it possible to capture the content of user queries. Although
these queries are a relevant source of information, manually analysing large volumes of them is problematic. The purpose of this
paper is to address the problem of named entity recognition in digital library user queries.
Design/methodology/approach: The paper presents a large-scale case study conducted at the Royal
Library of Belgium on its online historical newspapers platform BelgicaPress. The object of the study is a data
set of 83,854 queries resulting from 29,812 visits over a 12-month period. By making use of information
extraction methods, knowledge bases (KBs) and various authority files, this paper presents the possibilities
and limits of identifying what percentage of end users are looking for person and place names.
Findings: Based on a quantitative assessment, the method can successfully identify the majority of person
and place names in user queries. Due to the specific character of user queries and the nature of the KBs
used, a limited number of queries remained too ambiguous to be treated in an automated manner.
Originality/value: This paper demonstrates empirically how user queries can be extracted
from a web analytics tool and how named entities can then be mapped to KBs and authority files, in order
to facilitate automated analysis of their content. The methods and tools used are generalisable and can be reused
by other collection holders.
Keywords: Digital libraries, Cultural heritage, Knowledge bases, Query classification, User query,
Web analytics
Paper type: Case study
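As a rough illustration of the lookup idea summarised in the abstract, the sketch below labels each query by exact match against two small name lists and reports the share of each label. It is not the authors' pipeline: the lists `PERSONS` and `PLACES`, the function names and the exact-match rule are all assumptions made for this example, and a real study would draw its names from knowledge bases and authority files.

```python
from collections import Counter

# Hypothetical toy authority lists (assumption); real lists would come from
# knowledge bases and authority files, which are not reproduced here.
PERSONS = {"victor horta", "leopold ii"}
PLACES = {"bruxelles", "gent", "namur"}


def normalise(query: str) -> str:
    """Lower-case a query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def classify(query: str) -> str:
    """Label a query 'person', 'place', 'ambiguous' or 'unknown' by exact lookup."""
    q = normalise(query)
    in_persons, in_places = q in PERSONS, q in PLACES
    if in_persons and in_places:
        return "ambiguous"
    if in_persons:
        return "person"
    if in_places:
        return "place"
    return "unknown"


def label_shares(queries: list[str]) -> dict[str, float]:
    """Return the fraction of queries carrying each label."""
    counts = Counter(classify(q) for q in queries)
    total = len(queries)
    return {label: n / total for label, n in counts.items()} if total else {}
```

Calling `label_shares` on a list of raw query strings gives the proportion of person and place queries under these assumptions. Exact matching is the simplest possible rule; it leaves misspellings and multi-entity queries in the "unknown" bucket, which is one reason some queries may remain too ambiguous for automated treatment.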
1. Introduction
Both policy makers and the public are increasingly regarding libraries, archives and
museums as content and service providers who operate in the same market as commercial
information providers. This situation is reflected in the adoption of the common definition of
the quality of information systems and services by ISO within the cultural heritage sector,
which focusses on the fitness for purpose (Boydens, 1999; ISO, 2005). This interpretation
of quality refers to the idea of self-regulating markets where demand directly influences
supply as consumers are empowered to decide what information is of use (Suominen, 2007).
Within this context, cultural heritage institutions have been making use of web analytics
tools to quantify the interaction between their collections and end users. The dashboards of
popular tools such as Google Analytics do provide useful features to understand how many
end users interact with a website, where they come from, how long they stay or with which
specific web pages they engage the most. As described by Kelly (2014), web analytics can
lead to developing enhancements to the architecture, metadata or content of a digital library
to improve the user experience. Still, beyond assessment tools that help to collect user
experience data, additional tools for automating and analysing this data are needed to make
it a widespread practice [for archives] (Kelly, 2017). Thus, the Google Analytics approach
Journal of Documentation
Vol. 74 No. 5, 2018
pp. 936-950
© Emerald Publishing Limited
0022-0418
DOI 10.1108/JD-09-2017-0133
Received 22 September 2017
Revised 2 February 2018
Accepted 18 February 2018