Evaluation of Google question-answering quality

Journal: Library Hi Tech, Vol. 37 No. 2, 2019, pp. 312-328
Publisher: © Emerald Publishing Limited, ISSN 0737-8831
DOI: https://doi.org/10.1108/LHT-10-2017-0218
Received: 30 October 2017; Revised: 13 April 2018 and 26 June 2018; Accepted: 27 June 2018; Published: 17 June 2019
Authors: Yiming Zhao, Jin Zhang, Xue Xia, Taowen Le
Subject matter: Library & information science; Librarianship/library management; Library technology; Information behaviour & retrieval; Information user studies; Metadata; Information & knowledge management; Information & communications technology; Internet
Yiming Zhao
Center for Studies of Information Resources, Wuhan University, Wuhan, China
Jin Zhang
School of Information Studies, University of Wisconsin Milwaukee,
Milwaukee, Wisconsin, USA
Xue Xia
School of Computer Science, Carnegie Mellon University,
Pittsburgh, Pennsylvania, USA, and
Taowen Le
Goddard School of Business and Economics,
Weber State University, Ogden, Utah, USA
Abstract
Purpose – The purpose of this paper is to evaluate Google question-answering (QA) quality.
Design/methodology/approach – Given the large variety and complexity of Google answer boxes in search result pages, existing evaluation criteria for both search engines and QA systems seemed unsuitable. This study developed an evaluation criteria system for the evaluation of Google QA quality by coding and analyzing search results of questions from a representative question set. The study then evaluated Google's overall QA quality, as well as QA quality across four target types and across six question types, using the newly developed criteria system. ANOVA and Tukey tests were used to compare QA quality among different target types and question types (an illustrative sketch of this comparison follows the front matter below).
Findings – It was found that Google provided significantly higher-quality answers to person-related questions than to thing-related, event-related and organization-related questions. Google also provided significantly higher-quality answers to where-questions than to who-, what- and how-questions. The more specific a question was, the higher the QA quality.
Research limitations/implications – Suggestions for both search engine users and designers are presented to help enhance user experience and QA quality.
Originality/value – Particularly suitable for search engine QA quality analysis, the newly developed evaluation criteria system expanded and enriched the assessment metrics of both search engines and QA systems.
Keywords – Google, Search engine, Evaluation criteria, Question-answering, Question type, Target type
Paper type – Research paper
Funding – This work is supported by the National Natural Science Foundation of China under Grant Nos 71420107026, 71874130 and 71403190. It is partly supported by the Ministry of Education of the People's Republic of China under Grant No. 18YJC870026, the China Association for Science and Technology, the Fundamental Research Funds for the Central Universities and the Research Fund for Academic Team of Young Scholars at Wuhan University (Whu2016013).
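As a concrete illustration of the two-step comparison named in the Design/methodology/approach section, the following is a minimal, hypothetical sketch of a one-way ANOVA followed by a Tukey HSD test, using SciPy and statsmodels. The scores, group sizes and question-type labels are placeholders, not data from this study.

```python
# Minimal, hypothetical sketch (not the authors' data or code): one-way
# ANOVA followed by Tukey's HSD, the two-step comparison described above.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Placeholder QA-quality scores for three question types.
where_q = [4.2, 4.5, 4.1, 4.4, 4.3]
who_q = [3.6, 3.8, 3.5, 3.9, 3.7]
how_q = [3.1, 3.0, 3.3, 2.9, 3.2]

# One-way ANOVA: do mean quality scores differ across question types?
f_stat, p_value = stats.f_oneway(where_q, who_q, how_q)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Tukey HSD: which pairs of question types differ significantly?
scores = np.array(where_q + who_q + how_q)
groups = ["where"] * 5 + ["who"] * 5 + ["how"] * 5
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```

Tukey's HSD is applied after the ANOVA indicates that at least one group mean differs, which mirrors the omnibus-test-then-pairwise-comparison procedure the abstract describes.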
Introduction
Internet searching has become very important in people's daily lives (Lewandowski and
Höchstötter, 2008; Palanisamy, 2013). According to Alexa (2017), Google was the most
popular internet reference site in 2016. Statistics show that around two trillion Google
searches were conducted annually, with about 2.3m Google searches launched per minute on
average (Google, 2017).
In recent years, web search engines have been transforming from keyword-based retrieval
systems into natural-language question-answering (QA) systems (Etzioni, 2011). Google started
to provide QA services in 2012 (Singhal, 2012). With this QA service, when a user submits a question to Google
search, Google shows an answer box that directly answers the question. The content in
the answer box may be an exact answer generated from the Google Knowledge Graph or a
featured snippet (a summary) extracted automatically from a webpage.
The answer box is displayed at the top of the search engine result page (SERP), which
suggests its importance in SERPs and its value for research, because users tend to read
result pages from top to bottom (Lewandowski and Höchstötter, 2008). As an innovation
leading to the next generation of search engines, Google's QA feature urgently needs to be
assessed and optimized (Lopez et al., 2013; Singhal, 2012).
Existing research on web search engine evaluation seems to have focused on retrieval
effectiveness (quality of search results), search features and user satisfaction (Lewandowski
and Höchstötter, 2008; Lewandowski, 2015). Although a few studies have tested Google's
natural-language query interface on narrow question sets (such as geographical queries), to our
knowledge, there has been no evaluation of the QA quality of general web search engines
based on a standard and popular question set. Additionally, answers to natural-language
questions are generally presented in answer boxes in multiple and complex ways, making
traditional evaluation criteria unsuitable for evaluating such QA systems.
Treating Google as a QA system rather than as a typical keyword-based web search engine,
this study evaluates Google's QA quality. The primary purposes of this study are to develop
a suitable evaluation system for search engine QA quality assessment, to evaluate Google's
QA quality using that system, and to present suggestions and recommendations to search
engine designers and users.
The findings of this study can help information professionals better understand Google's
QA feature and its characteristics, help users learn which question types are most likely to
generate satisfactory answers, and help search engine providers improve their QA services.
Related studies
While many studies have evaluated search engines and automatic QA systems separately, this
study combines measurements from both scenarios. Hence, literature on web search engine
evaluation, QA system evaluation and the development of search engines' QA features is reviewed.
Evaluation of web search engines
Much work has been done on search engine evaluation and comparison. The criteria used
typically relate to four major areas: index quality, retrieval effectiveness, search
features and user satisfaction (Lewandowski, 2008).
The index quality of a search engine depends on the coverage and up-to-dateness of its
index databases, which are usually built by crawling documents from the World Wide
Web (Lewandowski and Höchstötter, 2008). Search engines such as Google not only rely on
crawled web documents, but also utilize structured knowledge from knowledge bases
such as the Google Knowledge Graph and Freebase (Singhal, 2012).
Retrieval effectiveness refers to a system's capability to retrieve relevant information
items, specifically the number and relevance of returned results (Lewandowski, 2008). The
two most frequent and basic measures of information retrieval effectiveness are precision
and relative recall, which have often been used in former studies of search engine
evaluation (Oppenheim et al., 2000; Su, 2003); both measures are sketched in code below.
A good summary of studies of retrieval effectiveness was provided by Lewandowski (2015).
Most retrieval effectiveness evaluations rely on binary relevance decisions (relevant vs
non-relevant) or multi-level discrete relevance judgments (e.g. relevant, mostly relevant,
partially relevant and non-relevant) made by assessors (Landoni and Bell, 2000; Lewandowski, 2015).
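As a concrete illustration of the two measures just described, the following is a minimal sketch using invented relevance judgments and document IDs, not data from this study. Relative recall is computed against the pooled set of relevant documents found by all compared engines, the usual stand-in for true recall, which is unknowable on the open web.

```python
# Minimal sketch with invented judgments and document IDs (not data from
# this study) of precision and relative recall from binary judgments.

def precision(judgments):
    """Fraction of retrieved results judged relevant (1 = relevant, 0 = not)."""
    return sum(judgments) / len(judgments)

def relative_recall(found_by_engine, pooled_relevant):
    """Relevant items one engine retrieved, relative to the pooled set of
    relevant items retrieved by all compared engines."""
    return len(found_by_engine & pooled_relevant) / len(pooled_relevant)

# Binary relevance judgments for one engine's top-10 results (hypothetical).
judged = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(f"Precision@10    = {precision(judged):.2f}")  # 0.40

# Pooled relevant document IDs across all compared engines (hypothetical).
pool = {"d1", "d2", "d4", "d7", "d9", "d12"}
engine_a_relevant = {"d1", "d2", "d7"}
print(f"Relative recall = {relative_recall(engine_a_relevant, pool):.2f}")  # 0.50
```

Pooling judged-relevant results from all engines under comparison gives a shared denominator, which is why relative recall supports cross-engine comparison even though absolute recall cannot be measured.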
Most web search engines provide different types of search features. A comparative study
of search feature effectiveness was conducted by Zhang et al. (2013), in which common
features such as title search, basic search, exact-phrase search, PDF search and URL search