Topic‐sensitive search engine evaluation


DOI	https://doi.org/10.1108/14684521111193184
Pages	893-908
Date	29 November 2011
Published date	29 November 2011
Author	Na Dai, Brian D. Davison
Subject Matter	Information & knowledge management, Library & information science

Topic-sensitive search engine evaluation

Na Dai and Brian D. Davison

Department of Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA

Abstract

Purpose – This work aims to investigate the sensitivity of ranking performance with respect to the topic distribution of queries selected for ranking evaluation.

Design/methodology/approach – The authors reweight queries used in two TREC tasks to make them match three real background topic distributions, and show that the performance rankings of retrieval systems are quite different.

Findings – It is found that search engines tend to perform similarly on queries about the same topic; and search engine performance is sensitive to the topic distribution of queries used in evaluation.

Originality/value – Using experiments with multiple real-world query logs, the paper demonstrates weaknesses in the current evaluation model of retrieval systems.

Keywords Search engines, Query stream, Query classification, Topic distribution, Ranking evaluation, Function evaluation, Information retrieval, Functional analysis

Paper type Research paper

Introduction

As the world wide web has grown in size and popularity, so has the use of search services to find (and re-find) information on the web. Logs of queries submitted to search engines provide significant information for search engine maintainers, designers, and researchers in information retrieval. The activities recorded help provide feedback on feature usage (Spink et al., 1998, 1999; Spink and Jansen, 2004), estimates of searcher satisfaction and prediction of future click activity (Piwowarski and Zaragoza, 2007), training for ranking improvement (Joachims, 2002; Joachims et al., 2005; Agichtein et al., 2006), and patterns of query reformulation (Bruza and Dennis, 1997; Spink et al., 2000; Joachims et al., 2007).

The contents of such queries also provide significant information about user interests, express users’ information needs, and represent what users hope or expect to find on the web. Therefore, understanding search query properties implicitly helps build an objective ranking evaluation system that can help direct what a search service should improve in order to enrich users’ search experience. For example, the popularity of queries related to “pumpkins” or “trick or treat”, etc., increases significantly around Halloween, and therefore ranking improvements for those queries have a greater influence on search engine performance evaluation during that time.

The current issue and full text archive of this journal is available at www.emeraldinsight.com/1468-4527.htm

This material is based on work supported in part by the National Science Foundation under grant numbers IIS-032885 and IIS-0545875, and by Microsoft (through its “Accelerating search” programme). The authors particularly thank Microsoft for providing access to the query logs and corresponding result sets. In addition they thank Xiaoguang Qi for his code and assistance in snippet classification. The authors also thank the anonymous reviewers for their useful comments.

Received 15 September 2010
Accepted 20 June 2011

Online Information Review, Vol. 35 No. 6, 2011, pp. 893-908
© Emerald Group Publishing Limited, 1468-4527
DOI 10.1108/14684521111193184

This paper is motivated by the idea that queries selected for ranking evaluation should maximally represent the characteristics of query logs so that the overall performance of prospective systems can be evaluated objectively. In this work we focus on the topic distribution of the query sample used for ranking evaluation, where the topic distribution denotes, in aggregate, the topics of the queries belonging to that sample. We argue that this characteristic has an important influence on the objectivity of ranking evaluation: query logs record the history of users’ behaviour, so the distribution of topics represented in them portrays users’ interests.

Here we investigate the sensitivity of ranking performance with respect to the topic distribution of queries selected for ranking evaluation. Specifically we demonstrate that topical representativeness can be a significant factor influencing the objectivity of search engine performance evaluation by showing evidence that search engines tend to demonstrate more similar ranking performance on queries within the same topic. Perhaps more importantly we demonstrate that the query sets selected for standard retrieval evaluation in TREC (NIST, 2011) fail to match several real-world search logs in terms of their topic distribution and thus rank retrieval systems differently from how the systems will likely perform under a real-world query stream.
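To make the idea of topic-sensitive evaluation concrete, the following is a minimal sketch of how per-query scores might be reweighted so that the evaluation sample matches a background topic distribution. It is not the authors' implementation: the function name `topic_weighted_score`, the topic labels, the per-query scores and the target distribution are all hypothetical, and the sketch assumes that every query carries exactly one topic label.

```python
from collections import Counter


def topic_weighted_score(per_query_scores, query_topics, target_dist):
    """Average per-query scores so that each topic carries its target share.

    Hypothetical helper, not taken from the paper. Assumes one topic label
    per query; target topics with no sampled query are dropped and the
    remaining shares are renormalised.
    """
    # Number of evaluated queries per topic in the sample.
    counts = Counter(query_topics[q] for q in per_query_scores)
    # Target probability mass that the sample can actually represent.
    covered = sum(p for t, p in target_dist.items() if counts[t] > 0)
    if covered <= 0:
        raise ValueError("no sampled query belongs to a target topic")
    total = 0.0
    for query, score in per_query_scores.items():
        topic = query_topics[query]
        # Each query gets an equal slice of its topic's renormalised share.
        weight = target_dist.get(topic, 0.0) / covered / counts[topic]
        total += weight * score
    return total


# Made-up evaluation sample: three sports queries, one health query.
query_topics = {"q1": "sports", "q2": "sports", "q3": "sports", "q4": "health"}
# Made-up per-query effectiveness scores for two hypothetical systems.
system_a = {"q1": 0.8, "q2": 0.6, "q3": 0.7, "q4": 0.2}
system_b = {"q1": 0.4, "q2": 0.5, "q3": 0.3, "q4": 0.9}
# Made-up background distribution, standing in for a real query log.
log_dist = {"sports": 0.25, "health": 0.75}

for name, scores in (("A", system_a), ("B", system_b)):
    unweighted = sum(scores.values()) / len(scores)
    reweighted = topic_weighted_score(scores, query_topics, log_dist)
    print(name, round(unweighted, 3), round(reweighted, 3))
```

With these invented numbers, system A has the higher unweighted mean but system B has the higher mean once the queries are reweighted to the background distribution, which is the kind of ranking change the paper is concerned with.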

In the remainder of this paper we provide additional background and related work, introduce our dataset, and present our experimental results. We conclude with a discussion of the value and limitations of our findings and a summary of our results.

Related work

In this section we review background material and prior work.

Automated query classification

To provide the right kind of search results, it is often important to know (or estimate) the intent of the user. For example whether the user has a navigational interest or an informational need (Broder, 2002; Rose and Levinson, 2004) can affect which algorithms are most useful. As a result there is significant interest in automatic intent classification (Kang and Kim, 2003; Lee et al., 2005; Jansen et al., 2007).

In general query classification of almost any kind is known to be difficult, primarily because of the short and often ambiguous queries generated by searchers. However some methods have been successful for query topic classification, e.g. utilising additional unlabeled data (Taksa et al., 2007; Beitzel, Jensen, Lewis, Chowdhury and Frieder, 2007) and bridging topic hierarchies to enable training on larger datasets (Li et al., 2005; Vogel et al., 2005; Shen et al., 2006a). As a result query topic classification can be useful in many tasks, including:

• phrase suggestion based on query topic (Jensen et al., 2006);

• web search personalisation (Liu et al., 2002);

• recognition of search multitasking, i.e. to watch for transitions in topics within sessions, as in Ozmutlu et al. (2006);

• monetisation of search through relevant advertising (Broder et al., 2007); and

• the understanding and analysis of searcher topics of interest (Jansen and Spink, 2006).

