Automatic prediction of news intent for search queries. An exploration of contextual and temporal features
| Date | 01 October 2018 |
| Published date | 01 October 2018 |
| DOI | https://doi.org/10.1108/EL-06-2017-0134 |
| Pages | 938-958 |
| Author | Xiaojuan Zhang, Shuguang Han, Wei Lu |
Xiaojuan Zhang
Department of Computer and Information Science, Southwest University,
Chongqing, China
Shuguang Han
Department of Information Science, University of Pittsburgh, Pittsburgh,
PA, USA, and
Wei Lu
Department of Information Management, Wuhan University, Hubei, China
Abstract
Purpose – The purpose of this paper is to predict news intent by exploring contextual and temporal
features directly mined from a general search engine query log.
Design/methodology/approach – First, a ground-truth data set with correctly marked news and
non-news queries was built. Second, a detailed analysis of the search goals and topic distribution of
news/non-news queries was conducted. Third, three new features were obtained: the relationship between
entity and contextual words extended from query sessions, topical similarity among clicked results, and
temporal burst point. Finally, to understand the utility of the new features and prior features, extensive
prediction experiments on SogouQ (a Chinese search engine query log) were conducted.
Findings – News intent can be predicted with high accuracy by using the proposed contextual and temporal
features, and the macro average F1 of classification is around 0.8677. Contextual features are more effective
than temporal features. All three new features are useful and significant in improving the accuracy of
news intent prediction.
Originality/value – This paper provides a new perspective on recognizing queries with news
intent without the use of large corpora such as social media (e.g. Wikipedia, Twitter and blogs) and news data
sets. The research will be helpful for general-purpose search engines to address search intents for news
events. In addition, the authors believe that the approaches described in this paper are general enough to
apply to other verticals with dynamic content and interest, such as blog or financial data.
Keywords Query classification, News intent, News queries, Query intent
Paper type Research paper
Introduction
It has been reported that around 10 per cent of web search queries are related to current
news events (Bar-Ilan et al., 2009). To better support news information needs, modern
search engines such as Google and Bing have aggregated news content into their search
results. News results occupy the space of “regular” search results, so misjudging the
news intent of a search query hurts the user’s search experience. It is therefore important
to predict the news intent of search queries, as this prediction guides a general search
engine in deciding whether to aggregate news content into the search results or to provide
only non-news results. Note that we call queries with news intent news queries, and all
other queries non-news queries. However, automatic prediction of the news intent of
search queries is challenging for at least two reasons. First, only a limited amount of
information can be directly derived from the query strings and from the users who issue the
queries. Second, the prediction should ideally be made in near real time. The most common
method is to treat the task as a classification problem, where a list of features is extracted and
selected for better prediction accuracy. Consequently, feature extraction plays an important
role in news intent prediction. Existing approaches have tried to extract the classification
features from social media (e.g. Wikipedia, Twitter and blogs) or news data sets, and at least
two of these data resources are used each time. Although these approaches produce reasonable
prediction performance, they require substantial computational resources for crawling, parsing
and integrating the resources and, more importantly, the data resources themselves are hard
to obtain. Hence, we would like to explore new features that can be quickly
computed. Users’ query logs and their behaviors on search engine result pages are
useful resources for such features, which led us to extract features directly from search
engines. Moreover, we believe that news intent can be classified more precisely this way,
because the user logs come from the same search engine under the same search scenario,
and because analyzing a user’s click record and query reformulation history gives hints
about his or her intent at that time. To test our hypothesis, we
conducted a series of experiments on real-world search logs, focusing on
Chinese search engines because of the availability of query logs.
This research is supported by National Social Science Foundation of China under grant no.
15CTQ019 and National Nature Science Foundation of China under grant no. 71173164.
Received 26 June 2017
Revised 27 November 2017, 31 January 2018, 20 April 2018
Accepted 10 June 2018
The Electronic Library, Vol. 36 No. 5, 2018, pp. 938-958
© Emerald Publishing Limited, 0264-0473
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm
News intent of a search query means that the query is related to a currently newsworthy
event, and news intent prediction is to predict whether or not a query is related to an ongoing
or recent event (Louis et al., 2011). Here, the event refers to “something happening in a specific
place at a specific time, and tagged with a specific term” (Ruocco and Ramampiaro, 2012).
Thus, if a query has news intent, then this query may be related to certain factors of an
event, such as named entities (e.g. person, place and organization), topical words (specific
terms assisting in describing the topic of an event) and temporal information. We summarize
these factors into two types: contextual-based and temporal-based features. The former is
based on the query strings and the textual context (e.g. the named entities contained in the query
string and the clicked results) in which the query keywords occur. The latter tracks the
corpus frequency (the number of times queries occur in the query logs over time) and
quantifies the temporal distribution of clicked results. Moreover, we are interested in
studying the effectiveness of utilizing these features for news intent prediction.
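To make the temporal side of this concrete, the following is a minimal sketch of how a query's frequency over time might be tracked and checked for bursts. It is an illustration under stated assumptions, not the method used in this paper: the log format of (day, query) pairs, the function names daily_frequency and burst_days, and the "mean plus k standard deviations" threshold are all hypothetical choices made for the example.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def daily_frequency(log, query):
    """Count how often `query` was issued on each day of the log."""
    return Counter(day for day, q in log if q == query)


def burst_days(freq, days, k=2.0):
    """Return the days whose count exceeds mean + k * std over `days`.

    The threshold rule is an assumption made for illustration only.
    """
    counts = [freq.get(day, 0) for day in days]
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:  # perfectly flat series: no burst by this definition
        return []
    return [day for day, c in zip(days, counts) if c > mu + k * sigma]


# Toy log of (day, query) pairs; the dates and queries are invented.
days = [date(2018, 1, d) for d in range(1, 8)]
log = [(day, "weather") for day in days for _ in range(2)]  # steady query
log += [(day, "earthquake") for day in days[:6]]            # once a day
log += [(days[6], "earthquake")] * 20                       # sudden spike

print(burst_days(daily_frequency(log, "earthquake"), days))
print(burst_days(daily_frequency(log, "weather"), days))
```

In this toy log the steady query never crosses the threshold, while the query that spikes on the last day is flagged on that day only. A real query log would need session and click fields as well, which this sketch leaves out.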
In summary, this paper addresses news intent prediction by proposing
methods to extract a different set of contextual and temporal features directly from a general
search engine query log, without the use of large corpora such as social media (e.g. Wikipedia,
Twitter and blogs) and news data sets. The main contributions of
this paper lie in the following three aspects:
(1) We build a ground-truth data set with correctly marked news and non-news
queries, and this data set can be reused in similar tasks. We also conduct a detailed
analysis of the search topic distribution of news/non-news queries, which has not
been thoroughly studied.
(2) We propose two new contextual features and one temporal feature for news intent
classification, including the co-occurrence between a named entity (we use the