Online Information Review, Vol. 39 No. 2, 2015, pp. 197-213
Published: 13 April 2015
DOI: https://doi.org/10.1108/OIR-10-2014-0257
Subject: Library & information science; Information behaviour & retrieval

Evaluating search features of Google Knowledge Graph and Bing Satori: entity types, list searches and query interfaces
Ahmet Uyar and Farouk Musa Aliyu
Department of Computer Engineering, Meliksah University, Kayseri, Turkey
Abstract
Purpose – The purpose of this paper is to better understand three main aspects of two semantic web search engines, Google Knowledge Graph and Bing Satori. The authors investigated the coverage of entity types, the extent of their support for list search services and the capabilities of their natural language query interfaces.
Design/methodology/approach – The authors manually submitted selected queries to these two semantic web search engines and evaluated the returned results. To test the coverage of entity types, the authors selected entity types from the Freebase database. To test the capabilities of the natural language query interfaces, the authors used a manually developed query data set about US geography.
Findings – The results indicate that both semantic search engines cover only very common entity types. In addition, the list search service is provided for only a small percentage of entity types. Moreover, both search engines support queries of very limited complexity and with a limited set of recognised terms.
Research limitations/implications – Both companies are continually working to improve their semantic web search engines. Therefore, the findings reflect their capabilities at the time this research was conducted.
Practical implications – The results suggest that in the near future both semantic search engines can be expected to expand their entity databases and improve their natural language interfaces.
Originality/value – As far as the authors know, this is the first study to evaluate any aspect of these newly developing semantic web search engines. It shows their current capabilities and limitations, and provides directions for researchers by pointing out the main problems facing semantic web search engines.
Keywords Google Knowledge Graph and Bing Satori, Natural language queries, Search engine evaluation, Semantic search engines
Paper type Research paper
Introduction
Today major search engines are in the process of an important technological expansion. Traditionally they have linked users to documents on the web by returning a list of documents in response to user queries, acting as an intermediary between users and the documents on the public web. Although this will continue to be an important component of search engines, they have lately been moving towards becoming fully semantic search engines. They want to understand user queries semantically and serve users' information needs precisely from their own knowledge repositories, answering many of those information needs directly. To that end they are working to build large knowledge repositories about real world entities and concepts. Google calls its entity database Knowledge Graph and introduced it in May 2012 (Singhal, 2012). Bing calls its
entity database Satori and introduced it in June 2012 (Qian, 2013). In this study we
investigate some of the most important features of both Google Knowledge Graph
(GKG) and Bing Satori.
Both Google and Bing are developing entity databases for hundreds of millions
of entities. These entities are not documents on the web, but rather constructed
information about real world objects and concepts including people, places, books,
movies, events, arts, science, etc. An entity may have properties and relationships to other entities. These relationships are particularly important, as they turn the entity database into a graph. Initially Google reported having more than 500 million
entities and more than 3.5 billion relationships (Singhal, 2012). Similarly Bing is
planning to build a database with billions of entities and relationships (Qian, 2013).
However, building an entity database is a challenging task and an ongoing process.
Entity databases are usually implemented as graph databases to better handle the connections among entities (Angles and Gutierrez, 2008). They are therefore very different from the inverted index files traditionally used in search engines, which requires search engines to redesign many of their existing algorithms for entity databases.
New relevancy detection and entity ranking mechanisms are needed. Traditionally used
ranking algorithms such as PageRank need to be reformulated. A review of ranking
methods for entity graphs is given by Koumenides and Shadbolt (2014).
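As a purely illustrative sketch (not a description of how Google or Bing actually store or rank their data), the following Python fragment represents a handful of fictitious entities as subject-relation-object triples and iterates the basic PageRank recurrence over the resulting directed graph. It is only meant to make concrete the contrast drawn above between graph-structured entity data and an inverted index, and why link-based ranking applies naturally to the former.

```python
# Toy illustration only: NOT how Google or Bing represent or rank their entity
# data. A few fictitious entities are stored as (subject, relation, object)
# triples, and the basic PageRank recurrence is iterated over the resulting
# directed graph to show how graph structure alone can induce an entity ranking.
from collections import defaultdict

triples = [
    ("Tom_Hanks",       "acted_in",    "Forrest_Gump"),
    ("Robin_Wright",    "acted_in",    "Forrest_Gump"),
    ("Forrest_Gump",    "directed_by", "Robert_Zemeckis"),
    ("Robert_Zemeckis", "directed",    "Cast_Away"),
    ("Tom_Hanks",       "acted_in",    "Cast_Away"),
]

# Build the out-link structure of the entity graph.
out_links = defaultdict(set)
nodes = set()
for subject, _, obj in triples:
    out_links[subject].add(obj)
    nodes.update((subject, obj))

# Basic PageRank recurrence with damping factor 0.85; dangling nodes are
# simply ignored to keep the sketch short.
damping = 0.85
rank = {n: 1.0 / len(nodes) for n in nodes}
for _ in range(20):
    rank = {
        n: (1 - damping) / len(nodes)
        + damping * sum(rank[m] / len(out_links[m])
                        for m in nodes if n in out_links[m])
        for n in nodes
    }

for node, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{node:16s} {score:.3f}")
```

On this toy graph the entities that others point to (the two movies and the director) end up ranked above the actors, which is the intuition behind adapting link-based ranking to entity graphs; a production system would of course combine many more signals.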
Building an accurate and comprehensive entity database is a challenging task (Dalvi
et al., 2009). Entity databases are much harder to build than traditional webpage corpora. When building a webpage corpus, crawlers discover existing webpages: they traverse the web by following HTTP links and download the pages that may be of
interest to the users. However, building entity databases involves creating entities and
relationships among them. It is not a task of discovering existing documents but rather
building entities from scratch. Search engines use both automatic data extraction
algorithms and human workers to build entity databases (Efrati, 2012; Pogue, 2012).
They use public resources on the web such as Wikipedia and social media, government
organisation data sets such as the CIA World Factbook, digital book contents such as
Google Books, etc.
Since building entity databases is a time-consuming and difficult task, it is important for search engines to cover the entities that are most helpful to users. Both Google and Bing report that they are primarily guided by user search query logs and build entities for the objects that are searched for most. Google started with landmarks, celebrities, cities, sports teams, buildings, geographical features, movies, celestial objects, works of art and more (Singhal, 2012). Initially Google did not have hotels, restaurants, corporations and events as entities (Pogue, 2012); however, it now covers these kinds of entities. Bing Satori started with three types of entities (movies, restaurants and hotels) in June 2012, and soon expanded the list to include people,
places and things. They report that these are the most commonly searched entities on
Bing and about 10 per cent of all user queries are for people searches (Qian, 2013).
Another important aspect of entity databases is their query interfaces. Although search engines are building these knowledge repositories, user interaction with these data sets remains a challenging task. There are three common types of query interface to entity databases:
(1) Structured query languages. Structured query languages such as SPARQL let users describe their information needs precisely. For example, queries such as "10 largest cities in the United States" or "actors or actresses that played in at least 2 movies in 2012" can be formulated easily. However, these languages require users to
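To make the first example concrete: neither Google Knowledge Graph nor Bing Satori exposes a public SPARQL endpoint, so the sketch below (an illustration added here, not part of the original study) sends an equivalent query to Wikidata's public SPARQL endpoint as a stand-in. The endpoint URL and the Wikidata identifiers used (Q515 for city, Q30 for the United States, P31 for "instance of", P17 for country, P1082 for population) are assumptions of this example.

```python
# A minimal sketch of a structured query, added for illustration. Neither
# Google Knowledge Graph nor Bing Satori offers a public SPARQL endpoint, so
# Wikidata's public endpoint is used as a stand-in. The identifiers below
# (Q515 = city, Q30 = United States, P31 = instance of, P17 = country,
# P1082 = population) are Wikidata's, assumed here for the example.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

# "10 largest cities in the United States" expressed as SPARQL.
QUERY = """
SELECT ?cityLabel ?population WHERE {
  ?city wdt:P31 wd:Q515 ;        # entity type: city
        wdt:P17 wd:Q30 ;         # country: United States
        wdt:P1082 ?population .  # population property
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?population)
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "entity-query-example/0.1"},
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["cityLabel"]["value"], row["population"]["value"])
```

The second example query would similarly combine GROUP BY and HAVING clauses; this level of precision is what structured query languages offer, at the cost of requiring users to learn a formal syntax.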
