Image search and retrieval problems in web search engines. A case study of Persian language writing style challenges

Published date: 08 October 2018
DOI: https://doi.org/10.1108/OIR-01-2017-0007
Pages: 752-767
Authors: Yaghoub Norouzi, Hoda Homavandi
Subject matter: Library & information science, Information behaviour & retrieval, Collection building & management, Bibliometrics, Databases, Information & knowledge management, Information & communications technology, Internet, Records management & preservation, Document management
Yaghoub Norouzi
Department of Knowledge and Information Science, University of Qom, Qom,
Islamic Republic of Iran, and
Hoda Homavandi
Department of Knowledge and Information Science, University of Tehran,
Tehran, Islamic Republic of Iran
Abstract
Purpose: The purpose of this paper is to investigate image search and retrieval problems in selected search
engines in relation to Persian writing style challenges.
Design/methodology/approach: This study is an applied one, and to answer the questions the authors
used an evaluative research method. The aim of the research is to explore the morphological and semantic
problems of the Persian language in connection with image search and retrieval among the three major and
widespread search engines: Google, Yahoo and Bing. In order to collect the data, a checklist designed by the
researcher was used, and then the data were analyzed by descriptive and inferential statistics.
Findings: The results indicate that the Google, Yahoo and Bing search engines do not pay enough attention to
the morphological and semantic features of the Persian language in image search and retrieval. This research
reveals that six groups of Persian language features are the major problems in this area: derived words,
derived/compound words, Persian and Arabic plural words, use of dotted T, the use of spoken language, and
polysemy. In addition, the results suggest that Google is the best search engine of all in terms of
compatibility with Persian language features.
Originality/value: This study investigated some new aspects of the above-mentioned subject through
combining morphological and semantic aspects of the Persian language with image search and retrieval.
Therefore, this study is interdisciplinary research, the results of which would help both to offer some
solutions and to carry out similar research in this subject area. This study will also fill a gap in the research
conducted so far in this area in the Farsi language, especially in image search and retrieval.
Moreover, the findings of this study can help to bridge the gap between users' questions and search engines'
(systems') retrievals. In addition, the methodology of this paper provides a framework for further research on
image search and retrieval in databases and search engines.
Keywords: Information retrieval, Image retrieval, Search engine, Persian language, Writing styles
Paper type: Case study
Online Information Review, Vol. 42 No. 6, 2018, pp. 752-767
© Emerald Publishing Limited, 1468-4527
Received 14 January 2017
Revised 11 October 2017, 23 February 2018
Accepted 27 June 2018
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/1468-4527.htm
1. Introduction
The web is increasingly becoming a store for growing masses of information, and it
has become an ocean of all kinds of data, making any query into the huge information
reservoir extremely difficult (Isfandyari Moghaddam, 2007). Search engines partly organize
the huge amounts of information entering the web and provide it to users in
different formats such as text, audio, pictures and multimedia. In this regard, pictures are
very important, as one picture can sometimes be more illustrative than 100 words.
Therefore, concurrent with the development of the web and access to different types of
photography technology, like digital cameras and image scanners, the size of digital image
collections is increasing (Liu et al., 2007).
Many image retrieval approaches have been widely used to search the great number
of internet images; however, it is still difficult to retrieve images that satisfy the query
intentions of users (Guo et al., 2016). Many factors can overwhelm the searcher
who is trying to retrieve images. Consequently, searching for images is still a challenge for
the majority of individuals, especially for images associated with texts written in different
languages which are unfamiliar to users (Ménard, 2011). Moreover, many people need to
do this for different purposes, ranging from a simple search for a group of images related
to a general subject to a specific search which may be done by a scientist. Efficient image
search tools by Google (http://images.google.com), MSN (http://search.msn.com/images)
and Yahoo (http://images.search.yahoo.com/) are clear evidence that image indexing and
searching is common and widespread in today's visual culture (Neugebauer, 2010).
The multilinguality of web content provides opportunities for users to directly access
and use previously incomprehensible sources of web information; nevertheless, web users find
it difficult to take advantage of these opportunities when the online information access
systems are monolingual (Bao and Chen, 2009). In recent years, the number of non-English
resources on the web has been growing rapidly, and tools that can build specialized search
engines in different languages are thus highly desired (Chau et al., 2008). Based on the
Internet World Stats' (2016) latest estimates for internet users by language, 67.8 percent of
users use English and 32.2 percent of them are non-English-speaking users. This feature
makes the web a multilingual and multicultural information space. Although there are several
search engines for facilitating web search, it appears that their attention to non-English
languages, in comparison with English, is insufficient (Lazarinis, 2007a).
Persian-speaking users, such as Iranians, are no exception to this rule. Based on the Internet National
Development Management Center's (2015) latest estimates for internet penetration in Iran,
82.12 percent of the people use the internet to meet their information needs; therefore, their
linguistic and writing problems must be taken into consideration. The Persian language
encompasses a broad range of speakers in Iran and some of its neighboring countries;
consequently, codification of rules and criteria for the Persian language is very important,
especially when we take into account the increasing use of computers in the Persian language and
script area. Besides, concerns about the risk of dispersion and the application of different
and antithetical styles have been growing over the past years (Academy of Persian
Language and Literature, 2010).
As mentioned above, the language of search is one of the most crucial factors,
and its significance for image search and retrieval is doubled because
one of the fundamental differences between textual and visual information
is the nature of their retrieval process. Thus, the problem of image retrieval is an
increasingly active area for research and development (Murthy et al., 2010).
Retrieval of textual information is based on context and usage, whereas most
image search engines rely on text for indexing images, which means that the
quality of their results depends on the quality of the text associated with the image (such
as file name, text next to the image, page title or HTML tag) (TASI, 2008)[1]. Languages
like Persian have some features and complexities which, if disregarded, could lead to
problems in information searching and retrieving. As a result, the present study attempts
to investigate image search and retrieval problems in connection with Persian writing
styles in selected search engines, namely Google, Bing and Yahoo. These search engines
were selected based on their ranking among the most common engines[2] and their
support for Persian language searching. We focused on images as there have been many
conflicts for years over content-based retrieval systems, which retrieve images based on visual
features (like shape, size and color), and concept-based image retrieval systems
(Lazarinis, 2008). So, automatically finding images relevant to a textual query remains a
very challenging task (Meng, 2015). The selected search engines retrieve images
based on their textual information, which is called concept-based image retrieval.
So the focus on images in this study could show search engines' potentials (as an example of
concept-based image retrieval systems) in relation to their approach to image storing
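The dependence of concept-based retrieval on associated text, described above, can be illustrated with a minimal Python sketch. It is not the method of the search engines studied here, nor the checklist used in this research. The image records, the field names, the sample word (the Persian word for "book") and the `normalize` helper are all hypothetical, chosen only to show how exact matching on associated text may miss images when the same plural is written in different styles.

```python
import re
from collections import defaultdict

ZWNJ = "\u200c"                    # zero-width non-joiner, often typed before a suffix
PLURAL_SUFFIX = "\u0647\u0627"     # the Persian plural suffix "ha"
BOOK = "\u06a9\u062a\u0627\u0628"  # "ketab" (book), used here as a sample word

# Hypothetical image records: each image is known only through its associated text.
IMAGES = {
    "img1": {"file_name": BOOK + ".jpg", "page_title": "library " + BOOK},
    "img2": {"file_name": "cover.jpg", "alt_text": BOOK + ZWNJ + PLURAL_SUFFIX},
    "img3": {"file_name": BOOK + PLURAL_SUFFIX + ".png"},
}


def tokenize(text):
    """Split on whitespace and common separators; ZWNJ stays inside tokens."""
    return [t for t in re.split(r"[\s_\-./,]+", text) if t]


def normalize(token):
    """Hypothetical, deliberately crude normalizer (an assumption, not a real stemmer):
    drop ZWNJ, then strip a trailing plural suffix."""
    token = token.replace(ZWNJ, "")
    if token.endswith(PLURAL_SUFFIX) and len(token) > len(PLURAL_SUFFIX):
        token = token[: -len(PLURAL_SUFFIX)]
    return token


def build_index(images, normalizer=None):
    """Map every token of an image's associated text to the ids of matching images."""
    index = defaultdict(set)
    for image_id, fields in images.items():
        for text in fields.values():
            for token in tokenize(text):
                key = normalizer(token) if normalizer else token
                index[key].add(image_id)
    return index


def search(index, query, normalizer=None):
    """Return the ids of images whose associated text contains every query token."""
    tokens = [normalizer(t) if normalizer else t for t in tokenize(query)]
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


exact_index = build_index(IMAGES)
normalized_index = build_index(IMAGES, normalize)
```

With the exact index, a query for the singular form reaches only `img1` and a query for the joined plural reaches only `img3`, while the variant written with a zero-width non-joiner in `img2` is missed by both. With the normalized index, the plural query reaches all three images. The suffix stripping is intentionally naive: it would also truncate words that merely end in the same letters, so a real system would need proper morphological analysis.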