A document expansion framework for tag-based image retrieval

Published date: 15 January 2018
DOI: https://doi.org/10.1108/AJIM-05-2017-0133
Pages: 47-65
Authors: Wei Lu, Heng Ding, Jiepu Jiang
Subject matter: Library & information science, Information behaviour & retrieval, Information & knowledge management, Information management & governance, Information management
Wei Lu and Heng Ding
School of Information Management, Wuhan University, Wuhan, China, and
Jiepu Jiang
College of Information and Computer Sciences,
University of Massachusetts, Amherst, Massachusetts, USA
Abstract
Purpose: The purpose of this paper is to use document expansion techniques to improve image
representation and retrieval. This paper proposes a concise framework for tag-based image retrieval (TBIR).
Design/methodology/approach: The proposed approach includes three core components: a strategy for
selecting expansion (similar) images from the whole corpus (e.g. cluster-based or nearest neighbor-based);
a technique for assessing image similarity, which is used to select expansion images (text, image, or mixed features);
and a model for matching the expanded image representation with the search query (merging or separate).
Findings: The results show that the proposed method yields significant improvements in
effectiveness; it performs better at the top of the ranking and substantially improves some topics
that score zero in the baseline. Moreover, the nearest neighbor-based expansion strategy outperforms
the cluster-based expansion strategy, using image features to select expansion images is better than
using text features in most cases, and the separate method for calculating the augmented probability
$P(q \mid R_D)$ is able to erase the negative influence of erroneous images in $R_D$.
Research limitations/implications: Although these methods outperform the baseline only at the top of
the ranking rather than across the entire ranked list, TBIR on mobile platforms can still benefit from this approach.
Originality/value: Unlike previous studies, which address the sparsity, vocabulary mismatch, and tag
relatedness issues in TBIR individually, the approach proposed in this paper addresses all of these issues
with a single document expansion framework. It is a comprehensive investigation of document expansion
techniques in TBIR.
Keywords: Information retrieval, Document expansion, Retrieval model, Social image representation,
Social image retrieval, Tag-based image retrieval
Paper type: Research paper
Aslib Journal of Information Management
Vol. 70 No. 1, 2018
pp. 47-65
© Emerald Publishing Limited
2050-3806
DOI 10.1108/AJIM-05-2017-0133
Received 28 May 2017
Revised 27 November 2017
Accepted 11 December 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/2050-3806.htm
Introduction
The development of digital photography and social media-sharing platforms (e.g. Flickr and
Instagram) has led to a rapid increase in the number of social images produced.
Social bookmarks (tags) provide noisy yet useful descriptive information that can enhance
traditional image retrieval technology (Firan et al., 2007; Nov et al., 2008; Sun, Bhowmick,
Nam Nguyen, and Bai, 2011). Techniques that leverage social bookmarks for image search are
called tag-based image retrieval (TBIR), and they have attracted wide attention (Chen et al., 2010;
Gao et al., 2013; Li et al., 2015; Li and Snoek, 2010). These approaches generally assess the
similarity between a search query and an image's tags.
Previous studies showed that social tags are usually helpful for image retrieval
(Chen et al., 2010; Gu et al., 2011; Liu et al., 2009; Sun and Bhowmick, 2008; Tang et al., 2009).
However, a social image usually has only a limited number of tags. For example, in the
NUS-WIDE data set (an open data set for TBIR), each image has only 18 tags on average,
and almost 15 percent of images have fewer than 8 tags. Such a tag-based image
representation often suffers from serious sparsity and vocabulary mismatch issues.
In addition, most image-sharing platforms do not allow users to assign the same tag
multiple times, which makes it difficult to distinguish informative tags from less important
ones by their frequencies. Therefore, measuring how effectively a tag describes the tagged
image also becomes a crucial issue (which we refer to as the tag-relatedness issue in this paper).
Many studies have addressed these issues. More concretely, neighbor voting schemes
(Truong et al., 2012) are widely adopted to measure how effectively a tag describes the tagged
image, while tag recommendation (Sun, Bhowmick and Chong, 2011) and tag completion
(Wu et al., 2013) have both been proposed to address the sparsity and vocabulary mismatch
issues. However, it is still very hard to combine these methods into a unified framework.
In this paper, we propose a concise framework based on document expansion techniques,
which are widely adopted in document retrieval, to address all these issues at once. In our
approach, we consider the set of tags for an image as a "document" for that image.
Specifically, our approach has three core components:
(1) a strategy of selecting expansion (similar) images from the whole corpus;
(2) a technique for assessing image similarity, which is adopted for selecting expansion
images; and
(3) a model for matching the expanded image representation with the search query.
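The first two components can be illustrated with a minimal Python sketch. It assumes a nearest neighbor-based selection strategy and uses Jaccard overlap between tag sets as a stand-in for text-based similarity; the function names, the toy corpus, and the choice of Jaccard are illustrative assumptions rather than the paper's implementation, which also considers image and mixed features and a cluster-based selection strategy.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Tag-set overlap, used here as an assumed text-based image similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_expansion_images(
    target: str,
    corpus: Dict[str, Set[str]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Nearest neighbor-based strategy: the k most similar other images and their scores."""
    scored = [
        (other, jaccard(corpus[target], tags))
        for other, tags in corpus.items()
        if other != target
    ]
    # Images sharing no tags with the target carry no expansion evidence.
    scored = [pair for pair in scored if pair[1] > 0.0]
    # Highest similarity first; ties broken by image id for a deterministic order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy corpus: image id -> set of social tags.
corpus = {
    "img1": {"beach", "sunset", "sea"},
    "img2": {"beach", "sea", "sand", "waves"},
    "img3": {"sunset", "sky", "clouds"},
    "img4": {"cat", "pet"},
}
print(select_expansion_images("img1", corpus, k=2))
```

Replacing `jaccard` with a similarity over visual features would give an image-based variant of component (2) without changing the selection logic.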
We describe and evaluate our approach in this paper. We compare it with previous
approaches and experiment using different implementations of the three core components.
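As an illustration of how component (3) can be implemented in different ways, the sketch below shows one plausible reading of the two matching options named in the abstract, assuming a Dirichlet-smoothed query-likelihood model. "Merging" pools the expansion images R_D into a single similarity-weighted pseudo-document, whereas "separate" scores each expansion image on its own and averages the results by similarity; the final score interpolates P(q|D) with P(q|R_D). The smoothing parameter `mu`, the interpolation weight `lam`, the collection-model smoothing, and all function names are hypothetical choices, not the paper's exact formulation.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# An expansion image in R_D: its tag set and its similarity to the target image D.
Neighbor = Tuple[Set[str], float]


def p_query(query: List[str], counts: Dict[str, float], coll: Counter, mu: float = 5.0) -> float:
    """P(q|D) under a unigram model with Dirichlet smoothing against the collection."""
    doc_len = sum(counts.values())
    coll_len = sum(coll.values())
    prob = 1.0
    for term in query:
        # Add-one collection model with one shared bucket for unseen terms.
        p_coll = (coll[term] + 1) / (coll_len + len(coll) + 1)
        prob *= (counts.get(term, 0.0) + mu * p_coll) / (doc_len + mu)
    return prob


def as_counts(tags: Set[str]) -> Dict[str, float]:
    """A tag set as a bag of words: each tag occurs exactly once."""
    return {tag: 1.0 for tag in tags}


def p_rd_merging(query: List[str], neighbors: List[Neighbor], coll: Counter, mu: float = 5.0) -> float:
    """Merging: pool R_D into one pseudo-document, weighting each tag by similarity."""
    pooled: Dict[str, float] = {}
    for tags, sim in neighbors:
        for tag in tags:
            pooled[tag] = pooled.get(tag, 0.0) + sim
    return p_query(query, pooled, coll, mu)


def p_rd_separate(query: List[str], neighbors: List[Neighbor], coll: Counter, mu: float = 5.0) -> float:
    """Separate: score each expansion image alone, then take a similarity-weighted mean."""
    total = sum(sim for _, sim in neighbors)
    if total == 0:
        return 0.0
    return sum(sim * p_query(query, as_counts(tags), coll, mu) for tags, sim in neighbors) / total


def expanded_score(query, doc_tags, neighbors, coll, lam=0.7, separate=True, mu=5.0) -> float:
    """lam * P(q|D) + (1 - lam) * P(q|R_D); falls back to P(q|D) when R_D is empty."""
    p_d = p_query(query, as_counts(doc_tags), coll, mu)
    if not neighbors:
        return p_d
    rd = p_rd_separate if separate else p_rd_merging
    return lam * p_d + (1 - lam) * rd(query, neighbors, coll, mu)


# Hypothetical toy data: the target image D and two expansion images with similarities.
images = [{"beach", "sunset", "sea"}, {"beach", "sea", "sand", "waves"},
          {"sunset", "sky", "clouds"}, {"cat", "pet"}]
coll = Counter(tag for tags in images for tag in tags)
doc = images[0]
neighbors = [(images[1], 0.4), (images[2], 0.2)]
for mode in (True, False):
    label = "separate" if mode else "merging"
    print(label, expanded_score(["sand"], doc, neighbors, coll, separate=mode))
```

Here the query tag "sand" is missing from D itself, so both variants raise its score only through R_D. One possible intuition for the separate variant's robustness is that each expansion image's contribution is bounded by its own similarity weight, so a single wrong neighbor does not reshape a pooled term distribution.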
The rest of this paper is organized as follows. In the second section, we review work related
to TBIR, covering research on social tags, related efforts in image retrieval, and research on
tag-based retrieval. The third section provides a detailed description of the proposed
approach. The fourth section introduces the experimental setup and analyzes the results in
detail. The fifth section concludes this study.
It is worth noting that TBIR is quite different from concept-based (i.e. text-based) image
retrieval because of certain characteristics of social tags. For example, in concept-based
image retrieval, an image is often represented by a textual document that typically contains
considerable word redundancy to convey its semantics. In TBIR, however, an image is
represented by far fewer tags, with little or no redundancy. Moreover, the text used in
concept-based image retrieval is usually provided by professional indexers, whereas social tags
are assigned by many users with different motivations and different interpretations of what
tags mean. Thus, traditional techniques of concept-based image retrieval, such as term
frequency weighting and document length normalization, do not work well for TBIR.
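Because most platforms do not allow the same tag to be assigned twice, every tag in a tag-only document has a term frequency of 1, so term frequency weighting gives no signal about which tags matter. The sketch below contrasts this with one commonly used form of neighbor voting, the family of schemes mentioned in the Introduction, assuming the neighbors have already been found: a tag's score is the number of neighbors that also carry it, minus the number expected by chance given the tag's overall frequency. The toy corpus, the neighbor list, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def term_frequencies(tags: Set[str]) -> Counter:
    """Term frequencies of a tag-only document; every tag occurs exactly once."""
    return Counter(tags)


def neighbor_vote(image: str, neighbors: List[str], corpus: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each tag of `image` by neighbor votes minus the votes expected by chance."""
    k = len(neighbors)
    n = len(corpus)
    # Number of images in the corpus carrying each tag.
    doc_freq = Counter(tag for tags in corpus.values() for tag in tags)
    scores: Dict[str, float] = {}
    for tag in corpus[image]:
        votes = sum(1 for other in neighbors if tag in corpus[other])
        prior = k * doc_freq[tag] / n  # expected votes from k randomly drawn images
        scores[tag] = votes - prior
    return scores


corpus = {
    "img1": {"beach", "sea", "2017"},
    "img2": {"beach", "sea", "sand"},
    "img3": {"beach", "sea", "waves"},
    "img4": {"cat", "2017"},
    "img5": {"dog", "2017"},
}
print(term_frequencies(corpus["img1"]))
# Neighbors are assumed to come from some similarity search (e.g. over visual features).
print(neighbor_vote("img1", ["img2", "img3"], corpus))
```

In this toy example, "beach" and "sea" receive positive scores because both neighbors share them, while the frequent but uninformative tag "2017" scores below zero, even though all three tags have the same term frequency.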
Related work
Research on social tags in the search environment
Much research has examined social tags from the perspective of organization and retrieval.
For example, Nov et al. (2008) divided tagging motivation into three categories based on the
target audience, and tagging function into two dimensions based on a tag's intended use.
They pointed out that the organization function of tags is intended to facilitate future search
and retrieval by the user. Carman et al. (2008) found that social tags (bookmarks) are useful
for approximating actual user queries from the perspective of personalized information
retrieval. Gu et al. (2011) concluded that social tags exhibit confidence issues caused by
ambiguity and synonymy. They proposed a statistical model to measure the confidence of
social tags and applied it to filter out noisy tags with low confidence. Their experimental
results revealed that the confidence of social tags strongly influenced the performance of
tag-based search methods. Wu et al. (2013) stated that because many users tend to choose
general and ambiguous tags to minimize the effort of choosing appropriate words, tags that
are specific to the visual content of images tend to be missing or noisy.
Additionally, Koutrika et al. (2008) asserted that misleading tags confuse users instead of
increasing the visibility of a resource. Therefore, they proposed a method for ranking