Providing consumers with a representative subset from online reviews

Pages: 877-899
Published date: 09 October 2017
DOI: https://doi.org/10.1108/OIR-05-2016-0125
Authors: Jin Zhang, Ming Ren, Xian Xiao, Jilong Zhang
Subject Matter: Library & information science, Information behaviour & retrieval, Collection building & management, Bibliometrics, Databases, Information & knowledge management, Information & communications technology, Internet, Records management & preservation, Document management
Jin Zhang
School of Business, Renmin University of China, Beijing, China
Ming Ren
School of Information Resource Management,
Renmin University of China, Beijing, China
Xian Xiao
Guanghua School of Management, Peking University, Beijing, China, and
Jilong Zhang
School of Business, Renmin University of China, Beijing, China
Abstract
Purpose: The purpose of this paper is to find a representative subset of large-scale online reviews for
consumers. The subset is much smaller than the original review set, but covers most of the information in the
original reviews and contains little redundant information.
Design/methodology/approach: A heuristic approach named RewSel is proposed to successively select
representatives until the number of representatives meets the requirement. To reveal the advantages of the
approach, extensive data experiments and a user study are conducted on real data.
Findings: The proposed approach outperforms the benchmarks in terms of coverage and
redundancy. Users prefer the representative subsets provided by RewSel. The proposed
approach also scales well, which makes it better suited to big data applications.
Research limitations/implications: The paper contributes to the review selection literature by
proposing a heuristic approach that achieves both high coverage and low redundancy. This study can
serve as the basis for further analysis of large-scale online reviews.
Practical implications: The proposed approach offers a novel way to select a representative subset of
online reviews to facilitate consumer decision making. It can also enhance existing information retrieval
systems so that they provide representative information to users rather than a large number of results.
Originality/value: The proposed approach finds the representative subset by adopting the concept of
relative entropy and sentiment analysis methods. Compared with state-of-the-art approaches, it offers a more
effective and efficient way for users to handle a large amount of online information.
Keywords: Online reviews, Redundancy, Coverage, Heuristic approach, Representative subset
Paper type: Research paper
1. Introduction
Recent years have witnessed the advent of large volumes of online reviews on e-commerce
websites. Online reviews are an important source of information for consumers to make
informed decisions when purchasing a product, booking a flight or making a hotel reservation
(Archak et al., 2011; Matute et al., 2016; Neirotti et al., 2016; Wang et al., 2016). As an important
form of user-generated content, online reviews reflect consumers' genuine experiences and
describe aspects of a product that are not disclosed in official channels (Pan and Zhang, 2011;
Chen, 2016). Therefore, consumers trust online reviews more than expert reviews on
traditional media (Chen, 2008; Archak et al., 2011; Chiou et al., 2014; Yeh, 2015). Online reviews
are also an important source of information for vendors to identify opportunities to improve
their products or launch new products (Lee and Yang, 2015; Qi et al., 2016).
Online Information Review
Vol. 41 No. 6, 2017
pp. 877-899
© Emerald Publishing Limited
1468-4527
DOI 10.1108/OIR-05-2016-0125
Received 6 May 2016
Revised 8 May 2017
Accepted 15 June 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/1468-4527.htm
The work is supported by Fundamental Research Funds for the Central Universities, and the Research
Funds of Renmin University of China (14XNI012).
While online reviews are useful for both consumers and vendors, the immense volume
of online reviews has caused a problem of information overload (Bawden and
Robinson, 2009; Krishen et al., 2011). For example, a product may have hundreds or even
thousands of reviews on an e-commerce platform (Hu and Liu, 2004; Park and Lee, 2008).
Because their processing capacity is limited, consumers cannot correctly and comprehensively
process all the reviews, and thus may form biased opinions (Bawden and Robinson, 2009;
Oulasvirta et al., 2009).
The problem of information overload gives rise to the need to find a small set of high-
quality information (Bawden and Robinson, 2009; Zhang et al., 2012), especially because
nowadays people use mobile devices (e.g. smart phones) that have limited screen sizes and
low navigability (Chuang et al., 2012). One well-known approach is to provide the top-k
reviews according to a rank based on some criteria, such as posting time or helpfulness.
However, the top-k approach is susceptible to two major issues. First, it may not cover
the various aspects of the review content, and the overlooked aspects could be important for
consumers and corporations making informed decisions. Second, the top-k reviews may
contain redundant information, because reviews that rank high are often similar.
To provide a better user experience in terms of quickly grasping a large number of online
reviews, it is preferable to provide a representative subset of reviews (Pan et al., 2005), which
is much smaller than the original data set but captures most of the information in it
(i.e. high coverage) and has little redundant information (i.e. low redundancy).
For example, Pan et al. (2005) proposed a greedy algorithm to extract a representative subset
from a database with binary tuples. Later, Zhang et al. (2014) extended the work of Pan et al.
to text data, and proposed a heuristic approach to extract a subset from web search results.
However, these methods cannot be readily applied to online reviews, which have some
distinct characteristics, such as expressing opinions on specific features. Meanwhile, there
are studies on review selection that formulate the problem as a maximum coverage problem
(Tsaparas et al., 2011; Nguyen et al., 2015), but those studies have not taken redundancy into
consideration, which is another essential property of representative information and can
highly affect user experience (Pan et al., 2005).
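To make the maximum coverage formulation mentioned above concrete, the following is a minimal sketch of the standard greedy heuristic for maximum coverage. It is not the method of the cited studies: the function name `greedy_max_coverage`, the representation of each review as a set of product features, and the sample data are all assumptions made for illustration.

```python
def greedy_max_coverage(review_features, k):
    """Pick up to k reviews, each time taking the review that covers
    the most features not yet covered (standard greedy heuristic).

    review_features: dict mapping a review id to a set of feature names
    (a hypothetical representation, assumed for this sketch).
    """
    covered = set()
    chosen = []
    remaining = dict(review_features)
    for _ in range(k):
        # Review with the largest number of still-uncovered features.
        best = max(remaining, key=lambda r: len(remaining[r] - covered), default=None)
        if best is None or not (remaining[best] - covered):
            break  # nothing left, or no review adds a new feature
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical sample data.
reviews = {
    "r1": {"battery", "screen"},
    "r2": {"battery"},
    "r3": {"price", "camera", "screen"},
}
chosen, covered = greedy_max_coverage(reviews, k=2)
```

Note that this objective rewards only newly covered features. Nothing in it penalizes overlap between the selected reviews, which is the gap in redundancy handling that the paragraph above points out.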
This paper aims to find a representative subset from a large number of online reviews
in terms of both high coverage and low redundancy. A heuristic approach called RewSel is
proposed to select representatives one by one until the number of representatives meets
the requirement. Concretely, in order to include as much information as possible, the
first selected representative is the one that is most similar to the original review set. Once
the representative subset contains one or more reviews, the next review selected is the one
that brings the least redundant information to the existing representative subset. Such a
representative also contributes to coverage by including additional information.
Extensive experiments demonstrate that the selected subsets possess desirable
properties of high coverage and low redundancy, which are two essential
characteristics of representative information (Pan et al., 2005).
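The selection loop described above can be sketched as follows. This is an illustrative reading under stated assumptions, not the authors' RewSel implementation: the excerpt does not give the actual formulas, so the sketch assumes that each review is a bag of words, that "most similar to the original review set" means the smallest relative entropy to the whole corpus, and that "least redundant" means the largest relative entropy from the current subset. All function names and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def distribution(texts, vocab, alpha=1.0):
    """Smoothed unigram distribution of `texts` over `vocab` (assumed model)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl(p, q):
    """Relative entropy D(p || q); smoothing keeps every q[w] positive."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def select_representatives(reviews, k):
    """Greedily pick up to k review indices (a sketch, not the paper's RewSel)."""
    if k <= 0 or not reviews:
        return []
    vocab = sorted({w for t in reviews for w in t.lower().split()})
    full = distribution(reviews, vocab)
    remaining = list(range(len(reviews)))
    # Step 1 (assumed): the review whose distribution is closest to the corpus.
    first = min(remaining, key=lambda i: kl(full, distribution([reviews[i]], vocab)))
    chosen = [first]
    remaining.remove(first)
    # Step 2 (assumed): repeatedly add the review that differs most from the subset.
    while len(chosen) < k and remaining:
        subset = distribution([reviews[i] for i in chosen], vocab)
        nxt = max(remaining, key=lambda i: kl(distribution([reviews[i]], vocab), subset))
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen
```

Under these assumptions, an exact duplicate of an already selected review has zero divergence from the subset, so it is picked only when no other candidate remains. The paper also draws on sentiment analysis, which this sketch leaves out entirely.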
In short, the main contributions of the work can be summarized as follows:
(1) A novel representative online review selection problem that takes into consideration
both coverage and redundancy is proposed in this study. To the best of our
knowledge, although the issue of redundancy within selected subsets has been addressed
in some document extraction studies, it has not been adequately studied in the
review selection literature.
(2) A heuristic approach that achieves high coverage and low redundancy is proposed
to select a representative subset from online reviews to facilitate consumer decision
making. In the proposed approach, relative entropy is applied as a metric to measure
the differences between review sets.
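For reference, the standard definition of relative entropy (Kullback-Leibler divergence) between two discrete distributions P and Q over the same outcomes is given below. How the paper maps review sets to the distributions P and Q is not shown in this excerpt, so the symbols here are generic rather than the authors' notation.

```latex
D(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
```

The quantity is non-negative, equals zero only when P and Q are identical, and is not symmetric: in general D(P || Q) differs from D(Q || P).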