A surrogate-based generic classifier for Chinese TV series reviews

Date: 15 May 2017
Pages: 66-74
DOI: https://doi.org/10.1108/IDD-11-2016-0044
Published date: 15 May 2017
Author: Yufeng Ma, Long Xia, Wenqi Shen, Mi Zhou, Weiguo Fan
Subject Matter: Library & information science, Library & information services, Lending, Document delivery, Collection building & management, Stock revision, Consortia
Yufeng Ma
Department of Computer Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
Long Xia and Wenqi Shen
Department of Business Information Technology, Virginia Polytechnic Institute and State University, Blacksburg,
Virginia, USA
Mi Zhou
School of Management, Xi’an Jiaotong University, Xi’an, China, and
Weiguo Fan
Department of Accounting and Information Systems, Virginia Polytechnic Institute and State University, Blacksburg,
Virginia, USA
Abstract
Purpose – The purpose of this paper is to automatically classify TV series reviews into generic categories.
Design/methodology/approach – The authors’ main approach is to replace the names of specific roles and actors in reviews with surrogates, which makes
the reviews more generic. In addition, feature selection techniques and several kinds of classifiers are incorporated.
Findings – With roles’ and actors’ names replaced by generic tags, the experimental results showed that the classifier generalizes well to agnostic TV series
compared with reviews that keep the original names.
Research limitations/implications – The model presented in this paper must be built on top of an existing knowledge base such as Baidu
Encyclopedia. Building such a database takes a lot of work.
Practical implications – In a digital information supply chain, for example, if reviews are part of the information to be transported or exchanged, the
model presented in this paper can help automatically identify individual reviews according to different requirements and thereby support information sharing.
Originality/value – One original contribution is the surrogate-based approach the authors propose to make reviews more generic. They also
built a review data set of hot Chinese TV series, which includes eight generic category labels for each review.
Keywords Classification, Data mining, Feature selection, Surrogate, Text processing, Topic modeling
Paper type Research paper
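The surrogate step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tag strings `ROLE_TAG` and `ACTOR_TAG`, both function names and the placeholder names in the usage note are assumptions made for illustration. In the paper, the role and actor names would come from a knowledge base such as Baidu Encyclopedia.

```python
import re

# Hypothetical generic tags; the paper's actual tag strings are not given here.
ROLE_TAG = "ROLE_TAG"
ACTOR_TAG = "ACTOR_TAG"


def build_surrogate_pattern(names):
    """Compile one regex that matches any of the given names.

    Names are ordered longest first so that a longer name is never
    partially consumed by a shorter name that is its prefix.
    """
    ordered = sorted(set(names), key=len, reverse=True)
    return re.compile("|".join(re.escape(name) for name in ordered))


def replace_with_surrogates(review, roles, actors):
    """Replace every role or actor name in `review` with a generic tag."""
    tag_of = {name: ROLE_TAG for name in roles}
    tag_of.update({name: ACTOR_TAG for name in actors})
    if not tag_of:
        # Nothing to replace; an empty pattern would match everywhere.
        return review
    pattern = build_surrogate_pattern(tag_of)
    return pattern.sub(lambda match: tag_of[match.group(0)], review)
```

For example, with the placeholder actor name 张三 and the placeholder role name 李四, `replace_with_surrogates("张三演的李四很精彩", roles=["李四"], actors=["张三"])` maps both names to their tags and leaves the rest of the review untouched. A classifier trained on such text sees the same tokens regardless of which series a review discusses.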
1. Introduction
With the development of Web 2.0, more and more commercial
websites, such as Amazon, YouTube and Youku, encourage
users to post product reviews on their platforms (Munson
et al., 2016; Goldner and Birch, 2012). These reviews are
helpful for both readers and product manufacturers. For
example, for TV or movie producers, online reviews indicate
the aspects that viewers like and/or dislike. This information
informs the production process: when producing
future films or TV series, producers can tailor their shows to
better accommodate consumers’ tastes. For manufacturers, reviews
may reveal customers’ preferences and feedback on product
functions, which helps manufacturers improve their
products in future development. Consumers, on the other hand,
can evaluate the quality of a product or TV series
based on online reviews, which helps them decide
whether to buy or watch it. However, thousands of
reviews emerge every day. Given the limited time and
attention consumers have, it is impossible for them to give
equal attention to all the reviews. Moreover, some
readers may be interested only in certain aspects of a product
or TV series, and reading reviews about other aspects wastes
their time. As a result, automatic classification of reviews is
essential for review platforms to give users a clearer picture
of the review contents.
Most existing review studies focus on product reviews
in English. In this paper, by contrast, we focus on reviews of hot
Chinese movies or TV series, which have some unique
characteristics. First, Table I shows the development of Chinese
movies (Film, 2016) in recent years. The box
office and the number of viewers have grown dramatically in
these years, which provides a substantial reviewer base for the
movie/TV series review data. Although audience may comment on foreign
The current issue and full text archive of this journal is available on
Emerald Insight at: www.emeraldinsight.com/2398-6247.htm
Information Discovery and Delivery
45/2 (2017) 66–74
© Emerald Publishing Limited [ISSN 2398-6247]
[DOI 10.1108/IDD-11-2016-0044]
This study was partially supported by Natural Science Foundation of
China Grant Number 71531013.
Received 20 November 2016
Revised 10 December 2016
26 February 2017
Accepted 13 March 2017