Analysing user sentiment of Indian movie reviews. A probabilistic committee selection model
DOI | https://doi.org/10.1108/EL-08-2017-0182 |
Date | 06 August 2018 |
Published date | 06 August 2018 |
Pages | 590-606 |
Author | Shrawan Kumar Trivedi, Shubhamoy Dey |
Subject Matter | Information & knowledge management,Information & communications technology,Internet |
Shrawan Kumar Trivedi
Department of IT and Systems, Indian Institute of Management Sirmaur,
Sirmaur, India, and
Shubhamoy Dey
Department of Information Systems, Indian Institute of Management Indore,
Indore, India
Abstract
Purpose – To be sustainable and competitive in the current business environment, it is useful to understand
users’ sentiment towards products and services. This critical task can be achieved via natural language
processing and machine learning classifiers. This paper aims to propose a novel probabilistic committee
selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews.
Design/methodology/approach – An Indian movie review corpus is assembled for this study. Another
publicly available movie review polarity corpus is also used to validate the results. The
greedy stepwise search method is used to extract the features/words of the reviews. The performance of the
proposed classifier is measured using different metrics, such as F-measure, false positive rate, receiver
operating characteristic (ROC) curve and training time. Further, the proposed classifier is compared with
other popular machine-learning classifiers, such as Bayesian, Naïve Bayes, Decision Tree (J48), Support
Vector Machine and Random Forest.
Findings – The results of this study show that the proposed classifier is good at predicting the positive or
negative polarity of movie reviews. Its performance accuracy and the value of its ROC curve are
found to be the most suitable among all the classifiers tested in this study. This classifier is also found to be
efficient at identifying positive sentiments of reviews, where it gives low false positive rates for both the
Indian Movie Review and Review Polarity corpora used in this study. The training time of the proposed
classifier is found to be slightly higher than that of Bayesian, Naïve Bayes and J48.
Research limitations/implications – Only movie review sentiments written in English are considered.
In addition, the proposed committee selection classifier is prepared only using the committee of probabilistic
classifiers; however, other classifier committees can also be built, tested and compared with the present
experiment scenario.
Practical implications – In this paper, a novel probabilistic approach is proposed and used for
classifying movie reviews, and is found to be highly effective in comparison with other state-of-the-art
classifiers. This classifier may be tested for different applications and may provide new insights for
developers and researchers.
Social implications – The proposed PCC may be used to classify different product reviews, and hence
may be beneficial to organizations to justify users’ reviews about specific products or services. By using
authentic positive and negative sentiments of users, the credibility of the specific product, service or event
may be enhanced. PCC may also be applied to other applications, such as spam detection, blog mining, news
mining and various other data-mining applications.
Originality/value – The constructed PCC is novel and was tested on Indian movie review data.
Keywords Sentiment analysis, Indian movie reviews, Machine learning classifiers,
Greedy stepwise search method, Probabilistic committee selection
Paper type Technical paper
Received 29 August 2017
Revised 21 November 2017
Accepted 26 November 2017
The Electronic Library
Vol. 36 No. 4, 2018
pp. 590-606
© Emerald Publishing Limited
0264-0473
DOI 10.1108/EL-08-2017-0182
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm
1. Introduction
With the rise of Web 2.0, the ways in which people share and express their thoughts have
changed significantly. For instance, a person who wants to buy a phone or watch a movie
would typically go online and read user reviews
to find a phone/movie that matches his/her expectations. This possibility has arisen because
people have started sharing their feelings online. Web 2.0 has created a medium for sharing
opinions, feelings and thoughts. This freedom of expression has made users more open to
expressing their opinions, and for analysts has facilitated analyses of hidden patterns to
predict customer attitude towards a particular product. This method is termed sentiment
analysis or opinion mining (Mostafa, 2013; Ye et al., 2009). Sentiment analysis entails a
process of discovering opinions, emotions, feelings or attitudes from a piece of text, which is
generally written by a user. The technique is universally applicable and is generally applied to the
assessment of customer reviews in the market domain, including movie reviews.
The basic aim of sentiment analysis is to classify the polarity of text or documents,
that is, whether it is positive, negative or neutral. It also helps in classifying whether the text or
phrases are subjective or objective. Classification of subjectivity and objectivity is a difficult
task, however, because subjectivity depends on context and objective text may itself contain subjective
data. Sentiment classification is a domain-specific problem (Aue and Gamon, 2005; Moraes
et al., 2013). In natural language processing, it is considered a special case of text
classification. Text mining, natural language processing and computational linguistics
methods are often used for such analysis. There are several challenges associated with
sentiment analysis; for example, negative sentiments can be expressed by users without
using any negative words, usually as ironic sentences or through sarcasm. Identifying
sentiments behind such text is extremely difficult.
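The difficulty with sarcasm can be illustrated with a toy word-list scorer. The sketch below is only an illustration and is not a method used in this paper: the two word lists and the example review are hypothetical, and the scorer simply counts positive words minus negative words. Because the sarcastic review contains a positive word and no negative ones, a scorer of this kind rates it as positive.

```python
import re

# Hypothetical toy lexicons, chosen for illustration only.
POSITIVE_WORDS = {"great", "wonderful", "brilliant"}
NEGATIVE_WORDS = {"boring", "awful", "weak"}


def lexicon_score(text):
    """Return (number of positive words) - (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    return positives - negatives


# A sarcastic review: the intent is negative, but no negative word appears.
sarcastic_review = "Oh great, another three hours of my life I will never get back."
score = lexicon_score(sarcastic_review)

# A plainly negative review, for comparison.
plain_score = lexicon_score("A boring and weak film")
```

The sarcastic review receives a positive score even though its intent is negative, which is the failure mode described above.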
English is generally considered the most appropriate language for sentiment analysis
because of its universal applicability and its wide reach in terms of usage. Machine learning
(ML) classifiers are popular in such studies. Under the ML approach, data are converted into
a feature vector and then used to train the ML classifier to infer a combination of specific
features yielding a specific class (Pang and Lee, 2008); a model is then created to predict the
class.
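The ML pipeline described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the experimental setup of this paper: it assumes whitespace tokenization, a bag-of-words count vector, and a multinomial Naïve Bayes model with add-one (Laplace) smoothing. The four training reviews and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased whitespace tokenization is enough for a toy example.
    return text.lower().split()


def build_vocabulary(docs):
    return sorted({token for doc in docs for token in tokenize(doc)})


def to_feature_vector(text, vocabulary):
    """Convert a review into a vector of word counts over the vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def train_naive_bayes(vectors, labels):
    """Estimate log priors and smoothed log likelihoods for each class."""
    n_features = len(vectors[0])
    model = {}
    for cls in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        totals = [sum(column) for column in zip(*rows)]
        denominator = sum(totals) + n_features  # add-one smoothing
        model[cls] = {
            "log_prior": math.log(len(rows) / len(vectors)),
            "log_likelihood": [math.log((t + 1) / denominator) for t in totals],
        }
    return model


def predict(model, vector):
    """Return the class with the highest posterior log score."""
    def score(cls):
        params = model[cls]
        return params["log_prior"] + sum(
            x * ll for x, ll in zip(vector, params["log_likelihood"])
        )
    return max(model, key=score)


# Hypothetical toy training data.
train_docs = [
    "great acting and a great story",
    "boring plot and weak acting",
    "a wonderful moving film",
    "weak script and a boring film",
]
train_labels = ["positive", "negative", "positive", "negative"]

vocabulary = build_vocabulary(train_docs)
vectors = [to_feature_vector(doc, vocabulary) for doc in train_docs]
model = train_naive_bayes(vectors, train_labels)
prediction = predict(model, to_feature_vector("a great moving story", vocabulary))
```

Words in a new review that are absent from the vocabulary are simply ignored by `to_feature_vector`, which keeps the sketch short; a real pipeline would need a considered policy for unseen words.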
In this particular work, a novel committee selection method is proposed in which
probabilistic classifiers (Bayesian and Naïve Bayes [NB]) are used to build a committee of
classifiers. The proposed probabilistic committee classifier (PCC) is then compared with
other popular ML classifiers, such as Bayesian (Ye et al., 2009), Decision Tree (J48) (Wan and
Gao, 2015), Random Forest (Liu and Chen, 2015) and Support Vector Machine (SVM)
(Moraes et al., 2013). All classifiers are tested with the help of Indian Movie Review and
Movie Review Polarity corpora. The Indian Movie Review corpus was created specifically
for this research. For the training of the classifiers, the greedy stepwise search method is
used.
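The general idea of a committee of probabilistic classifiers can be sketched as follows. This introduction does not state how the PCC combines or selects its members, so the sketch assumes plain averaging of the members' posterior probabilities (soft voting), which is one common choice and not necessarily the rule used by the PCC. The two member functions are hypothetical stand-ins that return fixed probabilities, not trained Bayesian or NB models.

```python
from typing import Callable, Sequence

# A committee member maps a review to an estimate of P(positive | review).
Member = Callable[[str], float]


def committee_probability(review: str, members: Sequence[Member]) -> float:
    """Average the members' probabilities (an assumed combination rule)."""
    if not members:
        raise ValueError("the committee needs at least one member")
    probabilities = [member(review) for member in members]
    return sum(probabilities) / len(probabilities)


def committee_predict(
    review: str, members: Sequence[Member], threshold: float = 0.5
) -> str:
    """Label the review positive if the averaged probability reaches the threshold."""
    if committee_probability(review, members) >= threshold:
        return "positive"
    return "negative"


# Hypothetical stand-in members; real members would be trained classifiers.
def member_a(review: str) -> float:
    return 0.9 if "great" in review.lower() else 0.3


def member_b(review: str) -> float:
    return 0.2 if "boring" in review.lower() else 0.6


members = [member_a, member_b]
label_great = committee_predict("A great film", members)
label_boring = committee_predict("A boring film", members)
```

Averaging lets a confident member outweigh an uncertain one, whereas majority voting over hard labels would discard that confidence information.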
The remainder of the paper is organized as follows. Section 2 deals with related work on
the sentiment analysis field. Section 3 details the corpora testing, where preparation of the
dataset and all experimental design methods are discussed and the proposed PCC and other
classifiers used to compare the proposed classifier are described. Section 4 outlines the
results and analysis, Section 5 presents a discussion and Section 6 concludes the paper.
2. Related work
Sentiment analysis, also known as opinion mining, is carried out using text mining
techniques in which sentiments of users are tracked and analysed. A plethora of research
has been conducted in this area to capture the sentiments of users’ opinions about products,