Analysing user sentiment of Indian movie reviews. A probabilistic committee selection model

DOI: https://doi.org/10.1108/EL-08-2017-0182
Date: 06 August 2018
Published date: 06 August 2018
Pages: 590-606
Author: Shrawan Kumar Trivedi, Shubhamoy Dey
Subject Matter: Information & knowledge management, Information & communications technology, Internet
Shrawan Kumar Trivedi
Department of IT and Systems, Indian Institute of Management Sirmaur,
Sirmaur, India, and
Shubhamoy Dey
Department of Information Systems, Indian Institute of Management Indore,
Indore, India
Abstract
Purpose: To remain sustainable and competitive in the current business environment, organizations benefit from understanding users' sentiment towards products and services. This task can be performed with natural language processing and machine learning classifiers. This paper proposes a novel probabilistic committee selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews.
Design/methodology/approach: An Indian movie review corpus is assembled for this study, and a publicly available movie review polarity corpus is also used to validate the results. The greedy stepwise search method is used to select the features (words) of the reviews. The performance of the proposed classifier is measured using F-measure, false positive rate, the receiver operating characteristic (ROC) curve and training time. The proposed classifier is also compared with other popular machine-learning classifiers: Bayesian, Naïve Bayes, Decision Tree (J48), Support Vector Machine and Random Forest.
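Greedy stepwise search treats feature selection as hill climbing over subsets of words. The Python sketch below shows only the forward direction: start from the empty subset, add the single word that most improves a subset score, and stop when no addition helps. The search needs a function that scores a candidate subset, which this excerpt does not specify, so `toy_merit` and its word weights are hypothetical stand-ins. In practice a subset evaluator (for example a correlation-based one, or a classifier's cross-validated accuracy) would play that role.

```python
from typing import Callable, FrozenSet, List, Sequence


def greedy_stepwise_forward(
    features: Sequence[str],
    merit: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Forward greedy stepwise search over feature subsets.

    Starting from the empty set, repeatedly add the single feature that most
    improves `merit`; stop as soon as no addition improves the current score.
    """
    selected: List[str] = []
    remaining = list(features)
    best_score = merit(frozenset())
    while remaining:
        # Score every one-feature extension of the current subset.
        score, feature = max((merit(frozenset(selected + [f])), f) for f in remaining)
        if score <= best_score:
            break  # local optimum: no single addition helps
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected


# Hypothetical merit: reward sentiment-bearing words, penalise subset size.
WORD_VALUE = {"brilliant": 1.0, "boring": 0.9, "superb": 0.8}


def toy_merit(subset: FrozenSet[str]) -> float:
    return sum(WORD_VALUE.get(w, 0.0) for w in subset) - 0.1 * len(subset)


candidates = ["the", "brilliant", "movie", "boring", "superb"]
print(greedy_stepwise_forward(candidates, toy_merit))  # ['brilliant', 'boring', 'superb']
```

Because each step only looks one feature ahead, the search is fast but can stop at a local optimum. That trade-off is typical of greedy subset search.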
Findings: The results show that the proposed classifier predicts the positive or negative polarity of movie reviews well. Its accuracy and area under the ROC curve are the best among all the classifiers tested in this study. The classifier is also efficient at identifying positive sentiments, giving low false positive rates on both the Indian Movie Review and Review Polarity corpora used in this study. Its training time is slightly higher than that of Bayesian, Naïve Bayes and J48.
Research limitations/implications: Only movie reviews written in English are considered. In addition, the proposed committee selection classifier is built only from a committee of probabilistic classifiers; committees of other classifiers could also be built, tested and compared with the present experimental setup.
Practical implications: A novel probabilistic approach is proposed for classifying movie reviews and is found to be highly effective compared with other state-of-the-art classifiers. The classifier may be tested in other applications and may provide new insights for developers and researchers.
Social implications: The proposed PCC may be used to classify reviews of different products and hence may help organizations evaluate users' reviews of specific products or services. By using authentic positive and negative user sentiments, the credibility of a specific product, service or event may be enhanced. PCC may also be applied to other tasks, such as spam detection, blog mining, news mining and various other data-mining applications.
Originality/value: The constructed PCC is novel and was tested on Indian movie review data.
Keywords: Sentiment analysis, Indian movie reviews, Machine learning classifiers, Greedy stepwise search method, Probabilistic committee selection
Paper type: Technical paper
Received 29 August 2017
Revised 21 November 2017
Accepted 26 November 2017
The Electronic Library
Vol. 36 No. 4, 2018
pp. 590-606
© Emerald Publishing Limited
0264-0473
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm
1. Introduction
With the rise of Web 2.0, the ways in which people share and express their thoughts have changed significantly. For instance, a person who wants to buy a phone or watch a movie would typically go online and read user reviews to find a phone or movie that matches his/her expectations. This has become possible because people now share their feelings online: Web 2.0 has created a medium for sharing opinions, feelings and thoughts. This freedom of expression has made users more open about their opinions and has enabled analysts to uncover hidden patterns that predict customer attitudes towards a particular product. This approach is termed sentiment analysis or opinion mining (Mostafa, 2013; Ye et al., 2009). Sentiment analysis is the process of discovering opinions, emotions, feelings or attitudes in a piece of text, generally written by a user. The technique is widely applicable and is commonly used to assess customer reviews in the market domain, including movie reviews.
The basic aim of sentiment analysis is to classify the polarity of a text or document as positive, negative or neutral. It also helps in classifying whether texts or phrases are subjective or objective. This subjectivity classification is difficult, however, because subjectivity depends on context and objective text can contain subjective content. Sentiment classification is a domain-specific problem (Aue and Gamon, 2005; Moraes et al., 2013), and in natural language processing it is considered a special case of text classification. Text mining, natural language processing and computational linguistics methods are often used for such analysis. Sentiment analysis also faces several challenges; for example, users can express negative sentiments without using any negative words, usually through irony or sarcasm. Identifying the sentiment behind such text is extremely difficult.
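A toy illustration of why sarcasm defeats word-level cues follows. The Python sketch uses a small hypothetical lexicon, `LEXICON`. It sums word polarities and labels text by the sign of the total. Because the only sentiment-bearing word in the sarcastic complaint is "great", the complaint comes out as positive.

```python
# Hypothetical toy lexicon mapping words to polarity weights.
LEXICON = {"great": 1, "love": 1, "brilliant": 1, "boring": -1, "awful": -1, "waste": -1}


def lexicon_polarity(text: str) -> str:
    """Label text by the sign of the summed word polarities."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    score = sum(LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sarcastic = "Oh great, three hours of my life I will never get back."
print(lexicon_polarity(sarcastic))  # positive, although the review is negative
print(lexicon_polarity("A boring, awful film."))  # negative
```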
English is generally considered the most suitable language for sentiment analysis because of its wide usage and reach. Machine learning (ML) classifiers are popular in such studies. Under the ML approach, data are converted into feature vectors, which are used to train an ML classifier to learn which combinations of features indicate a specific class (Pang and Lee, 2008); the resulting model is then used to predict the class of new texts.
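The pipeline described above runs from text to feature vector, then to a trained classifier, then to a prediction. It can be sketched with a bag-of-words representation and a multinomial Naïve Bayes model. This is a minimal from-scratch illustration in Python. It assumes word counts as features and add-one (Laplace) smoothing. The function names and the four training reviews are hypothetical, and this is not necessarily the NB variant or configuration used in the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def tokenize(text: str) -> List[str]:
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


def train_nb(docs: List[Tuple[str, str]]) -> Model:
    """Fit multinomial Naive Bayes on (text, label) pairs."""
    class_docs: Counter = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)  # accumulate bag-of-words counts per class
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model: Model, text: str) -> Dict[str, float]:
    """Posterior P(label | text) with add-one (Laplace) smoothing."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    log_scores: Dict[str, float] = {}
    for label, n_label in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        log_scores[label] = score
    # Convert log scores to probabilities (subtract max for numerical stability).
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / z for label, s in log_scores.items()}


# Hypothetical toy training reviews.
train = [
    ("A brilliant and moving film", "pos"),
    ("Superb acting and a great story", "pos"),
    ("Boring plot and awful songs", "neg"),
    ("A waste of time, truly awful", "neg"),
]
model = train_nb(train)
probs = predict_nb(model, "brilliant story and great acting")
print(max(probs, key=probs.get))  # pos
```

Working in log space avoids underflow when many word probabilities are multiplied. The final normalisation turns the per-class scores back into probabilities that sum to one. A committee of probabilistic classifiers can combine exactly this kind of output.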
In this work, a novel committee selection method is proposed in which probabilistic classifiers (Bayesian and Naïve Bayes [NB]) are used to build a committee of classifiers. The proposed probabilistic committee classifier (PCC) is then compared with other popular ML classifiers, such as Bayesian (Ye et al., 2009), Decision Tree (J48) (Wan and Gao, 2015), Random Forest (Liu and Chen, 2015) and Support Vector Machine (SVM) (Moraes et al., 2013). All classifiers are tested on the Indian Movie Review and Movie Review Polarity corpora; the Indian Movie Review corpus was created specifically for this research. The greedy stepwise search method is used to select the features on which the classifiers are trained.
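This excerpt does not describe how the PCC selects or combines its members. The following Python sketch is therefore only a generic soft-voting committee, not the authors' method. Each member returns class probabilities, and the committee averages them (optionally with weights) and predicts the most probable class. The member functions `bayes_net` and `naive_bayes` are hypothetical stubs with fixed outputs. They stand in for trained Bayesian and NB models.

```python
from typing import Callable, Dict, Sequence

Posterior = Dict[str, float]
Member = Callable[[str], Posterior]


def committee_predict(
    members: Sequence[Member], text: str, weights: Sequence[float] = ()
) -> Posterior:
    """Soft voting: (weighted) average of the members' class probabilities."""
    if not members:
        raise ValueError("committee needs at least one member")
    w_list = list(weights) or [1.0] * len(members)
    if len(w_list) != len(members):
        raise ValueError("need exactly one weight per member")
    combined: Dict[str, float] = {}
    for w, member in zip(w_list, members):
        for label, p in member(text).items():
            combined[label] = combined.get(label, 0.0) + w * p
    total = sum(combined.values())
    return {label: v / total for label, v in combined.items()}


def committee_label(members: Sequence[Member], text: str,
                    weights: Sequence[float] = ()) -> str:
    """Return the class with the highest combined probability."""
    posterior = committee_predict(members, text, weights)
    return max(posterior, key=posterior.get)


# Hypothetical stub members with fixed outputs (stand-ins for trained models).
def bayes_net(text: str) -> Posterior:
    return {"pos": 0.55, "neg": 0.45}


def naive_bayes(text: str) -> Posterior:
    return {"pos": 0.20, "neg": 0.80}


print(committee_predict([bayes_net, naive_bayes], "any review"))  # pos ≈ 0.375, neg ≈ 0.625
print(committee_label([bayes_net, naive_bayes], "any review"))  # neg
```

Averaging probabilities rather than counting hard votes lets a confident member outweigh a hesitant one. A "selection" rule, such as trusting only the most confident member for each review, would be a different design with a different combination step.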
The remainder of the paper is organized as follows. Section 2 reviews related work in the sentiment analysis field. Section 3 describes the corpora and the experimental design, covering dataset preparation, the proposed PCC and the other classifiers against which it is compared. Section 4 presents the results and analysis, Section 5 presents a discussion and Section 6 concludes the paper.
2. Related work
Sentiment analysis, also known as opinion mining, is carried out using text mining techniques that track and analyse users' sentiments. A plethora of research has been conducted in this area to capture the sentiments of users' opinions about products,