Capturing user sentiments for online Indian movie reviews. A comparative analysis of different machine-learning models

Pages: 677-695
Date: 06 August 2018
Published date: 06 August 2018
DOI: https://doi.org/10.1108/EL-04-2017-0075
Author: Shrawan Kumar Trivedi, Shubhamoy Dey, Anil Kumar
Subject Matter: Information & knowledge management, Information & communications technology, Internet
Shrawan Kumar Trivedi
Department of IT and Systems, Indian Institute of Management Sirmaur,
Sirmaur, India
Shubhamoy Dey
Department of Information Systems, Indian Institute of Management Indore,
Indore, India, and
Anil Kumar
Department of Decision Science, BML Munjal University, Gurgaon, India
Abstract
Purpose: Sentiment analysis and opinion mining are emerging areas of research for analyzing Web data
and capturing users' sentiments. This research aims to present sentiment analysis of an Indian movie review
corpus using natural language processing and various machine learning classifiers.
Design/methodology/approach: In this paper, a comparative study between three machine learning
classifiers (Bayesian, naïve Bayesian and support vector machine [SVM]) was performed. All the classifiers
were trained on the words/features extracted from the corpus using five different feature selection algorithms
(Chi-square, info-gain, gain ratio, one-R and relief-F [RF] attributes), and a comparative study was performed
between them. The classifiers and feature selection approaches were evaluated using different metrics
(F-value, false-positive [FP] rate and training time).
Findings: The results of this study show that, for the maximum number of features, the RF feature
selection approach was found to be the best, with better F-values, a low FP rate and less time needed to train
the classifiers, whereas for the least number of features, one-R was better than RF. When the evaluation was
performed for machine learning classifiers, SVM was found to be superior, although the Bayesian classifier
was comparable with SVM.
Originality/value: This is novel research in which Indian review data were collected and then a
classification model for sentiment polarity (positive/negative) was constructed.
Keywords: Opinion mining, Indian movie reviews, Machine learning classifiers,
User sentiment analysis
Paper type: Research paper
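The two accuracy-related metrics named in the abstract (F-value and FP rate) can be illustrated with a short sketch. It assumes the F-value is the balanced F1 score (the harmonic mean of precision and recall) and that "positive" is the target class; the abstract does not state the exact definition used, and the names `Confusion`, `confusion`, `f_value` and `fp_rate` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a binary (positive/negative) sentiment classifier."""
    tp: int = 0  # positive reviews predicted positive
    fp: int = 0  # negative reviews predicted positive
    tn: int = 0  # negative reviews predicted negative
    fn: int = 0  # positive reviews predicted negative


def confusion(y_true, y_pred, positive="pos"):
    """Tally the confusion counts from true and predicted labels."""
    c = Confusion()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                c.tp += 1
            else:
                c.fp += 1
        else:
            if truth == positive:
                c.fn += 1
            else:
                c.tn += 1
    return c


def f_value(c):
    """Balanced F-value (assumed F1): 2*TP / (2*TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def fp_rate(c):
    """False-positive rate: FP / (FP + TN)."""
    denom = c.fp + c.tn
    return c.fp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels, made up purely to exercise the functions.
    truth = ["pos", "pos", "neg", "neg", "pos"]
    preds = ["pos", "neg", "neg", "pos", "pos"]
    counts = confusion(truth, preds)
    print(counts, f_value(counts), fp_rate(counts))
```

A higher F-value and a lower FP rate both indicate a better classifier under these definitions; training time, the third metric, would be measured separately around the model-fitting call.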
Received 3 April 2017
Revised 23 October 2017
Accepted 22 November 2017
The Electronic Library
Vol. 36 No. 4, 2018
pp. 677-695
© Emerald Publishing Limited
ISSN 0264-0473
DOI 10.1108/EL-04-2017-0075
Introduction
The Web has significantly transformed the world and, with the rise of Web 2.0,
the situation continues to change, as people can now express their thoughts and opinions
digitally. People can also read specific product or service reviews, written by other users, by
simply accessing the desired online portal before making a purchase decision; likewise,
if someone wants to watch a movie, he/she can simply read the movie's reviews before
making a decision. The internet has given freedom of speech to users: they can write their
feelings/sentiments in the form of reviews or blogs using online portals. Such user behavior
creates opportunities for online retailers and organizations in the form of text data.
Furthermore, these text data can be analyzed using various natural language processing
tools and artificial intelligence, which can help businesses make better decisions and better
predict success and sustainability.
Sentiment analysis is an important area of research with a number of
applications. It is found to be a robust approach for understanding customers' feelings
and attitudes toward various products and services (Manek et al., 2017). The feedback
provided by customers helps organizations to make informed decisions. For example, a
hotel review may help a visitor to locate the most suitable hotel. In the same fashion,
movie reviews may help consumers in deciding whether a movie is worth watching.
A sentiment is an expression of opinion, feeling or emotion, or an assessment made by
the individual that can be positive, negative or neutral. These polarities are known
as sentiment orientations, opinion orientations, semantic polarity or simply orientations.
Such polarity can be classified and predicted by opinion mining, and can be distinguished at
three levels:
(1) Document-level sentiments. At this level, a whole document is considered as a
positive or negative sentiment for specific products or services. This level is
restricted to those documents that do not measure or compare various attributes
because, at this level, a whole document represents a sentiment toward a single
attribute (or single product) (Liu, 2012).
(2) Sentence-level sentiments. At this level of classification, a sentence is used to
decide the positive, negative or neutral sentiment toward products or services.
Sentence-level analysis involves subjectivity classification, which differentiates
between subjective and objective sentences: subjective
sentences reveal the opinion or sentiment, and objective sentences convey
factual information. Information-handling requirements for objective sentences
are found to be greater than those for subjective sentences, e.g. "A few buttons
on the remote control of a smart TV which we purchased a couple of days back
are malfunctioning" (Liu, 2012).
(3) Entity- and aspect-level sentiments. This analysis is based on the features, or
attributes, of the text, where a feature, or word, is taken as either a positive or a
negative sentiment. This is a finer-grained analysis, in which all the features,
taken together, provide insight into the overall sentiment weight of any
opinion. Aspect-level sentiment analysis defines the opinion as positive,
negative or neutral, based on the words'/features' sentiment weight (Hu and
Liu, 2004).
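The difference between scoring a whole text and scoring individual entities or aspects can be sketched in a few lines. This is only an illustration under loud assumptions: the word weights in `LEXICON`, the negation list, the aspect terms and the clause-splitting rule are all hypothetical, and the sketch is not the method used in this paper. It uses the Samsung phone sentence discussed in this section.

```python
import re
from typing import Dict

# Hypothetical sentiment weights for individual words/features.
LEXICON = {"high": 1.0, "like": 1.0, "poor": -1.0}
# Hypothetical negation words that flip the next sentiment word.
NEGATORS = {"not", "never", "no"}
# Hypothetical entity/aspect terms to look for.
ASPECTS = ["battery backup", "samsung"]


def clause_score(text: str) -> float:
    """Sum word weights, flipping the sign of a word that follows a negator."""
    score, negate = 0.0, False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            negate = True
        elif token in LEXICON:
            weight = LEXICON[token]
            score += -weight if negate else weight
            negate = False
    return score


def label(score: float) -> str:
    """Map a numeric sentiment weight to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aspect_sentiments(sentence: str) -> Dict[str, str]:
    """Assign each aspect the polarity of the clause that mentions it."""
    result = {}
    for clause in re.split(r"[,;]", sentence):
        polarity = label(clause_score(clause))
        for aspect in ASPECTS:
            if aspect in clause.lower():
                result[aspect] = polarity
    return result


if __name__ == "__main__":
    sentence = ("Although the battery backup is not that high, "
                "I still like the Samsung mobile phone")
    print(aspect_sentiments(sentence))
    print(label(clause_score(sentence)))
```

Under these made-up weights, the two clauses cancel out when the sentence is scored as a whole, which is why a single sentence- or document-level label can hide the mixed opinion that the aspect-level view exposes.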
In various opinions and reviews, the sentiment of an opinion depends on the individual
entities and their respective aspects (Manek et al., 2017). For example, the sentence
"Although the battery backup is not that high, I still like the Samsung mobile phone"
contains both partial positive sentiments and partial negative sentiments. Here, a positive
sentiment is expressed for "Samsung" and a negative sentiment for "battery backup".
Hence, the aim of performing an analysis at this level is to decide which entities have which
aspect. While performing such an analysis, unstructured text must be converted to
structured text for capturing these entities and aspects. This level of analysis imposes more
challenges than document- or sentence-level analysis. Sentiment classification is a
domain-specific problem (Aue and Gamon, 2005). In natural language processing, this is a special