Using Twitter data to predict the performance of Bollywood movies

Published date: 19 October 2015
Pages: 1604-1621
DOI: https://doi.org/10.1108/IMDS-04-2015-0145
Authors: Dipak Damodar Gaikar, Bijith Marakarkandy, Chandan Dasgupta
Subject matter: Information & knowledge management, Information systems, Data management systems
Dipak Damodar Gaikar and Bijith Marakarkandy
Department of Information Technology,
Thakur College of Engineering and Technology, Mumbai, India, and
Chandan Dasgupta
SBM, NMIMS University, Mumbai, India
Abstract
Purpose: The purpose of this paper is to address the shortage of research on the forecasting
power of social media in India.
Design/methodology/approach: This paper uses sentiment analysis and prediction algorithms
to analyze the performance of Indian movies based on data obtained from social media sites.
The authors used the Twitter4j Java API to extract tweets over an authenticated connection
to Twitter, stored the extracted data in a MySQL database and used the data for
sentiment analysis. To perform sentiment analysis of Twitter data, the Probabilistic Latent
Semantic Analysis classification model is used to find the sentiment score in the form of positive,
negative and neutral. The data mining algorithm Fuzzy Inference System is used to implement
sentiment analysis and predict movie performance, which is classified into three categories: hit,
flop and average.
Findings: In this study the authors obtained predictions of movie performance at the box office
using a fuzzy inference system algorithm. The fuzzy inference system takes two factors,
namely sentiment score and actor rating, to produce an accurate result.
By calculating the opening weekend collection, the authors found that the predicted values
were approximately the same as the actual values. For the movie Singham Returns, our method of
prediction gave a box office collection of 84 crores and the actual collection turned out to be
88 crores.
Research limitations/implications: The current study suffers from the limitation of not having
enough computing resources to crawl the data. For predicting box office collection, accurate
information on ticket prices, the total number of seats per screen and the total number of shows
per day on all screens was not available. In future work the authors can add several other inputs,
such as movie budget, Central Board of Film Certification rating, movie genre and target audience,
that will improve the accuracy and quality of the prediction.
Originality/value: The authors used different factors for predicting box office movie performance
which had not been used in previous literature. This work is valuable for promoting the products and
services of firms.
Keywords: Prediction, Sentiment analysis, Twitter, Social media, Social network,
Fuzzy inference system
Paper type: Research paper
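
The abstract names two inputs to the fuzzy inference system (sentiment score and actor rating) and three output classes (hit, flop and average), but this excerpt does not give the membership functions or rules. The following is a minimal sketch of how such a step could look, not the authors' system: the [0, 1] scaling of both inputs, the triangular membership functions, the rule base and the names `tri`, `fuzzify` and `predict_performance` are all illustrative assumptions, and the final step simply picks the strongest class instead of a full defuzzification.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x):
    """Map a value in [0, 1] to low/medium/high degrees (assumed shapes)."""
    return {
        "low": tri(x, -0.5, 0.0, 0.5),
        "medium": tri(x, 0.0, 0.5, 1.0),
        "high": tri(x, 0.5, 1.0, 1.5),
    }


# Hypothetical rule base: (sentiment level, actor level) -> outcome.
RULES = {
    ("high", "high"): "hit",
    ("high", "medium"): "hit",
    ("medium", "high"): "hit",
    ("medium", "medium"): "average",
    ("high", "low"): "average",
    ("low", "high"): "average",
    ("low", "low"): "flop",
    ("low", "medium"): "flop",
    ("medium", "low"): "flop",
}


def predict_performance(sentiment_score, actor_rating):
    """Classify a movie as hit, average or flop from two inputs in [0, 1]."""
    s = fuzzify(sentiment_score)
    a = fuzzify(actor_rating)
    strength = {"hit": 0.0, "average": 0.0, "flop": 0.0}
    for (s_level, a_level), outcome in RULES.items():
        # Rule firing strength: fuzzy AND taken as the minimum.
        fired = min(s[s_level], a[a_level])
        # Aggregate rules with the same outcome by the maximum.
        strength[outcome] = max(strength[outcome], fired)
    # Simplification: return the outcome with the highest strength.
    return max(strength, key=strength.get)


if __name__ == "__main__":
    print(predict_performance(0.9, 0.9))
    print(predict_performance(0.5, 0.5))
    print(predict_performance(0.1, 0.1))
```

Taking the minimum for rule strength and the maximum for aggregation follows the common Mamdani-style convention; a real system would tune the membership shapes and rules against observed box office data.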
Industrial Management & Data Systems
Vol. 115 No. 9, 2015
pp. 1604-1621
© Emerald Group Publishing Limited
0263-5577
DOI 10.1108/IMDS-04-2015-0145
Received 17 April 2015
Revised 8 September 2015, 13 September 2015, 14 September 2015
Accepted 14 September 2015
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0263-5577.htm

1. Introduction
The emergence of the web and online social media has represented a fundamental shift
as it has added new dimensions to the production and dissemination of news and
information. Social media is defined as media designed to disseminate information
through social interaction, created using highly accessible and scalable
publishing techniques. Users typically generate content and access information to reach a
large audience. Social media has replaced the traditional one-way, mass-media-to-consumer
communication channel with an interactive dialogue, which helps in the creation
and exchange of user-generated content. Companies analyze social media data to
perform analytics and sentiment analysis. The massive growth of online social
networks like Twitter, Facebook and other social networking portals has created a
need to determine people's opinions and moods. Users can browse information and
opinions from diverse sources that help them tap into the crowds while making more
informed decisions (Asur and Huberman, 2010).
Social media is a promising link which helps to build connections across social networks,
personal information channels and mass media. Social media data in the form of
user-generated content on blogs, blog reviews, microblogging services like Twitter, discussion
forums, different types of social sites, product reviews and multimedia sharing web sites
present many new opportunities and challenges to both producers and consumers of
information (Asur and Huberman, 2010). Mass media have created a new type of
marketing and communication that bridges simple electronic word-of-mouth
(eWOM) and ideas that help businesses run in a profitable manner. Posting user
feedback on products has become an increasingly popular way for people to express their opinions
and sentiments toward products and services. Analyzing the immense volume of online reviews
available would yield a database that could be of economic value to producers,
vendors and other interested parties. In the movie domain, a single movie can mean the
difference between millions of rupees in profits or losses for a movie house in a
particular year. Movie studios are therefore deeply invested in forecasting the
performance of upcoming movies (Xiaohui et al., 2010).
Microblogging services have recently become a popular communication tool
among internet users, generating millions of messages daily on popular web sites. Owing
to the free format of messages and the easy accessibility of microblogging platforms,
users tend to shift from traditional communication tools (such as
traditional blogs or mailing lists) to microblogging services. As more and more users
post about the products and services they use, or express their political and religious
views, microblogging web sites become a very valuable
source for understanding public mood. Such data can be efficiently used for marketing
or social studies (Pak and Paroubek, 2012).
Microblogging platforms like Twitter now serve as a form of eWOM,
producing eWOM branding that is based on social networking and trust.
Twitter has been swamped with active users during the last few years and much
attention has been given to analyzing the social behavior and opinions of users. The
widespread popularity of online social networks and the resulting availability of data
have enabled the investigation of new research questions, such as the analysis and
estimation of public opinion on various subjects (Charalampidou, 2012).
People's sentiment toward a particular matter, when expressed online, can be very
useful in many cases, which makes its classification and estimation crucial. The
volume of discussion about products on Twitter can be correlated with the product's
performance. It is also known that social network users represent the aggregate voice
of millions of potential consumers, especially for products designed for the target group
of young technology users. This reveals a brand new aspect that companies
should consider closely, and this free and high-scale feedback can give them the
opportunity to understand consumer needs and take proper action (Charalampidou,
2012). Additionally, a lot of effort has been made in social media analysis regarding its
power to predict real-world outcomes. In recent years, some of the research that