Improving sentiment scoring mechanism: a case study on airline services

DOI: https://doi.org/10.1108/IMDS-07-2017-0300
Pages: 1578-1596
Publication Date: 10 Sep 2018
Authors: Wandeep Kaur, Vimala Balakrishnan
Subject: Information & knowledge management, Information systems, Data management systems, Knowledge management, Knowledge sharing, Management science & operations, Supply chain management, Supply chain information systems, Logistics, Quality management/systems
Wandeep Kaur and Vimala Balakrishnan
Department of Information Systems,
Faculty of Computer Science and Information Technology,
University of Malaya, Kuala Lumpur, Malaysia
Abstract
Purpose: The purpose of this paper is to investigate the effect of including letter repetition, commonly
found within social media text, and its impact in determining the sentiment scores for two major airlines
in Malaysia.
Design/methodology/approach: A Sentiment Intensity Calculator (SentI-Cal) was developed by
assigning individual weights to each letter repetition, and was tested using data collected from the official Facebook
pages of the airlines.
Findings: Evaluation metrics indicate that SentI-Cal outperforms the baseline tool Semantic Orientation
Calculator (SO-CAL), with an accuracy of 90.7 percent compared to 58.33 percent for SO-CAL.
Practical implications: A more accurate sentiment score allows airline services to easily obtain a
better understanding of the sentiments of their customers, hence providing opportunities for improving their
airline services.
Originality/value: The proposed mechanism calculates the sentiment intensity of social media text by assigning
an individual weightage to each repeated letter and exclamation mark, thus producing a more accurate
sentiment score.
Keywords: Social media, Sentiment analysis, Text mining, Airline services, Scoring mechanism
Paper type: Case study
1. Introduction
Sentiment analysis is an area of research that is interested in decrypting unstructured data
for the purpose of establishing the attitude of an author with respect to a subject matter (Liu,
2012). The interest in extracting information from amorphous data has seen the field of
sentiment analysis expand from customer reviews (Bagheri et al., 2013; Gupta et al., 2015;
Maharani et al., 2015), electoral and political analysis (Adedoyin-Olowe et al., 2016; Delmonte
et al., 2013; Smailović et al., 2015), disaster management (Li et al., 2016; Neppalli et al., 2017;
Vo and Collier, 2013) to epidemic breakouts (Almazidy et al., 2016; Missier et al., 2017;
Sun et al., 2015). Similarly, the aviation industry recognizes the impact of social media in
improving not only the airline brand awareness but also customer loyalty and recognition
(Li, 2017; Yee Liau and Pei Tan, 2014).
According to Medhat et al. (2014), sentiment classification can be categorized into
machine learning, lexicon-based and hybrid approaches. The machine
learning approach employs algorithms such as Naïve Bayes, Support Vector Machine,
Decision Tree and Logistic Regression, whereas the lexicon-based approach is dependent
on sentiment lexicons (i.e. dictionaries of opinion words and phrases with their assigned
polarities and intensities) for gauging the sentiment of a text. The semantic orientation,
which is an arithmetic measure for polarity and word strength representation
(Bravo-Marquez et al., 2016), is often computed using the sentiment lexicons. Therefore,
the accuracy and performance of a lexicon-based sentiment analysis system depend heavily
on the coverage of the lexicons used, the sentiment strength and the degree of
uniqueness to the sentiment category (Neviarouskaya et al., 2011; Wu et al., 2016).
Industrial Management & Data Systems, Vol. 118 No. 8, 2018
© Emerald Publishing Limited, 0263-5577
Received 11 July 2017; Revised 11 October 2017 and 4 December 2017; Accepted 16 December 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0263-5577.htm
The authors would like to thank University of Malaya for supporting this study (RP028A-14AET) and
Rayven Visvalingam for his contribution in the development of SentI-Cal.
Various lexicon-based sentiment analysis tools are freely available for use such as
SentiHealth-Cancer (Rodrigues et al., 2016), Linguistic Inquiry Word Count (LIWC)
(Pennebaker et al., 2015), SmartSA (Muhammad et al., 2016), etc. Among them is the
Semantic Orientation Calculator (SO-CAL) which uses a dictionary that contains
annotated words with their relevant semantic orientation and incorporates semantic features
such as negation and intensification. Despite the reliable performance of SO-CAL, it tends
to discard words that contain repeated letters as part of its data cleaning procedure
(Taboada et al., 2011). Letter repetition is one area of concern when it comes to misspelled
words. Thelwall et al. (2012) stated that repeated letters in a word, which are widely
evident on social media platforms, showcase emphasis, thus equating to a much
stronger sentiment. Ignoring repeated letters therefore affects the overall sentiment score,
which can be further improved by considering the impact of repeated
letters (Kiritchenko et al., 2014).
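To make the notion of letter repetition concrete, the following is a minimal Python sketch of how elongated words might be detected and mapped back to candidate dictionary forms instead of being discarded. It is an illustration only, not the procedure used by SO-CAL or by the authors. The threshold of three or more identical letters and the helper names count_extra_letters and lookup_candidates are assumptions made for this example.

```python
import re

# Assumption: a run of three or more identical letters counts as repetition,
# since ordinary English spelling rarely exceeds two (e.g. "good", "happy").
REPEAT_RUN = re.compile(r"([A-Za-z])\1{2,}")


def count_extra_letters(word: str) -> int:
    """Count the letters beyond the second in every run of repeated letters."""
    return sum(len(m.group(0)) - 2 for m in REPEAT_RUN.finditer(word))


def lookup_candidates(word: str) -> tuple[str, str]:
    """Return two candidate spellings: long runs collapsed to two letters,
    and long runs collapsed to one letter."""
    return REPEAT_RUN.sub(r"\1\1", word), REPEAT_RUN.sub(r"\1", word)


if __name__ == "__main__":
    for w in ("goooood", "sooo", "good"):
        print(w, count_extra_letters(w), lookup_candidates(w))
```

Both candidates are returned because collapsing alone cannot tell whether the intended word has a double letter ("good") or a single one ("so"); a lexicon lookup would have to decide between them.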
The study aims to address this limitation by improving the sentiment analysis scoring
mechanism by computationally weighing each repeated letter in a word in order to
classify sentiment more accurately for social media data. This is achieved by developing
an enhanced SO-CAL called Sentiment Intensity Calculator (SentI-Cal) which takes
repeated letters into consideration. The proposed SentI-Cal was used to calculate the
sentiment intensity for Facebook posts extracted from two widely known airline carriers
in Malaysia, hereafter known as X and Y. A comparative experiment was carried out
using SO-CAL on the same data set. The results of this comparison indicate a
significant improvement in accurately classifying sentiment using SentI-Cal instead
of SO-CAL.
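The idea of weighing each repeated letter can be sketched as follows. This is a hedged illustration and not the SentI-Cal implementation: the toy lexicon, the weight of 0.1 per extra letter and the multiplicative scaling rule are all assumptions introduced for this example, since the actual weights are not given in this part of the paper.

```python
import re

REPEAT_RUN = re.compile(r"([a-z])\1{2,}")

# Hypothetical toy lexicon and weight; neither is taken from SO-CAL or SentI-Cal.
LEXICON = {"good": 3.0, "love": 4.0, "bad": -3.0, "late": -2.0}
WEIGHT_PER_EXTRA_LETTER = 0.1


def word_score(word: str) -> float:
    """Score one word, intensifying it for each letter beyond a double."""
    word = word.lower()
    extra = sum(len(m.group(0)) - 2 for m in REPEAT_RUN.finditer(word))
    # Try runs collapsed to two letters first, then to one; use the first hit.
    for candidate in (REPEAT_RUN.sub(r"\1\1", word), REPEAT_RUN.sub(r"\1", word)):
        if candidate in LEXICON:
            return LEXICON[candidate] * (1 + WEIGHT_PER_EXTRA_LETTER * extra)
    return 0.0  # words outside the lexicon contribute nothing


def post_score(text: str) -> float:
    """Sum the word scores over all alphabetic tokens in a post."""
    return sum(word_score(w) for w in re.findall(r"[A-Za-z]+", text))


if __name__ == "__main__":
    print(post_score("The crew was good"))
    print(post_score("The crew was goooood"))
    print(post_score("The flight was baaad"))
```

Under these assumptions an elongated word keeps the polarity of its dictionary form but moves further from zero, so "goooood" scores higher than "good" and "baaad" scores lower than "bad".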
The remainder of this paper is structured as follows. Section 2 reviews the literature
on lexicon-based sentiment analysis tools. Section 3
focuses on the mechanisms employed in developing SentI-Cal, followed by a case study
involving the two major airlines in Section 4. The evaluation metrics are discussed in
Section 5 with results and discussion in Section 6. Finally, the paper is concluded
in Section 7.
2. Literature review
Medhat et al. (2014) identified two techniques for the lexicon-based approach to
sentiment analysis, namely, corpus based and dictionary based (Figure 1). The polarity
value in the corpus-based approach is calculated with reference to the co-occurrences of the
phrase with other positive or negative seed words in the corpus, and is either statistically or
semantically computed. The dictionary-based approach, however, employs predeveloped
Figure 1. Lexicon-based approach techniques. The lexicon-based approach branches into the dictionary-based approach (manual lexicon creation, automated lexicon creation) and the corpus-based approach (statistical, semantic). Legend: dashed line marks the current focus.
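The corpus-based polarity computation described in this section, where a word's polarity follows from its co-occurrence with positive and negative seed words, can be illustrated with one well-known statistical formulation based on pointwise mutual information (PMI). This is a sketch under stated assumptions and is not claimed to be the method surveyed by Medhat et al. (2014) or used in this paper. The toy corpus, the seed words and the smoothing constant are all hypothetical.

```python
import math

# Hypothetical toy corpus: each post is reduced to its set of words.
DOCS = [
    {"crew", "friendly", "good"},
    {"friendly", "staff", "excellent"},
    {"flight", "delayed", "bad"},
    {"delayed", "again", "poor"},
    {"good", "food"},
    {"bad", "seat"},
]
POS_SEEDS = ("good", "excellent")
NEG_SEEDS = ("bad", "poor")


def pmi(x: str, y: str, docs, smoothing: float = 0.01) -> float:
    """Smoothed PMI of two words based on document-level co-occurrence."""
    n = len(docs)
    fx = sum(1 for d in docs if x in d)
    fy = sum(1 for d in docs if y in d)
    fxy = sum(1 for d in docs if x in d and y in d)
    # Smoothing keeps the logarithm defined when the words never co-occur.
    return math.log2((fxy + smoothing) * n / ((fx + smoothing) * (fy + smoothing)))


def semantic_orientation(word: str, docs, pos_seeds, neg_seeds) -> float:
    """Association with positive seeds minus association with negative seeds."""
    return (sum(pmi(word, s, docs) for s in pos_seeds)
            - sum(pmi(word, s, docs) for s in neg_seeds))


if __name__ == "__main__":
    for w in ("friendly", "delayed"):
        print(w, round(semantic_orientation(w, DOCS, POS_SEEDS, NEG_SEEDS), 2))
```

On this toy corpus "friendly" receives a positive orientation and "delayed" a negative one, because each co-occurs only with seeds of the matching polarity.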
