Detecting sarcasm in customer tweets: an NLP based approach

Date: 10 July 2017
Pages: 1109-1126
DOI: https://doi.org/10.1108/IMDS-06-2016-0207
Author: Shubhadeep Mukherjee, Pradip Kumar Bala
Subject Matter: Information & knowledge management, Information systems, Data management systems, Knowledge management, Knowledge sharing, Management science & operations, Supply chain management, Supply chain information systems, Logistics, Quality management/systems
Shubhadeep Mukherjee
Department of Information Systems,
Indian Institute of Management Ranchi, Ranchi, India, and
Pradip Kumar Bala
Information Systems, IIM Ranchi, India
Abstract
Purpose: The purpose of this paper is to study sarcasm in online text, specifically on Twitter, to better
understand customer opinions about social issues, products, services, etc. This can be immensely helpful in
reducing incorrect classification of consumer sentiment toward issues, products and services.
Design/methodology/approach: In this study, 5,000 tweets were downloaded and analyzed. Relevant
features were extracted and supervised learning algorithms were applied to identify the best differentiating
features between a sarcastic and non-sarcastic sentence.
Findings: The results using two different classification algorithms, namely Naïve Bayes and maximum
entropy, show that function words and content words together are most effective in identifying sarcasm in
tweets. The most differentiating features between a sarcastic and a non-sarcastic tweet were identified.
Practical implications: Understanding the use of sarcasm in tweets lets companies do better sentiment
analysis and product recommendations for users. This could help businesses attract new customers and
retain the old ones resulting in better customer management.
Originality/value: This paper uses novel features to identify sarcasm in online text, which is one of the
most challenging problems in natural language processing. To the authors' knowledge, this is the first study
on sarcasm detection from a customer management perspective.
Keywords: Text mining, Natural language processing, Artificial intelligence, Data mining,
Business intelligence, Sarcasm detection
Paper type: Research paper
1. Introduction
With the arrival of the information age, social media has become one of the most powerful
tools for businesses to identify customer attitudes toward their products and services.
Modern businesses are becoming increasingly dependent on the online medium to attract
customers and reduce the customer churn rate (Kacen et al., 2013). Various data mining
techniques are applied by organizations to understand customer preferences and opinions.
One of the most popular techniques for analyzing online data is sentiment analysis.
In sentiment analysis, opinions can be classified into positive, neutral or negative (Pang and
Lee, 2008). Sentiment analysis about products can help in determining customer preferences
and dislikes. In spite of its success, under certain circumstances sentiment analysis can be
gravely inadequate. One such situation is when the sentences are laden with sarcasm,
for example, sarcastic user tweets. It is quite possible that a sarcastic tweet, which
mockingly praises a product while actually deriding it, will be classified as positive customer
emotion. Sarcasm, being a special type of communication, where the explicit meaning differs
from the implicit one, cannot be effectively identified with conventional data mining
techniques such as sentiment analysis (Yee Liau and Pei Tan, 2014).
The Macmillan English Dictionary defines sarcasm as the activity of saying or writing the
opposite of what one means, or of speaking in a way intended to make someone else feel stupid
or show them that one is angry (Rundell and Fox, 2002). With the growing sophistication of language, the
use of sarcasm in spoken and written text has become the norm. However, automatic
detection of sarcasm is still in its infancy. The ambiguous nature of sarcasm makes it
difficult even for humans to detect it in sentences. Despite the difficulties, the huge
benefit of detecting sarcasm has been recognized in many computer interaction-based
applications, such as review summarization, dialogue systems and review ranking systems
(Davidov et al., 2010). From a business perspective, detecting sarcasm can be crucial in
correctly categorizing customer opinions about products, services and social issues, all of
which suffer from a high threat of being incorrectly categorized.
Industrial Management & Data Systems, Vol. 117 No. 6, 2017, pp. 1109-1126
© Emerald Publishing Limited, 0263-5577
DOI 10.1108/IMDS-06-2016-0207
Received 11 June 2016; revised 5 October 2016; accepted 15 November 2016
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0263-5577.htm
This makes sarcasm detection from unstructured text data a relevant and challenging
problem. The challenge is compounded because text lacks the visual and vocal cues that assist humans in
understanding sarcasm. One of the major issues in sarcasm detection is the absence of
naturally occurring expressions that can be used for training purposes (Davidov et al., 2010).
In the case of microblogs, such as Twitter, messages can be annotated with hashtags that
are an indication of the sentiment being expressed in tweets. These hashtags are reliable
indicators of the emotion being expressed by the tweets, as the author explicitly conveys the
emotion of the tweet through them (e.g. #happy, #joy, #sad). We used this behavior to
select the hashtags (#sarcasm, #sarcastic) that define our data set. We considered the sentences
that end in #sarcasm or #sarcastic to be the gold standard for sarcastic sentences. We applied
supervised learning, using Naïve Bayes and maximum entropy classifiers, to differentiate
between sarcastic and non-sarcastic tweets. To our knowledge, this is the first attempt to
study and understand sarcasm from a customer management perspective.
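The hashtag-based labeling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `label_tweet` is hypothetical, and removing the marker hashtag from the text before training is an assumption made here so that a classifier cannot simply memorize the label.

```python
import re

# Matches a marker hashtag (#sarcasm or #sarcastic) at the very end of a tweet.
# The marker must start the tweet or follow whitespace to count as a hashtag.
_TRAILING_MARKER = re.compile(r"(?:^|\s)#(?:sarcasm|sarcastic)\s*$", re.IGNORECASE)


def label_tweet(tweet):
    """Return (text, is_sarcastic) for one raw tweet.

    A tweet is labeled sarcastic only if it ends in #sarcasm or #sarcastic.
    Assumption: the marker is stripped from the returned text.
    """
    match = _TRAILING_MARKER.search(tweet)
    if match is None:
        return tweet.strip(), False
    return tweet[:match.start()].strip(), True


if __name__ == "__main__":
    # Made-up tweets for illustration only.
    for raw in [
        "Love waiting an hour for support #sarcasm",
        "Thanks for the quick delivery #happy",
    ]:
        print(label_tweet(raw))
```

A marker hashtag that appears elsewhere in the tweet does not trigger the label, which mirrors the "end in #sarcasm or #sarcastic" criterion.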
We trained our classifiers on multiple different feature types. The feature set we
emphasize consists of function words, part of speech tags, part of speech n-grams and
their various combinations. We used both topic-based and writing style-based
features to classify the tweets for sarcasm detection. We did not come across any work in
sarcasm detection literature which has tried to capture authorial style-based features.
Our work thus adds a new dimension to natural language processing (NLP)-based research
on sarcasm detection.
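As a rough illustration of these feature types, the sketch below counts function words, part of speech tags and part of speech n-grams for one already-tagged tweet. The helper names, the feature-name prefixes and the hand-assigned tags in the usage example are all assumptions made for illustration; the text above does not specify a tagger or a feature encoding.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Return the contiguous n-grams over a sequence of part of speech tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def style_features(tagged_tokens, function_words, n=2):
    """Count function words, POS tags and POS n-grams for one tweet.

    tagged_tokens: list of (word, tag) pairs, assumed to come from any tagger.
    function_words: set of lowercase function words (hypothetical list).
    """
    features = Counter()
    for word, tag in tagged_tokens:
        if word.lower() in function_words:
            features["fw=" + word.lower()] += 1
        features["pos=" + tag] += 1
    tags = [tag for _, tag in tagged_tokens]
    for gram in pos_ngrams(tags, n):
        features["posgram=" + "_".join(gram)] += 1
    return features


if __name__ == "__main__":
    # Tags assigned by hand for illustration, not produced by a real tagger.
    tagged = [("love", "VBP"), ("your", "PRP$"), ("customer", "NN"), ("service", "NN")]
    print(style_features(tagged, {"your", "really"}))
```

Keeping each feature type under its own prefix makes it easy to train on one type alone or on combinations, in the spirit of the comparison described above.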
An English sentence can be broadly said to consist of two types of words: function
words and content words. Function words are words that have little or no significant
meaning outside the context of the sentence. On the other hand, content words are
words that have meaning even outside the context of the sentence (William Collins Sons &
Co. Ltd, 2009). Examples of function words are “the,” “and,” “he” and “not.” Examples of content
words are “school,” “dog” and “angry.” If we were to consider an English sentence in its entirety,
it would consist of these two categories of words.
Extant literature states that the authorial or writing style is best captured by the
function words and the parts of speech used in the sentence (Argamon et al., 2003).
Koppel (2002) states that categorization by topic is typically based on keywords that reflect
a document's content, whereas categorization by author style uses precisely those
features that are independent of context. Authorial style-based classification has been
applied successfully in gender classification of regular text as well as microblogs
(Argamon et al., 2003; Mukherjee and Bala, 2016). We propose that the content of the
tweets as well as the authorial or writing style both contribute to the sarcasm present in
the tweets. We have used features that are independent of the content of the text in
conjunction with other topic or content-based features. By content-based features, we
mean those features which are an integral part of the text and give the text its meaning.
For example, if we consider the tweet “Amazon, love your customer service, really amazing!”
then the content words are “Amazon,” “love,” “customer,” “service” and “amazing.”
The rest of the words, “your” and “really,” are function words or writing style-based
features that can vary from author to author.
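The split in this example can be reproduced with a short sketch. The word list below is a tiny illustrative set built only from the function words named in this section, not a complete lexicon, and the helper name `split_words` is hypothetical.

```python
import re

# Illustrative only: the function words mentioned in this section.
FUNCTION_WORDS = {"the", "and", "he", "not", "your", "really"}


def split_words(tweet):
    """Split a tweet into (content_words, function_words), keeping word order."""
    tokens = re.findall(r"[A-Za-z']+", tweet)
    content, function = [], []
    for token in tokens:
        if token.lower() in FUNCTION_WORDS:
            function.append(token)
        else:
            content.append(token)
    return content, function


if __name__ == "__main__":
    content, function = split_words("Amazon, love your customer service, really amazing!")
    print(content)   # ['Amazon', 'love', 'customer', 'service', 'amazing']
    print(function)  # ['your', 'really']
```

With a fuller function-word list, the same split would separate the topic-bearing vocabulary from the style-bearing vocabulary of each tweet.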
We hypothesize that sarcasm in a sentence is dependent not only on the content words of
the sentence but also on the authorial or writing style of the author, which is best depicted
by function words and parts of speech of the sentences (Argamon et al., 2003; Koppel, 2002).