Detecting sarcasm in customer tweets: an NLP based approach

Date	10 July 2017
Pages	1109-1126
DOI	https://doi.org/10.1108/IMDS-06-2016-0207
Published date	10 July 2017
Author	Shubhadeep Mukherjee, Pradip Kumar Bala
Subject Matter	Information & knowledge management, Information systems, Data management systems, Knowledge management, Knowledge sharing, Management science & operations, Supply chain management, Supply chain information systems, Logistics, Quality management/systems

Shubhadeep Mukherjee
Department of Information Systems, Indian Institute of Management Ranchi, Ranchi, India, and

Pradip Kumar Bala
Information Systems, IIM Ranchi, India

Abstract

Purpose – The purpose of this paper is to study sarcasm in online text – specifically on Twitter – to better understand customer opinions about social issues, products, services, etc. This can be immensely helpful in reducing incorrect classification of consumer sentiment toward issues, products and services.

Design/methodology/approach – In this study, 5,000 tweets were downloaded and analyzed. Relevant features were extracted and supervised learning algorithms were applied to identify the best differentiating features between a sarcastic and a non-sarcastic sentence.

Findings – The results using two different classification algorithms, namely, Naïve Bayes and maximum entropy, show that function words and content words together are most effective in identifying sarcasm in tweets. The most differentiating features between a sarcastic and a non-sarcastic tweet were identified.

Practical implications – Understanding the use of sarcasm in tweets lets companies do better sentiment analysis and product recommendations for users. This could help businesses attract new customers and retain the old ones, resulting in better customer management.

Originality/value – This paper uses novel features to identify sarcasm in online text, which is one of the most challenging problems in natural language processing. To the authors’ knowledge, this is the first study on sarcasm detection from a customer management perspective.

Keywords Text mining, Natural language processing, Artificial intelligence, Data mining, Business intelligence, Sarcasm detection

Paper type Research paper

1. Introduction

With the arrival of the information age, social media has become one of the most powerful tools for businesses to identify customer attitudes toward their products and services. Modern businesses are becoming increasingly dependent on the online medium to attract customers and reduce the customer churn rate (Kacen et al., 2013). Organizations apply various data mining techniques to understand customer preferences and opinions. One of the most popular techniques for analyzing online data is sentiment analysis. In sentiment analysis, opinions can be classified as positive, neutral or negative (Pang and Lee, 2008). Sentiment analysis about products can help in determining customer preferences and dislikes. In spite of its success, under certain circumstances sentiment analysis can be gravely inadequate. One such situation is when the sentences are laden with sarcasm, for example, sarcastic user tweets. It is quite possible that a sarcastic tweet, which mockingly praises a product while actually deriding it, will be classified as positive customer emotion. Sarcasm, being a special type of communication in which the explicit meaning differs from the implicit one, cannot be effectively identified with conventional data mining techniques such as sentiment analysis (Yee Liau and Pei Tan, 2014).

Industrial Management & Data Systems, Vol. 117 No. 6, 2017, pp. 1109-1126, 0263-5577, DOI 10.1108/IMDS-06-2016-0207
Received 11 June 2016; Revised 5 October 2016; Accepted 15 November 2016
The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/0263-5577.htm

The Macmillan English Dictionary defines sarcasm as the activity of saying or writing the opposite of what one means, or of speaking in a way intended to make someone else feel stupid or to show them that one is angry (Rundell and Fox, 2002). With the sophistication of language, the use of sarcasm in spoken and written communication has become the norm. However, automatic detection of sarcasm is still in its infancy. The ambiguous nature of sarcasm makes it difficult even for humans to detect in sentences. Despite the difficulties, the huge benefit of detecting sarcasm has been recognized in many computer interaction-based applications, such as review summarization, dialogue systems and review ranking systems (Davidov et al., 2010). From a business perspective, detecting sarcasm can be crucial in correctly categorizing customer opinions about products, services and social issues, all of which face a high risk of being incorrectly categorized.

This makes sarcasm detection from unstructured text data a relevant and challenging problem, all the more so because such text lacks the visual and vocal cues that help humans understand sarcasm. One of the major issues in sarcasm detection is the absence of naturally occurring expressions that can be used for training purposes (Davidov et al., 2010). In the case of microblogs, such as Twitter, messages can be annotated with hashtags that indicate the sentiment being expressed. These hashtags are reliable indicators of the emotion of a tweet, as the author explicitly conveys that emotion through them (e.g. #happy, #joy, #sad). We utilized this behavior by selecting the hashtags #sarcasm and #sarcastic to build our data set. We considered the sentences that end in #sarcasm or #sarcastic to be the gold standard for sarcastic sentences. We applied supervised learning, using Naïve Bayes and maximum entropy classifiers, to differentiate between sarcastic and non-sarcastic tweets. To our knowledge, this is the first attempt to study and understand sarcasm from a customer management perspective.
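The hashtag-based labeling described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `label_tweet` is hypothetical, and stripping the label hashtag from the text before training is an assumption made here so that a classifier cannot simply memorize the tag.

```python
import re

# Matches #sarcasm or #sarcastic at the very end of a tweet (case-insensitive),
# together with any whitespace around it.
_TRAILING_TAG = re.compile(r"\s*#(?:sarcasm|sarcastic)\s*$", re.IGNORECASE)


def label_tweet(text):
    """Return (clean_text, is_sarcastic) for one raw tweet.

    A tweet counts as sarcastic only if it ends in one of the two hashtags.
    Assumption: the trailing hashtag is removed from the returned text.
    """
    match = _TRAILING_TAG.search(text)
    if match is None:
        return text.strip(), False
    return text[:match.start()].strip(), True


if __name__ == "__main__":
    # Made-up example tweets, for illustration only.
    for raw in ["Great, my flight is delayed again #sarcasm",
                "#sarcasm is hard to detect"]:
        print(label_tweet(raw))
```

Only a hashtag in final position triggers the sarcastic label, which mirrors the "sentences that end in #sarcasm or #sarcastic" criterion; a tweet that merely mentions the hashtag elsewhere is left unlabeled.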

We trained our classifiers on multiple feature types. The feature set we emphasize consists of function words, part of speech tags, part of speech n-grams and their various combinations. First, we used topic-based as well as writing style-based features to classify the tweets for sarcasm detection. We did not come across any work in the sarcasm detection literature that has tried to capture authorial style-based features. Our work thus adds a new dimension to natural language processing (NLP)-based research on sarcasm detection.
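As a rough illustration of how such feature groups might be combined, the sketch below builds a bag of features from tokens that have already been part of speech tagged. Everything in it is an assumption for illustration: the tiny `FUNCTION_WORDS` list, the `fw=`/`pos=`/`posgram=` naming scheme, the helper names, and the Penn Treebank-style tags in the example input. The tagger itself is out of scope, and the authors' actual feature definitions are not given in this section.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list.
FUNCTION_WORDS = frozenset({"the", "and", "he", "not", "your", "really"})


def pos_ngrams(tags, n):
    """Return all contiguous n-grams (as tuples) over a sequence of POS tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def extract_features(tagged_tokens, groups=("fw", "pos", "posgram"), n=2):
    """Count features from (word, tag) pairs, using only the requested groups."""
    words = [word.lower() for word, _ in tagged_tokens]
    tags = [tag for _, tag in tagged_tokens]
    features = Counter()
    if "fw" in groups:        # function words
        features.update("fw=" + w for w in words if w in FUNCTION_WORDS)
    if "pos" in groups:       # single part of speech tags
        features.update("pos=" + t for t in tags)
    if "posgram" in groups:   # part of speech n-grams
        features.update("posgram=" + "_".join(g) for g in pos_ngrams(tags, n))
    return features


# Made-up, pre-tagged example sentence.
tagged = [("I", "PRP"), ("really", "RB"), ("love", "VBP"),
          ("the", "DT"), ("delay", "NN")]
style_only = extract_features(tagged, groups=("fw", "posgram"))
```

Passing different `groups` tuples yields the different feature combinations, so the same classifier can be trained once per combination and the results compared.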

An English sentence can be broadly said to consist of two types of words – function words and content words. Function words are the words that have little or no significant meaning outside the context of the sentence. On the other hand, content words are the words that have meaning even outside the context of the sentence (William Collins Sons & Co. Ltd, 2009). Examples of function words are: the, and, he, not, etc. Examples of content words are: school, dog, angry, etc. If we were to consider an English sentence in its entirety, it would consist of these two categories of words.

Extant literature states that the authorial or writing style is best captured by the function words and the parts of speech used in the sentence (Argamon et al., 2003). Koppel (2002) states that categorization by topic is typically based on keywords that reflect a document’s content, whereas categorization by author style uses precisely those features that are independent of context. Authorial style-based classification has been applied successfully in gender classification of regular text as well as microblogs (Argamon et al., 2003; Mukherjee and Bala, 2016). We propose that the content of the tweets and the authorial or writing style both contribute to the sarcasm present in the tweets. We have used features that are independent of the content of the text in conjunction with other topic or content-based features. By content-based features, we mean those features which are an integral part of the text and give the text its meaning. For example, if we consider a tweet “Amazon, love your customer service, really amazing!” then the content words are “Amazon,” “love,” “customer,” “service” and “amazing.” The rest of the words, “your” and “really,” are function words or writing style-based features that can vary from author to author.
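The split of the example tweet can be reproduced with a short sketch. The function-word list below is a hypothetical stand-in that is just large enough for this example, and `split_words` is an illustrative name, not something defined in the paper.

```python
import re

# Hypothetical function-word list; a real list would be far longer.
FUNCTION_WORDS = frozenset({"the", "and", "he", "not", "your", "really"})


def split_words(tweet):
    """Split a tweet into (content_words, function_words), keeping word order."""
    tokens = re.findall(r"[A-Za-z']+", tweet)
    function = [t for t in tokens if t.lower() in FUNCTION_WORDS]
    content = [t for t in tokens if t.lower() not in FUNCTION_WORDS]
    return content, function


content, function = split_words(
    "Amazon, love your customer service, really amazing!")
print(content)   # ['Amazon', 'love', 'customer', 'service', 'amazing']
print(function)  # ['your', 'really']
```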

We hypothesize that sarcasm in a sentence depends not only on the content words of the sentence but also on the authorial or writing style of the author, which is best captured by the function words and parts of speech of the sentence (Argamon et al., 2003; Koppel, 2002).
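One way to probe a hypothesis of this kind is to train a single classifier on content-word and function-word features together. The sketch below is a generic multinomial Naïve Bayes with add-one smoothing, written from scratch for illustration; it is not the authors' implementation, and the maximum entropy classifier is not shown. The `cw=`/`fw=` prefixes are an assumed convention for keeping the two feature types apart, and the four training examples are invented toy data, not drawn from the study's data set.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, feature_lists, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(feature_lists, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def log_scores(self, feats):
        """Return the unnormalized log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for f in feats:
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


# Invented toy examples: content words (cw=) and function words (fw=) together.
train = [
    (["cw=love", "fw=really", "cw=amazing", "fw=your"], "sarcastic"),
    (["cw=great", "fw=really", "fw=oh", "cw=delay"], "sarcastic"),
    (["cw=love", "cw=phone", "cw=fast"], "non-sarcastic"),
    (["cw=great", "cw=service", "cw=thanks"], "non-sarcastic"),
]
model = NaiveBayes().fit([f for f, _ in train], [y for _, y in train])
```

In this toy setup the content word "love" appears in both classes, so it is the function word "really" that tips a tweet toward the sarcastic class, which is the kind of joint effect the hypothesis describes.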
