Towards the creation of an emotion lexicon for microblogging

DOI: https://doi.org/10.1108/JSIT-06-2017-0040
Date: 14 May 2018
Pages: 130-151
Published date: 14 May 2018
Author: Georgios Kalamatianos, Symeon Symeonidis, Dimitrios Mallis, Avi Arampatzis
Subject Matter: Information & knowledge management, Information systems, Information & communications technology
Department of Electrical and Computer Engineering,
Democritus University of Thrace, Xanthi, Greece
Abstract
Purpose: The rapid growth of social media has rendered opinion and sentiment mining an important area of research with a wide range of applications. This paper focuses on the Greek language and the microblogging platform Twitter, investigating methods for extracting the emotion of individual tweets as well as population emotion for different subjects (hashtags).
Design/methodology/approach: The authors propose and investigate the use of emotion lexicon-based methods as a means of extracting emotion/sentiment information from social media. The authors compare several approaches for measuring the intensity of six emotions: anger, disgust, fear, happiness, sadness and surprise. To evaluate the effectiveness of the methods, the authors develop a benchmark dataset of tweets, manually rated by two humans.
Findings: The authors develop a new sentiment lexicon for use in Web applications. They then assess the performance of the methods with the new lexicon and find improved results.
Research limitations/implications: The automated emotion results of this research seem promising and correlate with real user emotion. At this point, the authors make some interesting observations about the lexicon-based approach, which point to the need for a new, better emotion lexicon.
Practical implications: The authors examine the variation of emotion intensity over time for selected hashtags and associate it with real-world events.
Originality/value: The originality of this research lies in the development of a training set of tweets, manually annotated by two independent raters. The authors transfer the sentiment information of these annotated tweets, in a meaningful way, to the set of words that appear in them.
Keywords: Social media, Sentiment mining, Emotion lexicon
Paper type: Research paper
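The abstract says the sentiment of the annotated tweets is transferred "in a meaningful way" to the words that appear in them, but this excerpt does not give the rule. One plausible reading, sketched below purely as an assumption, is to give each word the average of the per-emotion ratings of every annotated tweet containing it. The names `EMOTIONS`, `tokenize` and `build_lexicon`, the whitespace tokenizer, and the idea that each tweet's rating is already a single number per emotion (for example the mean of the two raters) are all hypothetical, not the authors' method.

```python
from collections import defaultdict

# The six emotions studied in the paper.
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, not the authors' preprocessing)."""
    return text.lower().split()


def build_lexicon(annotated_tweets):
    """Assign each word the mean emotion ratings of the annotated tweets containing it.

    annotated_tweets: list of (text, ratings) pairs, where ratings maps every
    emotion in EMOTIONS to a number (assumed: the average of the two raters).
    Returns {word: {emotion: mean rating}}.
    """
    sums = defaultdict(lambda: dict.fromkeys(EMOTIONS, 0.0))
    counts = defaultdict(int)
    for text, ratings in annotated_tweets:
        for word in set(tokenize(text)):  # count each word once per tweet
            counts[word] += 1
            for emotion in EMOTIONS:
                sums[word][emotion] += ratings[emotion]
    return {
        word: {e: sums[word][e] / counts[word] for e in EMOTIONS}
        for word in counts
    }
```

With this rule a word that appears in both a happy and a sad tweet ends up with moderate scores for both emotions, which is one reason a larger annotated set would be needed before such a lexicon becomes reliable.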
Received 1 June 2017
Revised 6 September 2017, 2 February 2018
Accepted 10 April 2018
Journal of Systems and Information Technology, Vol. 20 No. 2, 2018, pp. 130-151
© Emerald Publishing Limited, ISSN 1328-7265
The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/1328-7265.htm
1. Introduction
The disposition of users toward topics of interest constitutes a valuable piece of information with social as well as financial implications. The rapid increase in the usage of social media has made opinion and sentiment mining a promising area of research, as there is growing interest in extracting what people think about products, services, public figures, political issues and many other topics (Medhat et al., 2014).
In the past, there has been a fair amount of research on sentiment analysis of data originating from product reviews, news articles, blogs, etc. (Liu and Zhang, 2012; Hu and Liu, 2004). The microblogging platform Twitter is especially appropriate for opinion mining and sentiment analysis, as its content is mostly textual (with very few other media) and publicly available; it is therefore popular in related research. Additionally, the platform's international popularity allows researchers to investigate mining methods for different languages (Giachanou and Crestani, 2016). However, the increasing popularity of microblogging has introduced new challenges in sentiment analysis, related to the informal tone used by its users, the increased variety of subjects referred to, the length constraints of the text, and the use of hashtags and emoticons (Giachanou and Crestani, 2016).
To our knowledge, the Greek language has not been examined sufficiently in tasks related to emotion analysis, especially in relation to data from microblogging sources. This seems to be mainly due to a shortage of appropriate datasets: emotion-annotated datasets in the Greek language have not yet been made publicly available. An attempt to create appropriate resources for emotion evaluation in the Greek language was made by Tsakalidis et al. (2014), who created the first Greek Sentiment Lexicon (GSL). We use this lexicon in our research and improve upon it. Although Tsakalidis et al. (2014) used the term Sentiment Lexicon, we prefer the term emotion lexicon, which seems more appropriate, as we study six different emotions rather than positive versus negative tweets.
Our goals and contributions in this paper are the following:
• to create a benchmark dataset of Greek tweets, along with a set of tweets manually rated for their emotion intensity, and to make it publicly available;
• to develop automated methods, based on Greek emotion lexica, for determining the emotion intensity of Greek tweets for the following six emotions: anger, disgust, fear, happiness, sadness and surprise;
• to develop automated methods for determining the emotion rating of different topics (hashtags) in the six aforementioned emotion dimensions, based on individual tweet emotion;
• to develop a new, improved emotion lexicon, specialized for the task of sentiment analysis of Greek tweets, that will enhance the performance of our tweet and hashtag evaluation methods; and
• to examine temporal aspects of emotions, such as changes in their intensity for hashtags over time.
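The last goal, tracking how emotion intensity changes over time for a hashtag, can be pictured as grouping already-scored tweets into time buckets and averaging within each bucket. The sketch below assumes daily buckets and a simple mean; the tweet record layout (keys `"time"`, `"hashtags"`, `"scores"`) and the function name `daily_intensity` are hypothetical, since the excerpt does not state how the authors aggregate over time.

```python
from collections import defaultdict


def daily_intensity(tweets, hashtag, emotion):
    """Mean `emotion` score per calendar day over tweets containing `hashtag`.

    tweets: iterable of dicts with the hypothetical keys
      "time"     -> datetime.datetime of posting,
      "hashtags" -> set of hashtag strings,
      "scores"   -> {emotion: float} produced by a tweet-level scorer.
    Returns a list of (datetime.date, mean score) pairs sorted by date.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tweet in tweets:
        if hashtag in tweet["hashtags"]:
            day = tweet["time"].date()  # bucket by calendar day
            totals[day] += tweet["scores"][emotion]
            counts[day] += 1
    return [(day, totals[day] / counts[day]) for day in sorted(counts)]
```

Plotting the returned series for, say, happiness and anger is one way to look for peaks that line up with real-world events; a coarser bucket (week) would smooth days with few tweets.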
This paper builds upon and extends the previous work of Kalamatianos et al. (2015); its contributions can be summarized as follows. We anonymize the benchmark Greek dataset and make it publicly available[1]; it could constitute a valuable resource for future research. The automated tweet emotion ratings are computed directly from the words occurring in each tweet, without using classification algorithms. Similarly, the automated hashtag emotion ratings are derived from the ratings of the tweets in which the hashtag occurs. Thus, the proposed methods are efficient and fairly simple to implement, and they can provide baseline performance for future experimentation with the dataset.
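To make the two-level scheme concrete (word scores produce tweet scores, tweet scores produce hashtag scores), here is a minimal sketch. The paper compares several scoring approaches that this excerpt does not spell out, so the plain averaging used here, the convention of giving a tweet with no lexicon matches a zero score, and the names `tweet_emotion` and `hashtag_emotion` are assumptions for illustration only.

```python
from statistics import mean

# The six emotions studied in the paper.
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")


def tweet_emotion(tokens, lexicon):
    """Average the lexicon scores of the tweet's tokens found in the lexicon.

    lexicon: {word: {emotion: score}} with all six emotions per entry.
    Tokens missing from the lexicon are ignored; a tweet with no matches
    gets 0.0 for every emotion (an assumed convention).
    """
    matched = [lexicon[t] for t in tokens if t in lexicon]
    if not matched:
        return dict.fromkeys(EMOTIONS, 0.0)
    return {e: mean(entry[e] for entry in matched) for e in EMOTIONS}


def hashtag_emotion(tweets_tokens, lexicon):
    """Average the per-tweet ratings of all tweets containing a hashtag.

    tweets_tokens: non-empty list of token lists, one per tweet.
    """
    if not tweets_tokens:
        raise ValueError("a hashtag rating needs at least one tweet")
    ratings = [tweet_emotion(tokens, lexicon) for tokens in tweets_tokens]
    return {e: mean(r[e] for r in ratings) for e in EMOTIONS}
```

No classifier is trained at any point: every score is an average of lookups, which is what keeps this kind of method cheap enough to serve as a baseline.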
Also, as we show in the following sections, the emotion lexicon of Tsakalidis et al. (2014) is not specialized for emotion evaluation of Internet-related data, as its entries have a low matching rate against the terms appearing in the tweets. To deal with this issue, we develop a new emotion lexicon specifically for emotion analysis of Greek tweets, which we contribute as a resource for tasks related to emotion analysis in the Greek language (also available online at the URL given in Note 1). As we show later on, the new emotion lexicon enhances the performance of our methods, bringing the results closer to human intuition. Finally, we examine temporal aspects of the emotions of happiness and anger and associate them with events that provoked intense emotions in the Greek population.
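The "matching rate" of a lexicon can be read as the share of tweet terms that have an entry in it. The excerpt does not define it precisely, so the sketch below assumes a token-level rate with exact string matching; the function name `matching_rate` is hypothetical. Counting distinct terms instead of tokens would be an equally reasonable variant.

```python
def matching_rate(tweets_tokens, lexicon):
    """Fraction of tweet tokens that have an entry in `lexicon`.

    tweets_tokens: list of token lists, one per tweet.
    lexicon: any container supporting `in` (e.g. a dict or a set of terms).
    Exact string matching is assumed; returns 0.0 when there are no tokens.
    """
    total = matched = 0
    for tokens in tweets_tokens:
        total += len(tokens)
        matched += sum(1 for t in tokens if t in lexicon)
    return matched / total if total else 0.0
```

Under exact matching, inflected forms, misspellings and slang that are common in tweets all count as misses, which is one way a general-purpose lexicon can end up with a low rate on microblog text.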
Traditional approaches to building sentiment and emotion lexicons are based on machine learning, with each term represented by a binary label/polarity (Liu and Zhang, 2012; Chen and Skiena, 2014). A recent work (Xu et al., 2013) has classified terms in