A data-driven neural network architecture for sentiment analysis

Publication Date: 04 February 2019
Pages: 2-19
DOI: https://doi.org/10.1108/DTA-03-2018-0017
Author: Erion Çano, Maurizio Morisio
Subject: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Erion Çano and Maurizio Morisio
Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino, Italy
Abstract
Purpose: The fabulous results of convolution neural networks in image-related tasks have attracted the
attention of text mining, sentiment analysis and other text analysis researchers. It is, however, difficult to
find enough data for feeding such networks, to optimize their parameters, and to make the right design choices
when constructing network architectures. The purpose of this paper is to present the creation steps of two big
data sets of song emotions. The authors also explore the use of convolution and max-pooling neural layers on
song lyrics, product and movie review text data sets. Three variants of a simple and flexible neural network
architecture are also compared.
Design/methodology/approach: The intention was to spot any important patterns that can serve as
guidelines for parameter optimization of similar models. The authors also wanted to identify architecture
design choices which lead to high-performing sentiment analysis models. To this end, the authors conducted a
series of experiments with neural architectures of various configurations.
Findings: The results indicate that parallel convolutions of filter lengths up to 3 are usually enough for
capturing relevant text features. Also, max-pooling region size should be adapted to the length of text
documents for producing the best feature maps.
Originality/value: The top results the authors obtained came with feature maps of lengths 6 to 18. A
possible improvement for future neural network models for sentiment analysis is to generate the sentiment
polarity prediction of a document by aggregating predictions on smaller excerpts of the entire text.
Keywords Sentiment analysis, Opinion mining, Convolution neural networks,
Deep learning architectures, Text data set properties, Word embeddings
Paper type Research paper
1. Introduction
Neural networks are providing ground-breaking results in many complex tasks such as
object detection, speech recognition or sentiment analysis. This success is usually attributed
to the ability of deep neural networks to generalize well when trained with high quantities of
data. Among the various neural network types, convolution neural networks (CNN) have
become particularly popular because of their ability to mimic the functionality of the human
brain visual cortex. As a result, CNNs are applied in image-related tasks such as object
detection, fingerprint recognition, computer vision, etc. The basic structure of a CNN was
first applied by LeCun et al. (1998) for recognizing images of handwritten digits. A decade of
hibernation (known as the second AI winter) passed, and they reappeared in the late 2000s,
rebranded as Deep Learning. At this time, they also became an essential part of various
proposed architectures such as Inception in Szegedy et al. (2016), ResNet in He et al. (2015)
and more. Many natural language processing researchers explored the use of CNNs or recurrent
neural networks (RNN) for text mining tasks such as sentiment analysis, reporting excellent
results with little computation load. However, neural network models are usually data
hungry and require bigger data sets of training samples. The other problem is the difficulty
in finding the optimal hyper-parameter setup or design choices when using various types of
networks. Optimal network configuration depends on characteristics of available data
which should be taken into account.
We present in this paper the work we conducted for constructing two relatively big data
sets of emotionally labeled songs and the results of many experiments with text data sets of
different sizes and document lengths, aimed at simplifying neural network construction. For
emotional labeling of songs, we utilized social tags crawled from the Last.fm music portal. We
also adapted a model of music emotions that is highly compatible with the popular model of
Russell, together with an annotation scheme based on the emotion tags each song has received.
Furthermore, the works in Çano and Morisio (2018b) are extended both quantitatively and
qualitatively. The first introduces three variants of a neural network architecture that uses
convolution and max-pooling layers for text feature extraction and selection as well as a
regularized feed-forward layer for classification. In the second paper, various relations
between data properties and neural network parameters with respect to optimal
performance are explored. In this work we report accuracy scores of a higher number of
experiments with more data sets (e.g. including short sentences). Our results can help
researchers to simplify hyper-parameter optimization of neural networks that are used for
sentiment analysis experiments.
Data Technologies and Applications, Vol. 53 No. 1, 2019, pp. 2-19
© Emerald Publishing Limited, 2514-9288
Received 15 March 2018; revised 14 September 2018 and 4 October 2018; accepted 5 October 2018
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/2514-9288.htm
One thing we noticed is that bigger data sets are better interpreted by repeating several
stacks of parallel convolutions followed by max-pooling layers. An interesting regularity is
the one that relates the length of documents to the pooling region size. The latter is the
parameter that dictates the size of the produced feature maps. According to our results, top
scores are achieved when the pooling region size is set to produce feature maps that are 6 to 18
units long. Also, convolutions with filter lengths 1, 2 and 3 are usually enough. Utilizing
convolutions with longer filters did not improve results. Regarding the three neural network
designs we proposed, the basic version with max-pooling layers directly following each
convolution layer turned out to be the best one. The flexibility it offers and its low training
time make it a good option as a prototyping basis for practitioners.
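The relation between document length, pooling region size and feature map length can be illustrated with a small NumPy sketch. It is not the authors' implementation: the helper names, the zero-padding at the end of the document, the ReLU activation, the random filters and the target map length of 12 (a value picked from inside the reported 6 to 18 range) are all assumptions made for illustration. It applies parallel convolutions of filter lengths 1, 2 and 3 to a matrix of word vectors, then uses non-overlapping regional max-pooling.

```python
import math
import numpy as np

def pooling_region_for(doc_len, target_map_len=12):
    """Hypothetical helper: choose a non-overlapping pooling region size
    so that the pooled feature map is about target_map_len units long."""
    return max(1, math.ceil(doc_len / target_map_len))

def conv1d_same(x, filters):
    """x: (doc_len, d) matrix of word vectors.
    filters: (k, d, n_filters) bank of filters of length k.
    Zero-pads the end of the document so the output keeps doc_len rows,
    then applies a ReLU (both choices are assumptions of this sketch)."""
    k, d, n = filters.shape
    padded = np.vstack([x, np.zeros((k - 1, d))])
    out = np.empty((x.shape[0], n))
    for t in range(x.shape[0]):
        window = padded[t:t + k]  # k consecutive word vectors
        out[t] = np.tensordot(window, filters, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)

def regional_max_pool(fmap, region):
    """Non-overlapping max-pooling over the word axis; the last region
    may cover fewer than `region` rows."""
    n_regions = math.ceil(fmap.shape[0] / region)
    return np.stack([fmap[i * region:(i + 1) * region].max(axis=0)
                     for i in range(n_regions)])

def extract_features(x, filter_bank, region):
    """Run the parallel convolutions, pool each one, and concatenate
    the pooled maps along the filter axis."""
    pooled = [regional_max_pool(conv1d_same(x, f), region) for f in filter_bank]
    return np.concatenate(pooled, axis=1)

rng = np.random.default_rng(0)
doc_len, d, n_filters = 270, 4, 8           # toy sizes, not from the paper
x = rng.normal(size=(doc_len, d))           # stand-in for real word embeddings
filter_bank = [rng.normal(size=(k, d, n_filters)) for k in (1, 2, 3)]

region = pooling_region_for(doc_len)
features = extract_features(x, filter_bank, region)
print(region, features.shape)
```

For the 270-word toy document, the helper picks a pooling region of 23 words, which yields a feature map 12 units long with 24 columns (8 filters for each of the three filter lengths). Longer documents get proportionally larger regions, so the map length stays in roughly the same range.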
The rest of the paper is structured as follows: Section 2 presents an overview of various
neural network models recently used in text mining tasks. Section 3 describes the steps that
were followed for the construction of the two music emotion data sets. Section 4 presents
preprocessing steps, utilized data sets and obtained network parameter optimization results.
In Section 5, we describe the three network architectures we propose. Section 6 presents the
high-level architectural parameters and decisions, together with the literature baselines we
compare against. Section 7 discusses obtained results and finally, Section 8 concludes.
2. Background
Distributed representations of words, known as word embeddings, are becoming widely used
in text analysis tasks. Their reduced dimensionality makes them well-suited for integration
with neural networks. Bag-of-words (BOW), on the other hand, creates a high dimensional
space where every word appearing in documents is treated as a feature dimension. Suppose,
for example, that we have a vector of d dimensions for every word appearing in our
documents. We can thus represent each document (set of words) as a vector of word vectors
(matrix). This is very similar to the matrix of pixels which represents an image. In case we
have d = 4 (in practice 100 to 400 dimensions are used), the representation of the movie review
"that movie was great" will be as shown in Figure 1. Because of this similarity, the same image
recognition techniques like CNN can be successfully applied to text analysis as well. This
started in the late 2000s, with pioneering work conducted by Collobert and Weston (2008) and
Collobert et al. (2011), applying them to tasks like part-of-speech tagging or named entity
recognition. Kim (2014) was one of the first to apply CNNs for sentiment polarity prediction of
texts. He used a basic CNN to extract and select features from sentences, reporting excellent
performance with little computation load. Many studies that followed explored deeper neural
networks of convolution layers. In the studies of Zhang et al. (2015a) and Schwenk et al. (2017),
for example, the authors used 9 to 29 layers, building text pattern features from the basic
characters. They reported excellent results on large text data sets of millions of documents.
On small data sets, however, the deep networks they built are weaker than shallow networks