Gender bias in sentiment analysis

Date: 12 February 2018
Published date: 12 February 2018
Pages: 45-57
DOI: https://doi.org/10.1108/OIR-05-2017-0139
Author: Mike Thelwall
Subject Matter: Library & information science, Information behaviour & retrieval, Collection building & management, Bibliometrics, Databases, Information & knowledge management, Information & communications technology, Internet, Records management & preservation, Document management
Mike Thelwall
School of Mathematics and Computer Science, University of Wolverhampton,
Wolverhampton, UK
Abstract
Purpose: The purpose of this paper is to test if there are biases in lexical sentiment analysis accuracy
between reviews authored by males and females.
Design/methodology/approach: This paper uses data sets of TripAdvisor reviews of hotels and restaurants
in the UK written by UK residents to contrast the accuracy of lexical sentiment analysis for males and females.
Findings: Male sentiment is harder to detect because it is less explicit. There was no evidence that this
problem could be solved by gender-specific lexical sentiment analysis.
Research limitations/implications: Only one lexical sentiment analysis algorithm was used.
Practical implications: Care should be taken when drawing conclusions about gender differences from
automatic sentiment analysis results. When comparing opinions for product aspects that appeal differently to
men and women, female sentiments are likely to be overrepresented, biasing the results.
Originality/value: This is the first evidence that lexical sentiment analysis is less able to detect the
opinions of one gender than another.
Keywords: Social media, Sentiment analysis, Opinion mining, Online customer relations management
Paper type: Research paper
Introduction
Sentiment analysis is the computer-based estimation of the sentiment expressed in text, such
as its overall polarity, the range of emotions expressed, or the strengths of any opinions. It is
widely used within marketing and customer relations management through the online
monitoring of customer opinions towards products and services expressed in social media or
review sites (Pekar and Ou, 2008; Schweidel and Moe, 2014; Tirunillai and Tellis, 2014).
Because of this, social media monitoring has been a standard business technique for over half
a decade (Hofer-Shall, 2010). It takes advantage of the public availability of opinionated texts
and fast, accurate software for detecting opinions. It can also be used to assess the impact of
business interventions in the social web (Homburg et al., 2015).
Sentiment analysis is typically used as a black box solution by marketers who see the
results of the algorithm used to classify sentiment but are not interested in its details.
For example, they may find that 45 per cent of comments about product A are positive in
comparison to 25 per cent for product B, concluding that product A is more favourably
viewed. This may be misleading if there is bias in the data or the sentiment analysis
algorithm. If product B's admirers are older and less likely to post to the social web, then
their opinions would be underrepresented. The importance of representativeness is
recognised for survey-based research (e.g. Gronholdt et al., 2000) and it is equally important
for big data analyses. Moreover, no automatic sentiment analysis system is perfect and it is
possible that the sentiment in posts about product A is harder to detect, introducing a
hidden (to the marketer) source of bias. This bias may occur if the positive aspects of
product A are difficult to describe explicitly or if its admirers are from a group that expresses
sentiment less directly, the case that is considered here for gender. For example, a
sentiment-based comparison of smartphones (Kim et al., 2016) might give gender-biased results if the
system is better at identifying sentiment from one gender than from another, and a system
that detects sentiment to help select good ideas (Lee and Suh, 2016) might have a bias
towards the opinions of one gender.
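
The hidden-bias argument can be illustrated with a small worked example. The sketch below is not taken from this study: the function name, the 50 per cent true positive share and the two detection rates are all hypothetical assumptions, chosen only to show how unequal detection rates could produce a gap like the 45 per cent versus 25 per cent comparison above even when both products are equally liked. It also assumes, for simplicity, that the classifier never labels a non-positive comment as positive.

```python
def observed_positive_share(true_positive_share, detection_rate):
    """Share of all comments that a classifier labels positive.

    Simplifying assumptions (hypothetical, for illustration only):
    the classifier finds `detection_rate` of the genuinely positive
    comments and never mislabels any other comment as positive.
    """
    return true_positive_share * detection_rate


# Hypothetical scenario: both products are equally liked
# (50% of comments about each are genuinely positive).
TRUE_POSITIVE_SHARE = 0.50

# Assumed detection rates: product A's admirers express sentiment
# explicitly, product B's admirers express it less directly.
share_a = observed_positive_share(TRUE_POSITIVE_SHARE, detection_rate=0.90)
share_b = observed_positive_share(TRUE_POSITIVE_SHARE, detection_rate=0.50)

print(f"Product A: {share_a:.0%} of comments classified positive")
print(f"Product B: {share_b:.0%} of comments classified positive")
```

Under these assumed figures, a marketer who sees only the classifier output would conclude that product A is more favourably viewed, although the underlying opinions are identical. The same arithmetic applies if the two groups are male and female reviewers rather than admirers of two products.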
On Twitter, some emotion-related terms, such as 'love' and 'haha', are disproportionately
used by one gender. The same is true for some other words and linguistic features, such as
Online Information Review
Vol. 42 No. 1, 2018
pp. 45-57
© Emerald Publishing Limited
1468-4527
DOI 10.1108/OIR-05-2017-0139
Received 2 May 2017
Revised 8 July 2017
Accepted 23 August 2017
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/1468-4527.htm