Evaluating transfer learning approach for detecting Arabic anti-refugee/migrant speech on social media

DOI: https://doi.org/10.1108/AJIM-10-2021-0293
Published date: 22 March 2022
Pages: 1070-1088
Subject matter: Library & information science, Information behaviour & retrieval, Information & knowledge management, Information management & governance, Information management
Authors: Djamila Mohdeb, Meriem Laifa, Fayssal Zerargui, Omar Benzaoui
Djamila Mohdeb
University of Bordj Bou Arreridj, Bordj Bou Arreridj, Algeria
Meriem Laifa
University of Bordj Bou Arreridj, Bordj Bou Arreridj, Algeria and
Laboratory of Informatics and its Applications of Msila (LIAM), Msila, Algeria, and
Fayssal Zerargui and Omar Benzaoui
University of Bordj Bou Arreridj, Bordj Bou Arreridj, Algeria
Abstract
Purpose: The present study was designed to investigate eight research questions related to the
analysis and detection of dialectal Arabic hate speech targeting African refugees and illegal migrants in
the Algerian YouTube space.
Design/methodology/approach: The transfer learning approach, which currently represents the state of the art
in natural language processing tasks, was used to classify and detect hate speech in Algerian
dialectal Arabic. In addition, a descriptive analysis was conducted to answer the analytical research questions, which
aim at measuring and evaluating the presence of anti-refugee/migrant discourse on the YouTube social platform.
Findings: Data analysis revealed a gradual, modest increase in the number of anti-refugee/migrant
hateful comments on YouTube since 2014, a sharp rise in 2017 and a sharp decline in later years until 2021.
Furthermore, our findings from classifying hate content using multilingual and monolingual pre-trained
language transformers demonstrate good performance of the AraBERT monolingual transformer in comparison
with the monodialectal transformer DziriBERT and the cross-lingual transformers mBERT and XLM-R.
Originality/value: Automatic hate speech detection in languages other than English is a challenging
task that the literature has tried to address with various machine learning approaches. Although the recent
approach of cross-lingual transfer learning offers a promising solution, tackling this problem in the context of
the Arabic language, particularly dialectal Arabic, makes it even more challenging. Our results cast new light
on the actual ability of the transfer learning approach to deal with low-resource languages that widely differ
from high-resource languages as well as other Latin-based, low-resource languages.
Keywords: Hate speech, Anti-migrant speech, Algerian dialectal Arabic, African migrants, Transfer learning,
Arabic natural language processing
Paper type: Research paper
1. Introduction
The rise of digital media, supported by social networks and citizen journalism, has changed the
way audiences receive and react to information related to matters of public interest.
Online social media platforms are no longer used only for communication and socialization; they have
become some of the most important tools for influencing and shaping public opinion, a role that
traditional mass media such as newspapers and television monopolized for decades.
The content produced by social media users raises serious issues, mainly related to
the credibility of information and the difficulty of controlling or influencing the orientations of
content creators, who enjoy a high degree of freedom because they are not subject to the
restrictions imposed by traditional media institutions. This brings us to the abuses
of freedom of expression in the online social space, of which the spread
of hate speech is the darkest face.
Received 9 October 2021; revised 29 January 2022 and 4 March 2022; accepted 6 March 2022.
Aslib Journal of Information Management, Vol. 74 No. 6, 2022, pp. 1070-1088. © Emerald Publishing Limited, ISSN 2050-3806.
Hate speech is a contested term that has a variety of definitions. We find the definition
suggested by Fortuna and Nunes (2018) to be the more comprehensive one; they explain hate speech
as language that attacks or diminishes, that incites violence or hate against groups, based
on specific characteristics such as physical appearance, religion, descent, national or ethnic
origin, sexual orientation, gender identity or other, and it can occur with different linguistic
styles, even in subtle forms or when humour is used.
The hate speech phenomenon is the undesirable outcome of what the social sciences call the
"Othering" process, which creates the norms of the "Us-Them" dichotomy in a community (Burnap
and Williams, 2015). This process is motivated to a certain extent by an exclusive character that
defines who belongs to the in-group and, accordingly, who is part of the out-group (Burnap and
Williams, 2015). Othering in a hateful discourse is based on the conscious or unconscious
assumption that a certain identified group poses a threat to the favored group (Powell, 2017).
The members of the out-group are othered depending on the extent to which their noticeable
differences are perceived as a threat in a specific context (Powell, 2017).
According to Gagliardone et al. (2015), the majority of studied cases of online hate speech
target individuals based on ethnicity and nationality. In this context, refugees and illegal
migrants are amongst the most vulnerable targeted categories insofar as they are perceived
by host communities as a menace to their social, economic and cultural quality of life
(Reidpath and Allotey, 2018).
The migration crisis generated by major conflicts and natural disasters in Asia,
the Middle East and Africa has been one of the most debated topics of the last decade in both mass
media and online social media (Reidpath and Allotey, 2018). An anti-refugee discourse has
overwhelmed online social platforms, triggering hate and fear towards refugees, migrants
and forced migrants and causing concern amongst scholars and policymakers. This hateful
rhetoric has marked online debates not only in Europe (Himmel and Baptista, 2020) but also in
many countries that were obliged to host large numbers of refugees in spite of their own political
and economic instability (Ozerim and Tolay, 2021).
Recent research has pushed the question of online hate speech detection in new directions
using automated methods and systems. Automatic hate speech detectors are mainly based
on machine learning and Natural Language Processing (NLP) techniques (Siegel, 2020).
Nonetheless, the effectiveness of these techniques is made more problematic by
the controversy that surrounds the term "hate speech" and its relation to freedom of
opinion and expression. On this basis, social media platforms that use automatic hate speech detection
and removal are occasionally accused of bias and censorship during complex
situations such as political conflicts, e.g. the Israeli-Palestinian conflict and the Black Lives Matter
protests (Dwoskin and De Vynck, 2021), and elections, e.g. the 2020 US presidential election
(Clayton, 2021). Language dependency is another important reason for the restricted
performance of hate detection techniques when dealing with languages other than English,
especially non-Latin-based, low-resource languages. This can be explained on the one hand by
the relative linguistic complexity of some of these languages and on the other hand by the
lack of sufficient textual data resources for training and experiments (Pires et al., 2019).
Because it relies on transferring knowledge across domains, transfer learning is a promising
machine learning methodology for overcoming this latter challenge. Put simply, transfer
learning is a technique in which a model that has been pre-trained on a specific source task
is reused on a different but related target task. Thus, rather than collecting a large
amount of data to improve the target learner's performance, a pre-trained model reuses the
knowledge it acquired when trained on a large volume of other data for the
source task. This approach, with its broad application possibilities, can offer practical
solutions to the problem of automatic hateful content detection in low-resource languages.
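The idea of reusing a pre-trained model on a small target task can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the method evaluated in this paper: `PRETRAINED_WEIGHTS` is a hypothetical stand-in for an encoder already trained on a large source task, the four labelled inputs are invented toy data, and only a small logistic-regression head is fitted while the encoder stays frozen.

```python
import math

# Hypothetical stand-in for a pre-trained encoder. These frozen weights play
# the role of knowledge learned on a large source task; they are NOT taken
# from the paper or from any real language model.
PRETRAINED_WEIGHTS = [
    [1.0, -1.0, 0.5, 0.0],
    [0.0, 0.5, -1.0, 1.0],
]


def encode(x):
    """Map a 4-dimensional toy input to 2 features using the frozen weights."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(head, bias, features, labels):
    """Average binary cross-entropy of the head on pre-computed features."""
    total = 0.0
    for f, y in zip(features, labels):
        p = sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def train_head(features, labels, epochs=1000, lr=0.5):
    """Fit only the classification head by full-batch gradient descent.

    The encoder is never updated here, which is what makes this a
    feature-based form of transfer learning.
    """
    head, bias = [0.0] * len(features[0]), 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_head, grad_bias = [0.0] * len(head), 0.0
        for f, y in zip(features, labels):
            err = sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias) - y
            grad_head = [g + err * fi for g, fi in zip(grad_head, f)]
            grad_bias += err
        head = [h - lr * g / n for h, g in zip(head, grad_head)]
        bias -= lr * grad_bias / n
    return head, bias


def predict(head, bias, x):
    """Encode a raw input with the frozen encoder, then apply the head."""
    f = encode(x)
    return 1 if sum(h * fi for h, fi in zip(head, f)) + bias > 0 else 0


# Tiny labelled target set (invented toy data; 1 = hateful, 0 = not hateful).
TARGET_INPUTS = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
TARGET_LABELS = [1, 0, 1, 0]

target_features = [encode(x) for x in TARGET_INPUTS]
head, bias = train_head(target_features, TARGET_LABELS)
predictions = [predict(head, bias, x) for x in TARGET_INPUTS]
```

In practice the frozen encoder would be a large pre-trained language model, and its weights are often fine-tuned together with the head rather than kept fixed; the sketch only shows why a handful of labelled target examples can suffice once useful features already exist.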
In this paper, we investigate the transfer learning approach for automatically detecting the
hateful aspect of the anti-refugee/migrant Arabic speech on social media using the pre-trained