Stylometric analysis of classical Arabic texts for genre detection

Date: 01 October 2018
DOI: https://doi.org/10.1108/EL-11-2017-0236
Pages: 842-855
Published date: 01 October 2018
Author: Maha Al-Yahya
Department of Information Technology, King Saud University, Riyadh,
Saudi Arabia
Abstract
Purpose: In the context of information retrieval, text genre is as important as its content, and knowledge of
the text genre enhances search engine features by enabling customized retrieval. The purpose of this
study is to explore and evaluate the use of stylometric analysis, a quantitative analysis of the linguistic
features of text, to support the task of automated text genre detection for Classical Arabic text.
Design/methodology/approach: Unsupervised clustering and supervised classification were applied
to the King Saud University Corpus of Classical Arabic texts (KSUCCA) using the most frequent words in the
corpus (MFWs) as stylometric features. Four popular distance measures established in stylometric research
are evaluated for the genre detection task.
Findings: The results of the experiments show that stylometry-based genre clustering and classification
align well with human-defined genre. The evidence suggests that genre style signals exist for Classical Arabic
and can be used to support the task of automated genre detection.
Originality/value: This work targets the task of genre detection in Classical Arabic text using
stylometric features, an approach that has previously been applied only to Arabic authorship attribution. The
study also provides a comparison, using clustering and classification, of four distance measures used in
stylometric analysis on the KSUCCA, a corpus with over 50 million words of Classical Arabic.
Keywords: Stylometric analysis, Genre detection, Classical Arabic text, Distance measure
Paper type: Research paper
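The approach summarized in the abstract (MFWs as features, compared through a distance measure) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the four distance measures, so the z-scored mean Manhattan distance used here (commonly known as Burrows's Delta) is only one plausible candidate, and the function names and whitespace-tokenized input are hypothetical rather than taken from the study.

```python
from collections import Counter
import math


def most_frequent_words(documents, n):
    """Return the n most frequent words across all documents (the MFWs).

    documents: dict mapping a document name to its list of tokens.
    Ties are broken alphabetically so the result is deterministic.
    """
    corpus_counts = Counter()
    for tokens in documents.values():
        corpus_counts.update(tokens)
    ranked = sorted(corpus_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def z_scores(freq_table):
    """Standardize each feature (column) across documents.

    freq_table: dict mapping a document name to its frequency vector.
    A feature with zero variance is mapped to 0.0 for every document.
    """
    names = list(freq_table)
    n_features = len(freq_table[names[0]])
    scored = {name: [] for name in names}
    for j in range(n_features):
        column = [freq_table[name][j] for name in names]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        for name in names:
            z = (freq_table[name][j] - mean) / std if std > 0 else 0.0
            scored[name].append(z)
    return scored


def delta_distance(a, b):
    """Mean absolute difference of two z-score vectors (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pairwise_distances(documents, n_mfw):
    """Distance between every pair of documents over the top n_mfw words."""
    vocabulary = most_frequent_words(documents, n_mfw)
    freqs = {name: relative_frequencies(tokens, vocabulary)
             for name, tokens in documents.items()}
    scored = z_scores(freqs)
    names = sorted(documents)
    return {(p, q): delta_distance(scored[p], scored[q])
            for p in names for q in names}
```

The resulting pairwise distances could then be passed to a clustering or nearest-neighbour procedure; which procedures and parameters the study actually uses is not stated in this excerpt.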
1. Introduction
Stylometry is a measure of language style. It is defined as the statistical analysis of
variations in literary style between one writer or genre and another (OED, 2017). The term
was originally coined by Lutoslawski in 1896 (Lauer and Jannidis, 2014; Pawlowski and
Pacewicz, 2004), and the approach has become popular for research on authorship
attribution (Holmes and Kardos, 2003; Juola, 2006). Stylometry, however, can also be applied
to other problems in text analysis, including forensic linguistics (Afroz et al., 2012; Rocha
et al., 2017), plagiarism detection (Ramnial et al., 2016), chronology studies observing the
developing voice of an author over a period of years (Juola, 2007), stylistic inconsistencies in
collaborative writing (Glover and Hirst, 1995), literary influence (Jockers, 2013) and genre
detection (Jockers, 2013).
Genre is defined as a type of communication which is denoted by a socially accepted
purpose and a common form (Yates and Orlikowski, 1992). Genres are useful, as they make
documents easy to understand, thus reducing mental effort (Crowston and Kwasnik, 2003).
In the context of the organization of information and information retrieval, document genre
is as important as the content of the document, and knowledge of document genre enables
the enhancement of search engine capabilities by providing customized retrieval.
Genre detection is an important task for knowledge organization and retrieval
(Andersen, 2008), and it aims to group and organize texts based on defined similarities
Received 11 November 2017
Revised 4 March 2018, 4 May 2018
Accepted 7 May 2018
The Electronic Library, Vol. 36 No. 5, 2018, pp. 842-855
© Emerald Publishing Limited
ISSN 0264-0473
DOI 10.1108/EL-11-2017-0236
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm
