Exploring the effectiveness of word embedding based deep learning model for improving email classification

DOI	https://doi.org/10.1108/DTA-07-2021-0191
Published date	02 February 2022
Date	02 February 2022
Pages	483-505
Subject Matter	Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Author	Deepak Suresh Asudani, Naresh Kumar Nagwani, Pradeep Singh

Deepak Suresh Asudani, Naresh Kumar Nagwani and Pradeep Singh

Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, India

Abstract

Purpose – Classifying emails as ham or spam based on their content is essential. Determining the semantic and syntactic meaning of words and representing them as high-dimensional feature vectors for processing is the most difficult challenge in email categorization. The purpose of this paper is to examine the effectiveness of pre-trained embedding models for the classification of emails using deep learning classifiers such as the long short-term memory (LSTM) model and the convolutional neural network (CNN) model.

Design/methodology/approach – In this paper, global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps to classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experimentation.

Findings – In the first set of experiments, which compares machine learning classifiers, the support vector machine (SVM) model performs better than the other machine learning methodologies. The second set of experiments compares the performance of the deep learning models without embedding, with GloVe embedding and with BERT embedding. The experiments show that GloVe embedding can be helpful for faster execution with better performance on large-sized datasets.

Originality/value – The experiment reveals that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and traditional machine learning algorithms when classifying an email as ham or spam. It is concluded that word embedding models improve the accuracy of email classifiers.

Keywords Email classification, Machine learning, Word embedding, GloVe, BERT, Deep learning

Paper type Research paper

1. Introduction

Email has become a standard means of personal and professional communication as technology has advanced. Email marketing helps expand businesses by reaching existing and new consumers with updated offers and services. In the current pandemic climate, digital communication demonstrates its value by facilitating communication between business owners and customers.

According to survey data, the number of emails sent and received internationally has climbed each year as the Internet has become more accessible. In 2020, more than 306.4 billion emails were sent and received per day, and this figure is expected to rise to over 376.4 billion per day by 2025 (Statista, 2021). The shift from offline to online education, sales and support services has resulted in a rise in cyber-attacks on students and employees working remotely due to the lack of security safeguards. According to a Kaspersky report released on 1 March 2021, 45% of online users in India were targeted by local risks in 2020, and this number was expected to rise in 2021 (Kaspersky, 2021).

The authors thank the editor and the anonymous reviewers for their insightful and valuable comments and suggestions. The authors gratefully acknowledge the National Institute of Technology, Raipur, for providing the GPU server used for this research.

The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/2514-9288.htm

Received 24 July 2021
Revised 13 October 2021 and 25 December 2021
Accepted 8 January 2022

Data Technologies and Applications, Vol. 56 No. 4, 2022, pp. 483-505
ISSN 2514-9288
DOI 10.1108/DTA-07-2021-0191

With the increase in email communication, there is a hidden risk that hackers may try to target victims, steal vital information or redirect them to a false website, significantly impacting the user. Thus, classifying an email as ham or spam is critical, and efficient email classification is required to properly categorize a massive volume of email into multiple categories depending on its content within a specific time frame (Dada et al., 2019). The objective of the presented work is to explore whether word embedding can help improve the email classification task compared to the traditional machine learning approach.

The simplest way to perform email classification is to create rules using specific keywords and to categorize new emails according to those rules. Such a rule-based system fails to capture diverse keywords and can misclassify spam as ham. Machine learning classifiers instead train a model to classify new emails into their respective categories. However, these classifiers do not capture the semantic meaning of words, which is of greater importance in classification (Hajek et al., 2020). Therefore, in certain situations they fail to correctly classify emails with varied formatting styles.
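
To make the limitation of keyword rules concrete, the following minimal Python sketch implements a hypothetical rule-based filter. The keyword list, the function name `rule_based_label` and the sample messages are illustrative assumptions; they are not taken from the paper or from the SpamAssassin and Enron datasets.

```python
import re

# Hypothetical spam keywords, chosen only for illustration.
SPAM_KEYWORDS = {"winner", "lottery", "free", "prize"}


def rule_based_label(text: str) -> str:
    """Label an email 'spam' if any keyword appears as a whole word, else 'ham'."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return "spam" if tokens & SPAM_KEYWORDS else "ham"


# A message that uses the listed keywords verbatim is caught by the rules.
plain = "You are a WINNER, claim your prize now"
# The same message with obfuscated spelling no longer matches any keyword.
obfuscated = "You are a w1nner, claim your pr1ze now"

print(rule_based_label(plain))       # spam
print(rule_based_label(obfuscated))  # ham (a spam message misclassified as ham)
```

The second message shows the failure mode described above: a small change in spelling defeats a fixed keyword list, which is one motivation for learned classifiers.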

Another option is to employ a recurrent neural network (RNN) variant, such as the long short-term memory (LSTM) model, with pre-trained word embedding to capture long-term sequence dependency information. Pre-trained word embedding is a technique that uses a vast corpus to represent a word as a high-dimensional feature vector based on its semantic significance, which can help the model identify emails more accurately (Moreo et al., 2021).
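
As a rough illustration of this idea, the sketch below represents words as vectors and compares them with cosine similarity. The three-dimensional vectors are invented toy values (pre-trained GloVe vectors are much larger, typically 50 to 300 dimensions, and are loaded from a published file), and averaging word vectors to represent an email is a simplifying assumption, not the pipeline used by the authors.

```python
import math

# Toy 3-dimensional "embeddings"; the values are invented for illustration only.
TOY_EMBEDDINGS = {
    "free":    [0.9, 0.1, 0.0],
    "prize":   [0.8, 0.2, 0.1],
    "meeting": [0.1, 0.9, 0.2],
    "agenda":  [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def email_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Words used in similar contexts end up closer together than unrelated words.
print(cosine(TOY_EMBEDDINGS["free"], TOY_EMBEDDINGS["prize"]))
print(cosine(TOY_EMBEDDINGS["free"], TOY_EMBEDDINGS["meeting"]))
print(email_vector(["free", "prize", "unknownword"]))
```

With these toy values, "free" is closer to "prize" than to "meeting", which is the kind of semantic relationship a downstream classifier can exploit.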

This study aims to use machine learning and deep learning methodologies to create a framework for categorizing email as ham or spam. Traditional machine learning approaches such as support vector machine (SVM), random forest (RF), logistic regression (LR), Gaussian Naive Bayes (GNB), multinomial Naive Bayes (MNB) and AdaBoost, as well as the LSTM and convolutional neural network (CNN) deep learning models with and without pre-trained embedding, are utilized to find the optimum model for classification. The following are some of the significant contributions to enhancing email classification performance:

(1) The purpose of this study is to explore how applying machine learning and deep learning models to a dataset with and without pre-trained embedding can improve classification performance.

(2) Global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to find word relationships, which aids in the classification of emails into appropriate categories.

(3) The performance of machine learning and deep learning models with and without GloVe and BERT pre-trained embedding is evaluated by computing the confusion matrix, accuracy, precision, recall, F1-score and execution time with 10-fold cross-validation.

(4) The results show that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and traditional machine learning algorithms when classifying email as ham or spam.
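
The evaluation measures named above can be sketched in a few lines of Python. The helper names, the toy labels and the choice of "spam" as the positive class are assumptions made for illustration; the contiguous, unshuffled fold split is also a simplification, since the excerpt does not state how the authors built their 10 folds.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tp, fp, fn, tn) counts, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy labels, invented for illustration.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]
counts = confusion_counts(y_true, y_pred)
print(counts)            # (2, 1, 1, 2)
print(metrics(*counts))
```

In a full experiment, each fold from `k_fold_indices` would be used once as the test set, and the per-fold metrics would be averaged.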

The remaining sections of the paper are organized as follows: Section 2 summarizes relevant studies on email classification using machine learning and deep learning models. Section 3 focuses on the background of the experimental setup. In Section 4, the experimental results are discussed, followed by a conclusion on model performance in Section 5.

2. Literature review

This section summarizes notable studies on email classification using machine learning and deep learning models. In business, email is considered official communication. For consumers, it is an account for sending greetings, a mediator between social media accounts and online shopping accounts, and a means of maintaining easy access to their valuable documents. In email
