Phishing web site detection using diverse machine learning algorithms

Date	02 January 2020
Published date	02 January 2020
Pages	65-80
DOI	https://doi.org/10.1108/EL-05-2019-0118
Author	Ammara Zamir,Hikmat Ullah Khan,Tassawar Iqbal,Nazish Yousaf,Farah Aslam,Almas Anjum,Maryam Hamdani
Subject Matter	Information & knowledge management,Information & communications technology,Internet

Phishing web site detection using diverse machine learning algorithms

Ammara Zamir
Department of Computer Science, University of Wah, Quaid Avenue, Wah Cantt, Pakistan and Department of Computer Science, COMSATS University Islamabad – Wah Campus, Islamabad, Pakistan

Hikmat Ullah Khan and Tassawar Iqbal
Department of Computer Science, COMSATS University Islamabad, Wah Campus, Islamabad, Pakistan

Nazish Yousaf
Department of Computer Science, University of Wah, Quaid Avenue, Wah Cantt, Pakistan and Department of Computer and Software Engineering, College of Electrical and Mechanical Engineering, Islamabad, Pakistan

Farah Aslam
Department of Computer Science, University of Wah, Quaid Avenue, Wah Cantt, Pakistan

Almas Anjum
Department of Computer and Software Engineering, College of Electrical and Mechanical Engineering, Islamabad, Pakistan

Maryam Hamdani
Department of Computer Science, University of Wah, Quaid Avenue, Wah Cantt, Pakistan

Abstract

Purpose – This paper aims to present a framework to detect phishing websites using a stacking model. Phishing is a type of fraud used to obtain users’ credentials. The attackers access users’ personal and sensitive information for monetary purposes. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and developing identical websites resembling the original websites. As people surf the targeted website, the phishers hijack their personal information.

Design/methodology/approach – Features of the phishing data set are analysed using feature selection techniques including information gain, gain ratio, Relief-F and recursive feature elimination (RFE). Two features are proposed combining the strongest and weakest attributes. Principal component analysis with diverse machine learning algorithms (random forest [RF], neural network [NN], bagging, support vector machine, Naïve Bayes and k-nearest neighbour) is applied on the proposed and remaining features. Afterwards, two stacking models, Stacking1 (RF + NN + Bagging) and Stacking2 (kNN + RF + Bagging), are applied by combining the highest scoring classifiers to improve the classification accuracy.
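
As an illustration of two of the feature selection scores named above, the following is a minimal sketch of information gain and gain ratio for discrete features. It is not the authors' implementation: the feature names `has_ip_in_url` and `uses_https` and the eight toy samples are hypothetical, chosen only so that one feature is perfectly predictive and the other is uninformative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature's own values."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "has_ip_in_url": [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "uses_https": [1, 0, 1, 0, 1, 0, 1, 0],     # carries no information about the label
}

# Rank features from most to least informative by information gain.
ranking = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
for name in ranking:
    ig = information_gain(features[name], labels)
    gr = gain_ratio(features[name], labels)
    print(f"{name}: IG={ig:.3f} GR={gr:.3f}")
```

Gain ratio divides information gain by the entropy of the feature itself, which penalises features that split the data into many small groups; on binary features such as these the two scores coincide.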

Received 14 May 2019
Revised 8 September 2019, 24 October 2019
Accepted 6 November 2019

The Electronic Library, Vol. 38 No. 1, 2020, pp. 65-80
0264-0473
DOI 10.1108/EL-05-2019-0118

The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/0264-0473.htm

Findings – The proposed features played an important role in improving the accuracy of all the classifiers. The results show that RFE plays an important role in removing the least important feature from the data set. Furthermore, Stacking1 (RF + NN + Bagging) outperformed all other classifiers in detecting phishing websites, with 97.4% classification accuracy.

Originality/value – This research is novel in that no previous research focusses on using a feed forward NN and ensemble learners for detecting phishing websites.

Keywords Classification-based techniques, Ensemble learners, Feed forward neural network, Phishing detection, Neural networks, Stacking models, Ensemble techniques, Feature selection, Malicious URLs

Paper type Research paper
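
The Stacking1 configuration named in the abstract (RF + NN + Bagging) could be assembled along the following lines with scikit-learn. This is a sketch under stated assumptions, not the authors' pipeline: the data are synthetic stand-ins generated by `make_classification`, the hyperparameters are arbitrary, and the logistic-regression meta-learner is an assumption, since the abstract does not say how the base classifiers' outputs are combined.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; the paper's phishing data set is not reproduced here.
X, y = make_classification(n_samples=400, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base learners mirroring the Stacking1 line-up (RF + NN + Bagging).
# All hyperparameters below are illustrative assumptions.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("nn", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)),
    ("bagging", BaggingClassifier(n_estimators=20, random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base learners.
# Logistic regression is an assumed choice, not taken from the paper.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Replacing the `"nn"` entry with a `KNeighborsClassifier` would give the corresponding line-up for Stacking2 (kNN + RF + Bagging). Accuracy on this synthetic data says nothing about the 97.4% figure reported for the phishing data set.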

1. Introduction

In recent years, social networks have become a virtual meeting place for the general public. Unfortunately, while connecting through social networks, people experience phishing attacks. Phishing is a cybercrime which risks a user’s privacy, may execute malware attacks and often steals their sensitive information. Phishing is carried out by using different engineering techniques including: instant messages (Jakobsson, 2018); fraudulent emails or mimicking an online bank, auction or payment sites; and directing people to fake Web pages (Rodríguez et al., 2019) that resemble a login page to a genuine site. Phishing attacks increased drastically in 2019, according to the Anti-Phishing Working Group, which detected a total of 180,768 phishing websites in 2019 (Anti-Phishing Working Group, APWG, 2019). Also, according to a Proofpoint survey, people who use social websites are more exposed to potential phishing threats (www.proofpoint.com/us/security-awareness/post/latest-phishing-first-2019, accessed 8 May 2019).

A website phishing attack is carried out by spoofing legal identities, such as a legitimate website. A malicious website succeeds at obtaining some user information and can lead a user to additional malicious links that consequently gain access to even more of the user’s sensitive or personal information. To achieve this goal, identical websites are created which so closely resemble the original website that the duplication cannot be detected. Phishing attacks cause great economic, intellectual property and national security damage (Vayansky and Kumar, 2018).

Phishing harms industries including e-commerce and internet banking. Several techniques exist to protect users from phishing attacks, including the heuristic approach (Babagoli et al., 2019), the rule-based approach (Adewole et al., 2019) and the supervised machine learning (ML) approach (Sahingoz et al., 2019). Supervised ML algorithms are widely used for classification (Alzu’bi et al., 2018; Hawashin et al., 2019) and are the most popular among the techniques used to detect phishing websites. Kumar and Chaudhary (2017) introduced a framework based on machine learning for e-commerce-based mobile applications to detect malware. This approach detects mobile phishing and protects against information leakage (Kumar and Chaudhary, 2017). Internet banking is also affected by phishing. A rule-based approach was introduced by Moghimi and Varjani (2016) to detect phishing in internet banking using four sets of features. A support vector machine (SVM) is applied to classify the Web pages, and the proposed framework achieved 99 per cent accuracy in detecting phishing Web pages in internet banking (Moghimi and Varjani, 2016).

This research study focusses on a supervised ML approach to detect phishing websites. The contributions of this research study are as follows:

Application of feature selection algorithms including: gain ratio (GR), information gain (IG), Relief-F and recursive feature elimination (RFE) to analyse the importance
