Detection of phishing websites using a novel twofold ensemble model


DOI	https://doi.org/10.1108/JSIT-09-2017-0074
Pages	321-357
Published date	13 August 2018
Date	13 August 2018
Author	Kalyan Nagaraj, Biplab Bhattacharjee, Amulyashree Sridhar, Sharvani GS
Subject Matter	Information & knowledge management, Information systems, Information & communications technology

Detection of phishing websites using a novel twofold ensemble model

Kalyan Nagaraj

Department of Computer Science and Engineering,

RV College of Engineering, Bangalore, India

Biplab Bhattacharjee

School of Management Studies, National Institute of Technology, Calicut, India, and

Amulyashree Sridhar and Sharvani GS

Department of Computer Science and Engineering,

RV College of Engineering, Bangalore, India

Abstract

Purpose – Phishing is one of the major threats affecting businesses worldwide. Organizations and customers face hazards arising from phishing attacks because of anonymous access to vulnerable details. Such attacks often result in substantial financial losses. Thus, there is a need for effective intrusion detection techniques to identify and possibly nullify the effects of phishing. Classifying phishing and non-phishing web content is a critical task in information security protocols, and foolproof mechanisms have yet to be implemented in practice. The purpose of the current study is to present an ensemble machine learning model for classifying phishing websites.

Design/methodology/approach – A publicly available data set comprising 10,068 instances of phishing and legitimate websites was used to build the classifier model. Feature extraction was performed by deploying a group of methods, and the relevant features extracted were used for building the model. A twofold ensemble learner was developed by feeding the results of a random forest (RF) classifier into a feedforward neural network (NN). Performance of the ensemble classifier was validated using k-fold cross-validation. The twofold ensemble learner was implemented as a user-friendly, interactive decision support system for classifying websites as phishing or legitimate.

Findings – Experimental simulations were performed to assess and compare the performance of the ensemble classifiers. The statistical tests estimated that the RF_NN model gave superior performance, with an accuracy of 93.41 per cent and a minimal mean squared error of 0.000026.

Research limitations/implications – The research data set used in this study is publicly available and easy to analyze. Comparative analysis with other real-time data sets of recent origin must be performed to ensure generalization of the model against various security breaches. Different variants of phishing threats must be detected, rather than focusing solely on phishing website detection.

Originality/value – To the knowledge of the authors, the twofold ensemble model has not been applied to the classification of phishing websites in any previous study.

Keywords Machine learning, Ensemble learner, Intelligent systems, Phishing website

Paper type Research paper
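The data flow summarized in the abstract (a first-stage ensemble whose output feeds a second-stage network, scored by k-fold cross-validation on accuracy and mean squared error) can be illustrated with a small sketch. This is not the authors' implementation: to stay dependency-free, bagged decision stumps stand in for the random forest and a single logistic unit stands in for the feedforward network, and the data are synthetic. All function names, parameters and defaults below are assumptions made for illustration.

```python
import math
import random


def k_fold_indices(n, k, rng):
    """Shuffle range(n) and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_stump(X, y, feature):
    """One-level tree: threshold one feature midway between the class means."""
    pos = [row[feature] for row, t in zip(X, y) if t == 1]
    neg = [row[feature] for row, t in zip(X, y) if t == 0]
    if not pos or not neg:  # bootstrap sample lost a class; use a zero threshold
        return feature, 0.0, 1
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return feature, (mean_pos + mean_neg) / 2, 1 if mean_pos >= mean_neg else -1


def fit_forest(X, y, n_trees, rng):
    """Stage 1 stand-in: stumps on bootstrap samples, one random feature each."""
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows],
                                rng.randrange(d)))
    return forest


def forest_score(forest, row):
    """Fraction of stumps that vote 'phishing' (label 1) for this row."""
    votes = sum(1 for f, thr, sign in forest if sign * (row[f] - thr) > 0)
    return votes / len(forest)


def fit_unit(scores, y, epochs=200, lr=0.5):
    """Stage 2 stand-in: one logistic unit trained by batch gradient descent."""
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for s, t in zip(scores, y):
            err = 1 / (1 + math.exp(-(w * s + b))) - t
            grad_w += err * s
            grad_b += err
        w -= lr * grad_w / len(y)
        b -= lr * grad_b / len(y)
    return w, b


def cross_validate(X, y, k=5, n_trees=25, seed=0):
    """Return one (accuracy, mean squared error) pair per held-out fold."""
    rng = random.Random(seed)
    results = []
    for fold in k_fold_indices(len(X), k, rng):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        forest = fit_forest(X_tr, y_tr, n_trees, rng)
        w, b = fit_unit([forest_score(forest, r) for r in X_tr], y_tr)
        probs = [1 / (1 + math.exp(-(w * forest_score(forest, X[i]) + b)))
                 for i in fold]
        truth = [y[i] for i in fold]
        acc = sum((p >= 0.5) == (t == 1) for p, t in zip(probs, truth)) / len(fold)
        mse = sum((p - t) ** 2 for p, t in zip(probs, truth)) / len(fold)
        results.append((acc, mse))
    return results


def make_toy_data(n, rng):
    """Synthetic two-class rows; NOT the 10,068-instance data set of the study."""
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(1.5 * t, 1.0) for _ in range(4)] for t in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data(200, random.Random(1))
    for number, (acc, mse) in enumerate(cross_validate(X, y), start=1):
        print(f"fold {number}: accuracy={acc:.3f}  mse={mse:.4f}")
```

The sketch trains the second stage on first-stage scores computed for the same training rows, which keeps it short. A more careful stacking setup would typically train the second stage on out-of-fold first-stage predictions to limit leakage.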

Received 7 September 2017
Revised 16 January 2018, 16 May 2018, 14 June 2018
Accepted 25 July 2018

Journal of Systems and Information Technology, Vol. 20 No. 3, 2018, pp. 321-357
1328-7265
DOI 10.1108/JSIT-09-2017-0074

The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/1328-7265.htm

1. Introduction

Exponential expansion of data in digital media over the years has resulted in corresponding growth of e-commerce transaction volumes. The internet has provided a digital platform to encourage rapid communication between suppliers and end-users for information dissemination of their products (Basu and Muylle, 2003). The rising use of social media platforms by consumers has led marketers to use such media for product promotions. These modernizations have gained immense popularity over the last decade and are expected to keep expanding in the coming years (Ho et al., 2007). Recent progress with smartphones, tablets and notebooks has further led to an information explosion, with data volumes ranging from terabytes to petabytes.

However, it is important to observe the other side of this data expansion: the inherent threat to the security of such content. Privacy concerns are escalating predominantly because of numerous threats that gain fraudulent access to systems, resulting in the loss of sensitive information. These security attacks are commonly classified as active or passive. In active attacks, secured systems are circumvented to gain access to legitimate data, whereas passive attacks set up sniffer devices to detect secure information (Summerville et al., 2005). Phishing is an instance of passive online fraud, defined as a deceitful act of feigning trustworthiness to acquire protected credentials by retrieving emails, passwords, usernames and credit card transactions (Caswell and Orebaugh, 2005). Different flavors of phishing attacks are witnessed, among which website attacks are the predominant ones. It is worth noting the method adopted by phishers for accessing sensitive information. Initially, an attacker creates a visually identical replica of a legitimate website (referred to as a phishing website); subsequently, the site seeks illicit access to the customer's private details through false identification, for monetary gain (Yu et al., 2008).

At present, there is a paradigm shift toward social media attacks, recurrently detected on the Twitter and Facebook platforms (Chandrasekaran et al., 2006; Grier et al., 2010). Such phishing threats are consistently observed in several domains, as they exploit human ignorance, leading to security breaches. Recent statistical reports have stressed the increasing trend and intensity of these security threats. According to the internet security report released by Symantec, a leading cyber security organization, phishing emails have largely contributed to business email compromise (BEC) threats, resulting in a loss of $3 bn (Internet Security Threat Report, 2017). Likewise, the latest figures from the Anti Phishing Working Group (APWG) (Phishing Activity Trends Report 1 Quarter, 2010), released for the last quarter of 2016, give details of about 20 million new malware instances. This report showed that the volume of phishing websites increased by 250 per cent compared with the last quarter of 2015 (Phishing Activity Trends Report, 2017). These reports indicate a likelihood of an exponential increase in phishing attacks in future, causing severe damage to security.

With such a distressing upsurge of phishing attacks, there is a persistent need to formulate improved, systematic solutions for safeguarding consumers from diverse malicious security threats. In response to the spread of phishing attacks, numerous phishing detection practices are being implemented by different agencies to protect information systems used by consumers and establishments. Several authors have proposed various methods of phishing detection in past studies (Zhang et al., 2007; Garera et al., 2007; Dong et al., 2008; Medvet et al., 2008; Yu et al., 2009; Afroz and Greenstadt, 2011; Singh et al., 2011; Zhuang et al., 2012; Marchal et al., 2014; Rao and Ali, 2015; Ahmed and Abdullah, 2016; Y.A. Abutair and Belghith, 2017). However, given the abundant varieties of cyber threats and the immense competition among numerous security vendors, not all forms of attack may be mitigated completely. In the context of phishing attacks, the majority of anti-phishing tools fail to target specific flavors of these attacks, as they tackle them in bulk without any knowledge base of previous and future waves of attacks (Dhamija et al., 2006). Mitigating phishing attacks certainly requires effective analytical strategies and techniques supported by skilled manual intervention. Such proactive prediction of phishing threats enhances security and safeguards consumer details, which also helps to minimize the hazards of online monetary transactions at domestic and global levels (Khonji et al., 2013).

It is important to draw a clear distinction between legitimate and phishing websites so as to devise strategies that mitigate security threats to both customers and organizations. These threats begin when a single breach of confidentiality is followed by numerous further threats, leading to unauthorized financial transactions and organizational losses. As a consequence, mathematical models must be adopted by business website domains to identify consumers who are on the verge of a privacy threat (Ryan, 2001; Wieland et al., 2008; Liu and Terzi, 2009). Accordingly, business analysts are now trying to adopt different strategies for understanding the implications of phishing attacks for cyber security modules (Buczak and Guven, 2016).

From the user's perspective, it is imperative for webmasters to identify phishing websites precisely and within a pre-defined timeframe to curtail phishing threats. With the current explosion of information across web domains and the increasing demand for online businesses, business analytics and intelligence (BAI) applications have paved the way for value-added recognition and deterrence of cyber threats (Zuech et al., 2015). In this respect, machine learning and data mining algorithms are viewed as supporting tools for business analytics to uncover phishing attacks by studying the historical behavioral patterns of websites (Qabajeh and Thabtah, 2014; Smadi et al., 2015; Jain and Gupta, 2017).

It is essential to understand the significance of predictor variables that can indicate whether a website is of the phishing or non-phishing type. Most studies have considered predictor attributes based on objects from the World Wide Web (WWW) and Uniform Resource Locator (URL)-based features, along with third-party features describing a webpage (Atighetchi and Pal, 2009; Wang et al., 2009; Alkhozae and Batarfi, 2011). Feature selection techniques are among the best-known machine learning methodologies for extracting relevant attributes from data sets (Guyon and Elisseeff, 2003; van der Maaten et al., 2009). Based on the features extracted, several mathematical models have been developed to classify phishing websites.
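To make the idea of URL-based predictor attributes concrete, the following minimal sketch derives a few lexical features from a URL string using only the Python standard library. The feature names and the choice of features are illustrative assumptions; they are not the feature set used in this study or in the cited works.

```python
import re
from urllib.parse import urlparse

# Matches hosts written as dotted IPv4 literals, e.g. "192.168.0.1".
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_features(url):
    """Return a dict of simple lexical features for one URL (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname drops any user-info and port
    return {
        "url_length": len(url),
        "has_at_symbol": "@" in url,
        "host_is_ip": bool(IPV4_HOST.match(host)),
        "dots_in_host": host.count("."),
        "hyphen_in_host": "-" in host,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/",
                    "http://192.168.0.1/login",
                    "http://user@evil-site.example/"):
        print(example, url_features(example))
```

Each URL becomes one row of predictor values, which a feature selection step could then rank or prune before a classifier is trained.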

The foremost objective of detecting phishing attacks using business analytics applications is to classify websites into two categories, phishing and non-phishing, and to provide better strategies for the prevention of phishing attacks (Whittaker et al., 2010). To this end, several data mining algorithms such as Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machines (SVM), Decision Trees (DT), Association Rules mining (AR) and Neural Networks (NN) are adopted for building predictive models intended for detecting phishing threats. Many studies have presented comparative analyses of these algorithms on different sets of phishing data (Bratko et al., 2006; Basnet et al., 2008; Lakshmi and Vijaya, 2012; Panda et al., 2012; Chu et al., 2013; James et al., 2013; Soska and Christin, 2014; Singh et al., 2015; Jeeva and Rajsingh, 2016; Ramesh, Gupta and Gamya, 2017). Other published works have focused on individual algorithms for identifying phishing threats (Huang et al., 2012; Feroz and Mengel, 2014; Akinyelu and Adewumi, 2014; Abdelhamid, 2015; Li et al., 2016). Also, several researchers have focused on developing ensemble models by combining discrete algorithms (Patil and Sherekar, 2013; Montazer and

