A hybrid approach for phishing web site detection

Pages: 927-944
Published date: 07 November 2016
DOI: https://doi.org/10.1108/EL-07-2015-0132
Authors: Mehdi Dadkhah, Shahaboddin Shamshirband, Ainuddin Wahid Abdul Wahab
Subject matter: Information & knowledge management, Information & communications technology, Internet
Mehdi Dadkhah
Department of Computer and Information Technology,
Foulad Institute of Technology, Isfahan, Iran, and
Shahaboddin Shamshirband and Ainuddin Wahid Abdul Wahab
Department of Computer System and Information Technology,
University of Malaya, Kuala Lumpur, Malaysia
Abstract
Purpose – This paper aims to present a hybrid approach based on classification algorithms that is
capable of identifying different types of phishing pages. After eliminating features that
do not play an important role in identifying phishing attacks and adding the technique of
searching for the page title in a search engine, the approach gains the capability of identifying
journal phishing and phishing pages embedded in legitimate sites.
Design/methodology/approach – The hybrid approach for identifying phishing web sites consists
of four basic sections. Phishing web sites and journal phishing attacks are identified by two
separately selected classification algorithms. Phishing pages embedded in legitimate web sites
are identified by searching for the page title, and the result is returned. A blacklist is used
alongside the proposed approach so that phishing web sites can be identified more accurately.
Finally, a decision table is used to judge whether the intended web site is phishing or
legitimate.
Findings – A hybrid approach based on classification algorithms for identifying phishing
web sites is presented; it can identify a new type of phishing attack known as journal
phishing. The approach considers the most used features, adds new features for identifying
these attacks and eliminates features that are unused in the identification process. It does not
have the problems of previous techniques and can also identify journal phishing.
Originality/value – The major advantage of this technique is that it considers all of the possible
and effective features for identifying phishing attacks and eliminates the unused features of
previous techniques. In comparison with similar techniques, it can also identify journal
phishing attacks and phishing pages embedded in legitimate sites.
Keywords E-commerce, Phishing, Journal phishing, Social engineering
Paper type Research paper
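The four sections described in the abstract (two classifiers, the page title search, the blacklist and the final decision table) can be pictured with a minimal sketch. The signal names, the `judge` helper and the rule that any positive signal yields a phishing verdict are illustrative assumptions; the decision table actually used by the authors is not given in this excerpt.

```python
from itertools import product

# Hypothetical signal names, one per component described in the abstract.
SIGNALS = ("blacklisted", "site_classifier", "journal_classifier", "title_mismatch")


def build_decision_table():
    """Enumerate every combination of signals and assign a verdict.

    Assumed rule (not taken from the paper): any positive signal
    marks the page as phishing.
    """
    return {
        combo: "phishing" if any(combo) else "legitimate"
        for combo in product((False, True), repeat=len(SIGNALS))
    }


DECISION_TABLE = build_decision_table()


def judge(**signals):
    """Look up the verdict for one page given its four boolean signals."""
    key = tuple(bool(signals[name]) for name in SIGNALS)
    return DECISION_TABLE[key]
```

Keeping the verdicts in an explicit table, rather than in nested conditionals, means a different rule set could be substituted by editing the table alone.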
1. Introduction
Phishing attacks were described for the first time in detail in 1987, although the term
itself was first used in 1996 (Martino and Perramon, 2010). Phishing is short for
“password harvesting fishing” (hunting for user passwords using bait), with the “ph”
replacing the initial “f” on fishing to induce a tempting concept (McRae and Vaughn,
2007). Phishing attacks are said to be an effort to acquire people’s sensitive information,
such as usernames, passwords and credit card information via social engineering
techniques (Butler, 2007). In this type of attack, phishers (attackers or those who carry
out phishing attacks) start by designing sites similar to the intended sites. In other
words, the main goal of a phishing attack is to establish a fake communication, which is
usually initiated through an e-mail containing a fake URL derived from a bank web site
or government entity.

The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm

The Electronic Library, Vol. 34 No. 6, 2016, pp. 927-944
© Emerald Group Publishing Limited, 0264-0473
DOI 10.1108/EL-07-2015-0132
2. Previous studies
Numerous studies and efforts have been made to introduce different types of phishing
attacks and to dene procedures to confront them. Phishing attacks generally include
deception phishing attacks (Ali and Rajamani, 2012), phishing based on malicious
software (Li and Schmitz, 2009), web Trojans (Johnson, 2009), pharming (Dadkhah and
Jazi, 2014), phishing injection (Alkhateeb et al., 2012), man-in-the-middle attacks
(Callegati et al., 2009), phishing using fake programs (Shi and Saleem, 2012), domain
hijacking (Chandavale and Sapkal, 2010), spear phishing (Hong, 2012) and phishing
through changing user system settings (Dadkhah and Jazi, 2014). To handle these sorts
of attacks, a range of techniques has been developed, such as sign-in seal (Agarwal
et al., 2009), expert systems based on web page features to detect phishing web sites
(Aburrous et al., 2010a), phishing hyperlink detection using genetic algorithms
(Shreeram et al., 2010), phishing attack detection by hyperlink classification (Chen and
Guo, 2006), attribute-based prevention of phishing attacks (Atighetchi and Pal, 2009),
images for content-based phishing analysis (Dunlop et al., 2010), preventive
anti-phishing using code word (Mishra and Jain, 2012), phishing page identification
according to the degree of similarity between web page content and domain
classification (Sanglerdsinlapachai and Rungsawang, 2010), relations for phishing
detection (Liu et al., 2010), phishing page identification through comparing the degree of
URL string difference against a white list (Reddy et al., 2011), phishing URL detection
using ranking algorithms (Khonji et al., 2011), data mining algorithms (Larose, 2014;
Ramya et al., 2011; Aburrous et al., 2010b) to detect phishing web sites and fuzzy-based
classifiers that provide strong protection, make users aware of the growing phishing
attack problem and find a better way for detection (Kaur and Sharma, 2015). The
methods and techniques mentioned above rely on 27 features (Table I) to identify
phishing web sites. However, these methods are inefficient against journal phishing
attacks (Dadkhah et al., 2015b) because their 27 key features are mostly derived from
sites related to online commercial exchanges. Dadkhah et al. (2015b) considered journal
phishing attacks and ways of identifying them. However, the focus of their article was
on identifying journal phishing attacks, and methods of identifying other phishing
attacks were not presented.
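One of the techniques listed above compares how much a URL string differs from entries on a white list. As a rough illustration of that idea only, the sketch below measures the edit (Levenshtein) distance between a URL's host name and each whitelisted host. The choice of edit distance, the threshold `max_distance` and the helper names are assumptions made here and are not taken from Reddy et al. (2011).

```python
from urllib.parse import urlparse


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL, or '' if there is none."""
    return (urlparse(url).hostname or "").lower()


def looks_like_spoof(url: str, whitelist: list[str], max_distance: int = 2) -> bool:
    """Flag hosts that are close to, but not identical with, a whitelisted host.

    The threshold is an assumed value chosen for illustration.
    """
    if not whitelist:
        raise ValueError("whitelist must not be empty")
    host = host_of(url)
    nearest = min(edit_distance(host, allowed) for allowed in whitelist)
    return 0 < nearest <= max_distance
```

With a hypothetical white list of `["example-bank.com"]`, the host `examp1e-bank.com` differs by one character and is flagged, while an exact match is not. A check of this kind looks only at the address string, which is one reason such features carry over poorly to journal phishing.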
There are techniques similar to the approach proposed in this article, but they have
weak points that are explained below. In confronting phishing attacks using expert system
deployment [14], a list containing 27 features (Table I) is introduced to identify phishing
pages. The features are then placed into six categories, according to which the expert
system is trained with a set of rules. However, this technique contains a vast set of rules,
and it does not determine how the measures, specifically those related to human factors,
are extracted from web sites (Mohammad et al., 2014a). Larose (2014), Ramya et al. (2011) and
Aburrous et al. (2010b) used data mining techniques to find patterns related to phishing
page detection and considered 27 features corresponding to phishing pages to train the
classifier. There are numerous similar articles and research works in this field, in which
the proposed techniques are not able to identify journal phishing. This is because