Feature distillation and accumulated selection for automated fraudulent publisher classification from user click data of online advertising
DOI: https://doi.org/10.1108/DTA-09-2021-0233
Published: 6 January 2022
Pages: 602-625
Subject matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Authors: Deepti Sisodia, Dilip Singh Sisodia
Feature distillation and accumulated selection for automated fraudulent publisher classification from user click data of online advertising
Deepti Sisodia and Dilip Singh Sisodia
Computer Science and Engineering, National Institute of Technology Raipur, Raipur, India
Abstract
Purpose: The problem of choosing the most useful features from hundreds of features in time-series user click data arises in online advertising when classifying fraudulent publishers. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are common; however, they neglect the correlations among features. Conversely, wrapper approaches cannot be applied because of their complexity. Moreover, existing feature selection methods in particular cannot handle such data, which is one of the major causes of instability in feature selection.
Design/methodology/approach: To overcome such issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to investigate the optimal subset of relevant features for analyzing publishers' fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting; (2) accumulated selection, where an accumulated evaluation of the relevant feature subset is performed to search for an optimal feature subset using effective machine learning (ML) models.
Findings: Empirical results show enhanced classification performance with the proposed features in terms of average precision, recall, F1-score and AUC for publisher identification and classification.
Originality/value: FDAS is evaluated on the FDMA2012 user-click data and nine other benchmark datasets to gauge its generalizing characteristics: first, with the original features; second, with relevant feature subsets selected by feature selection (FS) methods; and third, with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.
Keywords: Fraudulent publisher, FDAS, Feature selection, Feature distillation, Accumulated selection, Majority voting
Paper type: Research paper
1. Introduction
With the development of state-of-the-art techniques and global communication, fraud is growing drastically (Sisodia et al., 2018). In the pay-per-click (PPC) online advertising model, the advertising commissioner acts as a central coordinator between advertisers and publishers. The advertiser provides the advertisements to the advertising commissioner based on the planned budget and pays a commission for every generated click (Berrar, 2012, 2016). The publisher communicates with the advertising commissioner to display ads on their web pages and to receive user clicks and a commission proportionate to the
generated clicks (Xu et al., 2014). However, the generated clicks may come from genuine publishers, deceptive software agents or other illegitimate means. Publishers' monetization of illegitimate clicks is known as click fraud, and such publishers are termed fraudulent publishers. Therefore, a click fraud detection (CFD) model is required to prevent click fraud.
A CFD model is a predictive model that identifies the click-log behavior of a publisher. The publisher's behavior is assessed based on the clicks they generate (Perera et al., 2013). The model predicts the legitimacy of the publisher's current click-log behavior using the click logs the publisher generated in the past. In detecting click fraud, feature engineering is vital for constructing feature variables that appropriately summarize and represent the publisher's conduct from raw click-log records (Cadenas et al., 2013). However, filtering a large number of features and transforming them into significant ones is a crucial task of the CFD model, as irrelevant features in the training data may produce erroneous results (Li et al., 2017). The major task in this regard is finding a feature subset that can identify the anomalous behavior, termed feature selection (Hoque et al., 2014). Feature selection is the process of selecting relevant feature subsets for use in model construction (Liu et al., 2017). An appropriate selection of features may improve generalization, reduce training time, enhance the model's interpretability, improve accuracy and reduce prediction time.
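Purely as an illustration of such feature construction (and not the actual FDMA2012 feature set), the sketch below derives a few aggregate behavioral variables from a raw click log; the column names 'publisher_id', 'ip' and 'timestamp' are hypothetical:

```python
# Illustrative sketch only: aggregate feature variables of the kind a CFD model
# could derive from raw click-log records. The DataFrame schema assumed here
# ('publisher_id', 'ip', 'timestamp') is hypothetical, not the FDMA2012 schema.
import pandas as pd

def summarize_publisher_clicks(clicks: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-publisher behavioral features from a raw click log."""
    clicks = clicks.sort_values("timestamp")          # 'timestamp' assumed datetime64
    grouped = clicks.groupby("publisher_id")
    features = pd.DataFrame({
        "total_clicks": grouped.size(),               # overall click volume
        "unique_ips": grouped["ip"].nunique(),        # diversity of click sources
        "median_interclick_secs": grouped["timestamp"].apply(
            lambda t: t.diff().dt.total_seconds().median()  # burstiness proxy
        ),
    })
    features["clicks_per_ip"] = features["total_clicks"] / features["unique_ips"]
    return features.reset_index()
```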
Though several conventional FS methods exist, they result in high computational complexity: the feature space is vast for n features, which makes the search increasingly difficult as the number of features grows (Kohavi and John, 1997).
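To make the scale of that search concrete, an exhaustive wrapper search over n features would in principle have to evaluate every non-empty feature subset:

```latex
% Number of candidate feature subsets for n features
\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1
```

Even n = 30 features already yields over a billion candidate subsets, which motivates narrower search strategies such as the accumulated evaluation over a distilled ranking used here.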
Therefore, this work proposes a hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), which differs from conventional FS in how it selects the best subset of features, as shown in Figure 1. A traditional FS method selects the best subset of features using statistical measures while focusing on individual features to identify their relative importance. However, a feature might not be significant on its own yet be an effective influencer when accumulated with other features. In comparison, the proposed FDAS feature selection approach finds the best subset of features in two phases. The first phase, feature distillation, identifies the highly correlated features and obtains an optimal subset using majority voting, which retains the highly voted, commonly ranked features. In the second phase, accumulated selection, FDAS overcomes the limitation of conventional FS methods through an iterative accumulation of features. The proposed hybrid FS approach evaluates the significance of features by accumulating a single feature at a time into the combination of features and training a model over it. It elects the combination of features that results in the model performing best according to the performance metric.
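A minimal sketch of this two-phase idea is given below, assuming scikit-learn-style utilities; the particular rankers (ANOVA F-score, mutual information, RFE), the classifier, the value of k and the vote threshold are illustrative assumptions, not the exact configuration reported for FDAS:

```python
# Minimal sketch of the two FDAS phases, assuming scikit-learn-style utilities.
# The rankers, classifier, value of k and vote threshold are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def feature_distillation(X, y, k=10, min_votes=2):
    """Phase 1: majority voting over the top-k features of several FS methods."""
    votes = np.zeros(X.shape[1], dtype=int)
    # Filter methods: each casts one vote per feature it ranks in its top k.
    for score_fn in (f_classif, mutual_info_classif):
        votes += SelectKBest(score_fn, k=k).fit(X, y).get_support().astype(int)
    # Wrapper method (recursive feature elimination) also casts its votes.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
    votes += rfe.support_.astype(int)
    # Distilled subset: features voted for by at least `min_votes` methods,
    # ordered by vote count.
    return [i for i in np.argsort(-votes) if votes[i] >= min_votes]

def accumulated_selection(X, y, candidates, cv=5, scoring="f1"):
    """Phase 2: accumulate one candidate feature at a time; keep the best subset."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    current, best_subset, best_score = [], [], -np.inf
    for idx in candidates:
        current.append(idx)                           # accumulate the next feature
        score = cross_val_score(model, X[:, current], y, cv=cv, scoring=scoring).mean()
        if score > best_score:                        # retain the best accumulation
            best_subset, best_score = list(current), score
    return best_subset, best_score

# Usage on hypothetical arrays X (n_samples x n_features) and labels y:
# subset, _ = accumulated_selection(X, y, feature_distillation(X, y))
```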
The optimal features determined by FDAS were used to train standard classification models for fraudulent publisher classification. The proposed feature selection has three aims: (1) enhancing the predictive performance of models, (2) making the models cost-effective and (3) giving a better understanding of the procedure underlying data generation. The major contributions of this study are summarized as follows:
(1) Proposed a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), to find an optimal subset of features for fraudulent publisher classification.
(2) Majority voting is used to obtain a highly voted, commonly ranked relevant feature subset utilizing eight filter and wrapper feature selection methods.
(3) The feature accumulation process assesses the significance of an optimal subset of features toward designing a predictive model.
