A boosting-based transfer learning method to address absolute-rarity in skin lesion datasets and prevent weight-drift for melanoma detection

DOI: https://doi.org/10.1108/DTA-10-2021-0296
Published date: 20 June 2022
Pages: 1-17
Subject matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Authors: Lokesh Singh, Rekh Ram Janghel, Satya Prakash Sahu
Lokesh Singh, Rekh Ram Janghel and Satya Prakash Sahu
Department of Information Technology, National Institute of Technology,
Raipur, India
Abstract
Purpose: Automated skin lesion analysis plays a vital role in early detection. Relatively small,
imbalanced skin lesion datasets impede learning and dominate research in automated skin lesion analysis.
The lack of adequate data, together with the skewed class distribution, makes it difficult to develop
classification methods.
Design/methodology/approach: Boosting-based transfer learning (TL) paradigms such as the Transfer AdaBoost
algorithm can compensate for such a lack of samples by taking advantage of auxiliary data. However, in such
methods, beneficial source instances representing the target have a fast and stochastic weight convergence,
which results in weight-drift that negates transfer. In this paper, a framework is designed utilizing
Rare-Transfer (RT), a boosting-based TL algorithm, that prevents weight-drift and simultaneously addresses
absolute-rarity in skin lesion datasets. RT prevents the weights of source samples from converging quickly.
It addresses absolute-rarity using an instance transfer approach incorporating the best-fit set of auxiliary
examples, which improves balanced error minimization. It compensates for class imbalance and scarcity of
training samples in absolute-rarity simultaneously, inducing balanced error optimization.
Findings: Promising results are obtained with RT compared with state-of-the-art techniques on
absolute-rare skin lesion datasets, with an accuracy of 92.5%. The Wilcoxon signed-rank test is used to examine
significant differences between the proposed RT algorithm and the conventional algorithms used in the experiment.
Originality/value: Experimentation is performed on four absolute-rare skin lesion datasets, and the
effectiveness of RT is assessed based on accuracy, sensitivity, specificity and area under the curve. The
performance is compared with existing ensemble and boosting-based TL methods.
Keywords: Skin lesion, Weight-drift, Rare-Transfer, Absolute-rarity, Weight convergence, Source, Target
Paper type: Research paper
1. Introduction
With the rise of artificial intelligence, deep and transfer learning have been successfully employed in
several fields, including plant disease recognition (Li and Chao, 2021), agriculture
(Li and Yang, 2021; Yang et al., 2022) and medicine, for example cancer detection. In machine
learning, the size of the training set has a very significant effect on classification
performance. The rare category (minority or outnumbered class) plays a vital role regardless of
its extreme scarcity. In skin lesion detection (Dhivyaa et al., 2020), most skin lesions are benign,
while only a few are malignant (He, 2010; Singh et al., 2021a). In situations where false negatives
Ethics approval: This article does not contain any studies with human participants or animals performed
by any of the authors.
Informed consent: Informed consent was obtained from all individual participants included in the study.
Funding: No funding is provided for experimentation.
Conflict of interest: The authors declare that there is no conflict of interest regarding the publication of
this paper.
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/2514-9288.htm
Received 23 October 2021; revised 9 February 2022 and 29 March 2022; accepted 18 April 2022
Data Technologies and Applications, Vol. 57 No. 1, 2023, pp. 1-17
© Emerald Publishing Limited, 2514-9288
DOI 10.1108/DTA-10-2021-0296
(FN) are comparatively more numerous than false positives (FP), the learning model's prediction becomes
biased towards the majority class, which might have adverse consequences. For instance, out
of patients having suspicious lesions, very few are at risk of having malignant melanoma
(rare class), while the majority are benign. Here, an FN indicates a patient suffering from
malignant melanoma who is misclassified as benign, which is a severe issue. In contrast, an FP
indicates a non-cancerous patient classified as malignant, which is comparatively less severe. Thus,
the problem of high class imbalance has a huge impact on predictive learning (Leevy et al., 2018;
Singh et al., 2021b). Therefore, the prediction of rare categories or rare samples in skin lesion
datasets may protect against malignant melanoma and help save lives. Some skin lesion datasets
are naturally imbalanced and also small in size; this combination is known as Absolute
Rarity (Al-Stouhi and Reddy, 2016), and the outnumbered (minority) class in a small,
unbalanced dataset is termed the Rare Class (Weiss, 2004). The obstacle to learning under
Absolute Rarity is that the dataset is both small and highly imbalanced.
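To make the asymmetry between FN and FP concrete, the following minimal sketch computes accuracy, sensitivity, specificity and balanced error from a confusion matrix. The counts (100 lesions, of which 5 are malignant) and the function name `binary_metrics` are hypothetical illustrations, not data from this study, and balanced error is assumed here to be one minus the mean of sensitivity and specificity.

```python
def binary_metrics(tp, fn, fp, tn):
    """Summarise a binary confusion matrix; 'positive' is the rare (malignant) class."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of malignant cases that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of benign cases that are correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Assumed definition: balanced error = 1 - mean(sensitivity, specificity).
    balanced_error = 1.0 - 0.5 * (sensitivity + specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_error": balanced_error,
    }


# Hypothetical classifier that labels every lesion as benign:
# all 5 malignant cases become false negatives.
always_benign = binary_metrics(tp=0, fn=5, fp=0, tn=95)
for name, value in always_benign.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, a classifier that never predicts melanoma still scores high accuracy while detecting no malignant case, which is why accuracy alone is a weak measure on rare-class data.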
When the training instances are too few to generalize to cases that are missing from the
training data, a learning model runs an increased risk of overfitting the
training set. There are a few solutions to the problem of insufficient training instances in
a rare dataset. One such solution is data enhancement (data augmentation); however, it has
an explicit regularization effect, and relying on it may prevent the model from learning
enough, which results in poor prediction. Thus, when the available data are rare with an unequal class
distribution, transfer learning (TL) approaches (Jasil and Ulagamuthalvi, 2021) compensate for
data scarcity by utilizing auxiliary data (Al-Stouhi, 2013). Transfer AdaBoost (TrAdaBoost) (Dai
et al., 2007) and Transfer Constituent Support Vector Machine (TrCSVM) (Singh et al., 2020),
both boosting-based TL approaches, apply ensemble methods over the instances of both domains
(source and target) alongside an update procedure, incorporating those source examples that are
useful for classifying the target examples. These approaches perform this mapping by
assigning higher weights to source samples that improve the target's training and by decreasing
the weights of those instances that induce negative transfer. However, the weights of incorrectly
classified source instances converge steadily, so these instances cannot contribute to the output of the
final classification. Because the source weights converge persistently, the target's weights
increase correspondingly, resulting in weight-drift, which negates transfer.
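The weight-drift effect can be illustrated with a minimal sketch of a TrAdaBoost-style reweighting step, following the commonly cited update of Dai et al. (2007): misclassified source weights are multiplied by a fixed factor below one, while misclassified target weights are multiplied by the AdaBoost factor above one. The function names, the toy sizes (8 source and 4 target instances) and the rotating misclassification pattern are assumptions made for illustration; no base learner is trained here.

```python
import math


def tradaboost_step(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One TrAdaBoost-style reweighting step; miss_* are 0/1 error indicators."""
    total = sum(w_src) + sum(w_tgt)
    w_src = [w / total for w in w_src]
    w_tgt = [w / total for w in w_tgt]
    # Weighted error is measured on the target instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)  # keep beta_tgt inside (0, 1)
    beta_tgt = eps / (1.0 - eps)
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(w_src)) / n_rounds))
    # Misclassified source weights shrink; misclassified target weights grow.
    new_src = [w * beta_src ** m for w, m in zip(w_src, miss_src)]
    new_tgt = [w * beta_tgt ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt


def source_share(w_src, w_tgt):
    """Fraction of the total weight held by the source instances."""
    return sum(w_src) / (sum(w_src) + sum(w_tgt))


n_src, n_tgt, rounds = 8, 4, 5
w_src, w_tgt = [1.0] * n_src, [1.0] * n_tgt
shares = [source_share(w_src, w_tgt)]
for r in range(rounds):
    # Hypothetical error pattern: one source and one target instance missed per round.
    miss_src = [1 if i == r % n_src else 0 for i in range(n_src)]
    miss_tgt = [1 if i == r % n_tgt else 0 for i in range(n_tgt)]
    w_src, w_tgt = tradaboost_step(w_src, w_tgt, miss_src, miss_tgt, rounds)
    shares.append(source_share(w_src, w_tgt))

print([round(s, 3) for s in shares])
```

In this toy run the share of total weight held by the source set falls in every round, even though most source instances are classified correctly, because the target total grows while the source total can only shrink.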
Our framework overcomes these limitations by utilizing Rare-Transfer (RT), a boosting-based
TL algorithm, for rare skin lesion dataset classification while preventing the weight-drift
phenomenon. The designed framework effectively integrates the strengths of two boosting
methods, namely Adaptive Boosting (AdaBoost) (Freund and Schapire, 1997), for updating the weights
of target instances, and the weighted majority algorithm (WMA) (Littlestone and Warmuth, 1994),
for balancing transfer. The algorithm first applies a Correction Factor (CF) to the
source weights to prevent weight-drift and then equalizes the weight-updating procedure of the target
samples. The algorithm enhances classification by transferring auxiliary data from one
domain (source) to another (target) using TL. Simultaneously, it improves balanced
classification by allocating more weight to the subset of auxiliary instances that improves and
balances the classifier. The key contributions of the designed framework are as follows:
(1) A framework is designed based on a boosting-based TL algorithm, namely RT, to handle
absolute-rarity in skin lesion datasets while preventing the weight-drift phenomenon.
(2) The significance of predicting rare categories in skin lesion datasets is highlighted,
and the limitations that prohibit learning are analyzed.
(3) The RT algorithm is analyzed theoretically.
(4) The RT algorithm's performance is evaluated on four publicly available, absolute-rare
benchmark skin lesion datasets to test its generalizing behaviour.
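The idea of a correction factor applied to the source weights can be sketched as follows. This excerpt does not state the CF formula, so the form CF = 2(1 - eps) used below is an assumption: under the AdaBoost update, the total target weight grows by exactly this factor (correctly classified targets keep a share 1 - eps and misclassified ones are scaled from eps to 1 - eps), so scaling the source weights by the same amount leaves correctly classified source instances unchanged after normalisation. The function name `corrected_step` and the toy weights are hypothetical.

```python
def corrected_step(w_src, w_tgt, miss_src, miss_tgt, beta_src):
    """Reweighting step with an ASSUMED correction factor on the source weights."""
    total = sum(w_src) + sum(w_tgt)
    w_src = [w / total for w in w_src]
    w_tgt = [w / total for w in w_tgt]
    # Weighted error on the target instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)  # keep beta_tgt inside (0, 1)
    beta_tgt = eps / (1.0 - eps)
    # Assumption: CF = 2 * (1 - eps), the growth factor of the total target weight.
    cf = 2.0 * (1.0 - eps)
    new_src = [cf * w * beta_src ** m for w, m in zip(w_src, miss_src)]
    new_tgt = [w * beta_tgt ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    total = sum(new_src) + sum(new_tgt)
    return [w / total for w in new_src], [w / total for w in new_tgt]


# Toy case: 8 source instances all classified correctly, 1 of 4 targets missed.
src_before = [1.0 / 12.0] * 8
tgt_before = [1.0 / 12.0] * 4
src_after, tgt_after = corrected_step(
    src_before, tgt_before,
    miss_src=[0] * 8, miss_tgt=[1, 0, 0, 0],
    beta_src=0.5,  # hypothetical fixed source factor; unused here since no source errors
)
print(round(sum(src_before), 4), round(sum(src_after), 4))
```

With this assumed factor, the correctly classified source instances keep their normalised weights instead of drifting towards zero, while the misclassified target instance still gains weight relative to the other targets.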
