Data quality issues leading to sub optimal machine learning for money laundering models

DOI: https://doi.org/10.1108/JMLC-05-2021-0049
Published date: 28 July 2021
Pages: 551-555
Authors: Abhishek Gupta, Dwijendra Nath Dwivedi, Jigar Shah, Ashish Jain
Abhishek Gupta
Department of Management, Bharathidasan Institute of Management,
Tiruchirappalli, India
Dwijendra Nath Dwivedi
Department of Economics and Finance, UEK, Krakow,
Poland and Department of Development, IGIDR, Mumbai, India
Jigar Shah
Department of Management, Narsee Monjee Institute of Management and Higher
Studies, Mumbai, India, and
Ashish Jain
Indian Institute of Management Lucknow, Lucknow, India
Abstract
Purpose: Good quality input data is critical to developing a robust machine learning model for
identifying possible money laundering transactions. McKinsey, during one of the conferences of ACAMS,
cited data quality as one of the reasons why artificial intelligence use cases in compliance
struggle. Concerns are often raised about the data quality of predictors, such as wrong transaction codes,
industry classification, etc. However, there has not been much discussion on the most critical variable of
machine learning, the definition of an event, i.e. the date on which the suspicious activity report (SAR) is
filed.
Design/methodology/approach: The team analyzed the transaction behavior of four major banks
spread across Asia and Europe. Based on the findings, the team created a synthetic database comprising 2,000
SAR customers mimicking the time of investigation and case closure. In this paper, the authors focused on one
very specific area of data quality, the definition of an event, i.e. the SAR/suspicious transaction report.
Findings: The analysis of a few of the banks in Asia and Europe suggests that addressing this issue alone can
improve the effectiveness of the model and reduce the prediction span, i.e. the time lag between a money
laundering transaction being carried out and its prediction as an alert for investigation.
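The effect of the event date can be illustrated with a minimal sketch. It is not the authors' method or data: the customer, amounts, dates, 90-day "recent quarter" window and all function names below are hypothetical assumptions chosen only to show how a late SAR filing date can hide the spike from the features a model sees.

```python
from datetime import date, timedelta

def build_history(start, days, baseline, spike_start, spike_days, spike_amount):
    """Hypothetical daily totals: flat baseline with one spike period."""
    history = {}
    spike_end = spike_start + timedelta(days=spike_days)
    for i in range(days):
        day = start + timedelta(days=i)
        history[day] = spike_amount if spike_start <= day < spike_end else baseline
    return history

def window_total(history, event_date, lookback_days=90):
    """Total activity in the lookback window ending the day before event_date."""
    window_start = event_date - timedelta(days=lookback_days)
    return sum(amt for day, amt in history.items() if window_start <= day < event_date)

def prediction_span(transaction_date, alert_date):
    """Days between the laundering activity and the alert (assumed definition)."""
    return (alert_date - transaction_date).days

# Illustrative values only (assumptions, not taken from the paper).
spike_start = date(2020, 3, 1)
history = build_history(date(2019, 7, 1), 550, baseline=100,
                        spike_start=spike_start, spike_days=14, spike_amount=1000)
alert_date = date(2020, 3, 15)   # alert raised right after the spike
sar_filed = date(2020, 9, 15)    # SAR filed months later, after investigation

total_at_alert = window_total(history, alert_date)  # window contains the spike
total_at_sar = window_total(history, sar_filed)     # window contains baseline only
span_days = prediction_span(spike_start, alert_date)
```

In this toy setup, anchoring the lookback window on the alert date captures the two-week spike, while anchoring it on the SAR filing date sees only baseline activity, so a model trained on the latter would learn from a window that no longer reflects the laundering behavior.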
Research limitations/implications: The analysis was done with existing experience of all situations
where the time between alert and case closure is long (anywhere between 15 days and 10 months). The team
could not quantify the impact of this finding because no such actual cases have been observed so far.
Originality/value: The key finding from the paper suggests that money launderers typically either
increase or reduce their level of activity in the most recent quarter. This is not true of their real
behavior: they typically show a spike in activity through various means during money laundering. This in
turn impacts the quality of the insights that the model should be trained on. The authors believe that once the
© Abhishek Gupta, Dwijendra Nath Dwivedi, Jigar Shah and Ashish Jain. Published by Emerald
Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence.
Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial
and non-commercial purposes), subject to full attribution to the original publication and authors. The full
terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
Journal of Money Laundering Control
Vol. 25 No. 3, 2022
pp. 551-555
Emerald Publishing Limited
1368-5201
DOI 10.1108/JMLC-05-2021-0049
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/1368-5201.htm
