Data quality issues leading to sub optimal machine learning for money laundering models
DOI | https://doi.org/10.1108/JMLC-05-2021-0049 |
Published date | 28 July 2021 |
Date | 28 July 2021 |
Pages | 551-555 |
Author | Abhishek Gupta, Dwijendra Nath Dwivedi, Jigar Shah, Ashish Jain |
Data quality issues leading to sub optimal machine learning for money laundering models
Abhishek Gupta
Department of Management, Bharathidasan Institute of Management,
Tiruchirappalli, India
Dwijendra Nath Dwivedi
Department of Economics and Finance, UEK, Krakow,
Poland and Department of Development, IGIDR, Mumbai, India
Jigar Shah
Department of Management, Narsee Monjee Institute of Management and Higher
Studies, Mumbai, India, and
Ashish Jain
Indian Institute of Management Lucknow, Lucknow, India
Abstract
Purpose – Good quality input data is critical to developing a robust machine learning model for
identifying possible money laundering transactions. McKinsey, during one of the conferences of ACAMS,
cited data quality as one of the reasons why artificial intelligence use cases in compliance struggle.
Concerns have often been raised about the data quality of predictors, such as wrong transaction codes,
industry classification, etc. However, there has not been much discussion of the most critical variable in
machine learning, the definition of an event, i.e. the date on which the suspicious activity report (SAR) is
filed.
Design/methodology/approach – The team analyzed the transaction behavior of four major banks
spread across Asia and Europe. Based on the findings, the team created a synthetic database comprising 2,000
SAR customers mimicking the time of investigation and case closure. In this paper, the authors focused on one
very specific area of data quality, the definition of an event, i.e. the SAR/suspicious transaction report.
Findings – The analysis of a few of the banks in Asia and Europe suggests that getting the event definition
right can by itself improve the effectiveness of the model and reduce the prediction span, i.e. the time lag
between a money laundering transaction taking place and its prediction as an alert for investigation.
Research limitations/implications – The analysis was based on existing experience of situations
where the time between alert and case closure is long (anywhere from 15 days to 10 months). The team
could not quantify the impact of this finding because of a lack of such actual cases observed so far.
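As a rough illustration of why the event date matters, the sketch below computes the same "recent quarter" activity feature for one customer twice, once anchored at the alert date and once at the SAR filing date. It is only one possible reading of the issue, not the authors' method: the customer, the dates, the amounts, the 90-day window and the helper `window_total` are all hypothetical and are not taken from the synthetic database described above.

```python
from datetime import date, timedelta


def window_total(transactions, event_date, days=90):
    """Sum amounts dated within the `days` days before event_date (event day excluded)."""
    start = event_date - timedelta(days=days)
    return sum(amount for tx_date, amount in transactions if start <= tx_date < event_date)


# Hypothetical customer: steady activity, a spike in March, reduced activity afterwards.
transactions = [
    (date(2020, 1, 15), 1_000.0),
    (date(2020, 2, 15), 1_000.0),
    (date(2020, 3, 10), 9_000.0),
    (date(2020, 3, 20), 8_000.0),
    (date(2020, 4, 15), 500.0),
    (date(2020, 6, 15), 500.0),
    (date(2020, 8, 15), 500.0),
]

alert_date = date(2020, 4, 1)        # assumption: alert raised shortly after the spike
sar_filing_date = date(2020, 9, 1)   # assumption: SAR filed after a long investigation

lag_days = (sar_filing_date - alert_date).days
at_alert = window_total(transactions, alert_date)
at_filing = window_total(transactions, sar_filing_date)

print(f"lag between alert and SAR filing: {lag_days} days")
print(f"recent-quarter total anchored at alert date:  {at_alert}")
print(f"recent-quarter total anchored at filing date: {at_filing}")
```

In this made-up case the window anchored at the filing date falls entirely after the spike, so a model trained on it would see a quiet customer rather than the burst of activity that triggered the alert.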
Originality/value – The key finding from the paper suggests that money launderers typically either
increase their level of activity or reduce their activity in the recent quarter. This is not true in terms of real
behavior. They typically show a spike in activity through various means during money laundering. This in
turn impacts the quality of insights that the model should be trained on. The authors believe that once the
© Abhishek Gupta, Dwijendra Nath Dwivedi, Jigar Shah and Ashish Jain. Published by Emerald
Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence.
Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial
and non-commercial purposes), subject to full attribution to the original publication and authors. The full
terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
Journal of Money Laundering Control
Vol. 25 No. 3, 2022
pp. 551-555
Emerald Publishing Limited
1368-5201
DOI 10.1108/JMLC-05-2021-0049
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/1368-5201.htm