Identifying financial statement fraud with decision rules obtained from Modified Random Forest

Pages: 235-255
DOI: https://doi.org/10.1108/DTA-11-2019-0208
Published date: 11 May 2020
Authors: Byungdae An, Yongmoo Suh
Subject matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Korea University Business School, Seoul, Republic of Korea
Abstract
Purpose: Financial statement fraud (FSF) committed by a company suggests that the company's current
status may not be healthy. Detecting FSF is important because such companies tend to conceal bad
information, which causes great losses to various stakeholders. The objective of this paper is therefore to
propose a novel approach to building a classification model for identifying FSF that achieves high
classification performance and from which human-readable rules can be extracted to explain why a
company is likely to commit FSF.
Design/methodology/approach: Having prepared multiple sub-datasets to cope with the class imbalance
problem, we build a set of decision trees for each sub-dataset; select a subset of that set as the model for
the sub-dataset by removing every tree whose accuracy is below the average accuracy of all trees in the set;
and then select, among these models, the one with the best accuracy. We call the resulting model
MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain
whether the company corresponding to that instance is likely to commit FSF.
Findings: Experimental results show that the MRF classifier outperformed the benchmark models. The results
also revealed that all the profit-related variables are among the most important indicators of FSF and
that two new variables related to gross profit, which had not been considered in previous studies on FSF,
were identified.
Originality/value: This study proposes a method of building a classification model that shows
outstanding performance and provides decision rules that can be used to explain the classification results.
In addition, a new way to resolve the class imbalance problem is suggested.
Keywords: Financial statement fraud, Random forest, Decision rules, Feature importance, Machine learning,
Predictive model
Paper type: Research paper
1. Introduction
A financial statement shows the overall financial status of a company. Financial
statement fraud (FSF) therefore causes great losses to various stakeholders, such as investors, creditors
and even the company's own employees. Companies are expected to disclose their financial
information through various kinds of official announcements, including financial statements,
company-related news, external audit reports and regulatory filings (Healy and Palepu, 2001).
However, executives can easily be tempted to falsify or delay the release of their financial
information, because a financial problem, once disclosed, could cause serious damage to the
company, such as a sharp fall in its stock price, divestment and harm to its reputation.
Even experts such as auditors and professional investors have
considerable difficulty detecting FSF in advance, and it gives rise to much greater losses than
other kinds of fraud, such as asset misappropriation, bribery and illegal gratuities
(Rezaee, 2005).
According to the 2018 Report to the Nations on Occupational Fraud and Abuse,
published by the Association of Certified Fraud Examiners (ACFE), the estimated loss
due to FSF amounts on average to about 5% of a company's annual revenue. The report
also found that FSF is among the occupational frauds with the greatest median
loss per case, $800,000 (ACFE, 2018). The huge losses and bankruptcies provoked by FSF
committed by Enron, WorldCom, Qwest and Tyco are well known (Rezaee, 2005).
More recently, Hanmi Pharmaceutical Firm, a major Korean
pharmaceutical company, was suspected of intentionally delaying a public
statement after a large volume of its stock had been sold short, which resulted in a loss of
over $3 million in 2016 (Kim, 2016; Lee, 2016).
This research is partially supported by the Korea University Business School Research Grant.
Received 18 November 2019; revised 25 March 2020; accepted 11 April 2020.
Data Technologies and Applications, Vol. 54 No. 2, 2020. © Emerald Publishing Limited, ISSN 2514-9288.
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/2514-9288.htm
FSF committed by a company implies that the current status of the company is likely to be
unhealthy. Such companies tend to conceal bad information and usually show poor
performance before receiving a forewarning notice from a public supervisor (Chung et al.,
2014). Additionally, stock returns tend to decline before and after firms' unfaithful behavior
(Han et al., 2014). The more unfaithful disclosures a company is associated with, the higher
the loan interest rate and the lower the credit rating applied to it (Lee et al., 2008).
Hence, FSF needs to be detected before the forewarning notice is issued in order to
distinguish financially unhealthy companies from healthy ones.
Given the serious impact of FSF, there have been many studies of this kind of fraud.
Starting from work on earnings or accrual manipulation, many researchers have studied FSF
with regard to its causes, effects and motivations, while others
have focused on identifying factors that have a significant effect on FSF (see
Section 2.2).
Applying data mining techniques to diverse financial issues has become
imperative because of their capability to extract useful knowledge from a large number of instances.
In line with this trend, auditors, who are responsible for finding problematic firms,
would do well to use these techniques, since some limitations of human experts, such as bias and
subjective judgment, can thereby be avoided (Ravisankar et al., 2011).
Besides, it is difficult for human experts to track the time-varying importance of some
financial variables in determining FSF (Ngai et al., 2009). Hence, more recently, several
researchers have tried to create classification models using data mining techniques to predict FSF
in advance (see Section 2.3).
However, previous studies that used data mining techniques to detect FSF leave
something to be desired. First, it is worthwhile to expand the scope of FSF to include
more kinds of fraud, which in turn mitigates the class imbalance problem; most of
the previous studies dealt with only two types of fraud, misstatement and
restatement. Second, previous studies focused on building a classification model with
high performance, but it is also important to derive rules that can be used to explain the
classification result (Huang et al., 2014; Pai et al., 2011).
The objective of the paper is, therefore, twofold: (1) to build a classification model with
high performance to detect four types of FSF: misstatement, restatement, delayed disclosure
and cancelled disclosure of financial statements; and (2) to extract human-readable rules by which
we can explain why a company is likely to commit FSF. To that end, we used a dataset
of Korean companies built by joining data from the Korea Investors Network for Disclosure (KIND)
system with the KIS-Value (Korea Investors Service-Value) database, and we modified the Random
Forest (RF) algorithm.
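As a rough illustration of the modification summarized in the abstract, the following Python sketch shows only the filter-then-select step: within each sub-dataset's set of trees, drop every tree whose accuracy is below the set's mean accuracy, then keep the pruned set with the best accuracy. It is a minimal sketch under stated assumptions, not the authors' implementation. The trees are stand-in threshold rules rather than trained decision trees, the pruned sets are scored by majority vote on a single evaluation set, and the feature names, data and function names are all hypothetical.

```python
from statistics import mean


def accuracy(predict, X, y):
    """Fraction of instances in X that `predict` labels correctly."""
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


def majority_vote(trees):
    """Combine trees into one classifier; ties go to the non-fraud class (0)."""
    def predict(x):
        votes = sum(tree(x) for tree in trees)
        return 1 if 2 * votes > len(trees) else 0
    return predict


def prune_trees(trees, X, y):
    """Keep only trees whose accuracy is at least the mean accuracy of the set."""
    scores = [accuracy(tree, X, y) for tree in trees]
    threshold = mean(scores)
    return [tree for tree, score in zip(trees, scores) if score >= threshold]


def select_mrf(ensembles, X, y):
    """Prune each sub-dataset's trees, then return the most accurate pruned set."""
    pruned = [prune_trees(trees, X, y) for trees in ensembles]
    return max(pruned, key=lambda trees: accuracy(majority_vote(trees), X, y))


# Hypothetical stand-ins for trained trees; each maps a feature tuple
# (gross_profit_margin, debt_ratio) to 1 (fraud) or 0 (non-fraud).
def margin_rule(x):
    return 1 if x[0] < 0.2 else 0


def debt_rule(x):
    return 1 if x[1] > 0.6 else 0


def always_fraud(x):
    return 1


def never_fraud(x):
    return 0


# Invented evaluation data, for illustration only.
X_val = [(0.05, 0.9), (0.10, 0.8), (0.40, 0.3), (0.35, 0.4), (0.08, 0.2), (0.30, 0.85)]
y_val = [1, 1, 0, 0, 1, 0]

ensemble_a = [margin_rule, debt_rule, always_fraud]  # trees for sub-dataset A
ensemble_b = [debt_rule, always_fraud, never_fraud]  # trees for sub-dataset B

mrf = select_mrf([ensemble_a, ensemble_b], X_val, y_val)
print([tree.__name__ for tree in mrf])
```

In practice the trees would be trained on each balanced sub-dataset, and the data used for pruning and selection would be kept separate from the data used to report final performance.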
The rest of the paper is organized as follows. In Section 2, we examine the definitions
of several types of FSF and review previous work related to FSF. Section 3 describes
the research method, including the dataset, the financial ratio variables used to build the
classification model and the modification of the Random Forest algorithm to achieve better
classification performance and generate human-readable rules. In Section 4, experimental results
are presented, including classification results, their statistical verification and examples
of rules. Finally, Section 5 concludes the paper with a summary, contributions and
future work.