Predicting corporate credit rating based on qualitative information of MD&A transformed using document vectorization techniques

Publication Date: 13 March 2020
Authors: Jinwook Choi, Yongmoo Suh, Namchul Jung
Subject: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Jinwook Choi and Yongmoo Suh
Korea University Business School, Seoul, Republic of Korea, and
Namchul Jung
School of Business Administration, Hongik University, Seoul, Republic of Korea
Purpose: The purpose of this study is to investigate the effectiveness of qualitative information extracted
from firms' annual reports in predicting corporate credit ratings. In practice, qualitative information, such as
published reports or management interviews, is known to be an important source for assigning corporate credit
ratings, in addition to quantitative information such as financial values. Nevertheless, prior studies leave room
for further research in that they rarely employed qualitative information in developing prediction models of
corporate credit rating.
Design/methodology/approach: This study adopted three document vectorization methods, Bag-of-Words
(BOW), Word to Vector (Word2Vec) and Document to Vector (Doc2Vec), to transform unstructured textual data
into numeric vectors so that Machine Learning (ML) algorithms can accept them as input. For the experiments,
we used a corpus of the Management's Discussion and Analysis (MD&A) sections of 10-K financial reports, as
well as financial variables and corporate credit rating data.
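The text-to-vector step described above can be illustrated with a minimal Bag-of-Words sketch. It assumes a simple lowercase, letters-only tokenizer and raw term counts; the function names (`tokenize`, `build_vocabulary`, `bow_vector`) and the two example sentences are hypothetical, and the authors' actual preprocessing and term weighting are not described in this excerpt.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a simplifying assumption;
    # real MD&A preprocessing may also remove stop words, numbers, etc.)
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    # One sorted vocabulary over the whole corpus so that every document
    # is mapped onto the same vector positions
    words = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {word: idx for idx, word in enumerate(words)}


def bow_vector(text, vocab):
    # Raw term counts; words not in the vocabulary are ignored
    vec = [0] * len(vocab)
    for word, n in Counter(tokenize(text)).items():
        if word in vocab:
            vec[vocab[word]] = n
    return vec


# Two hypothetical MD&A-style sentences
mdna_snippets = [
    "Revenue increased while operating costs decreased.",
    "Liquidity decreased and debt increased significantly.",
]
vocab = build_vocabulary(mdna_snippets)
vectors = [bow_vector(doc, vocab) for doc in mdna_snippets]
```

Each document becomes a fixed-length count vector that an ML algorithm can take as input. In practice a library vectorizer, such as scikit-learn's `CountVectorizer`, would usually replace these hand-written functions, and TF-IDF weighting is a common variant of raw counts.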
Findings: Results from a series of multi-class classification experiments show that predictive models trained
on both financial variables and vectors extracted from MD&A data outperform benchmark models trained only on
traditional financial variables.
Originality/value: This study proposed a new approach to corporate credit rating prediction that uses
qualitative information extracted from MD&A documents as an input to ML-based prediction models. In addition,
this research adopted and compared three text vectorization methods in the domain of corporate credit
rating prediction and showed that BOW mostly outperformed Word2Vec and Doc2Vec.
Keywords: Corporate credit rating, Qualitative information, MD&A, Document vectorization, Machine learning, Predictive model
Paper type: Research paper
1. Introduction
Credit ratings provided by bond rating agencies[1] are opinions about the credit quality of bond
issuers. Credit ratings serve two roles: valuation and contract facilitation (Frost, 2007). The
former disseminates information about the default risk or creditworthiness of bond issuers to
capital market participants, thereby supporting their decision-making. The latter facilitates
contracts between bond investors and issuers by reducing information asymmetry related to the
credit risk of borrowers. As a result, credit ratings have been an important benchmark for issuers
seeking to reduce their cost of capital, for investors seeking to avoid the default risk of their
investees, and for regulatory bodies seeking to achieve regulatory objectives such as setting
rating-based criteria. It is therefore not surprising that numerous studies on corporate credit
rating prediction have been conducted in academia.
Research on predicting credit ratings is important for the following reasons. First,
predicting credit ratings could provide an early warning of financial distress of firms (Hajek
and Michalak, 2013). Second, since ratings by bond rating agencies may not reflect default
risk in a timely manner (Kim and Ahn, 2012), lenders such as financial institutions need to
estimate the credit ratings of borrowers independently. Third, assessing and updating a credit
rating through rating agencies is very costly, because agencies require considerable time and
effort to perform an in-depth analysis of the company (Huang et al., 2004).
This research was supported by the Korea University Business School Research Grant.
The current issue and full text archive of this journal is available on Emerald Insight at:
Received 4 August 2019
Revised 24 December 2019
Accepted 13 January 2020
Data Technologies and
Vol. 54 No. 2, 2020
pp. 151-168
© Emerald Publishing Limited
DOI 10.1108/DTA-08-2019-0127
Machine Learning (ML) algorithms have been the primary methods for developing prediction
models of corporate credit rating. Although early studies on this issue concentrated on
statistical models such as linear regression, ML algorithms such as Artificial Neural
Networks (ANN) and Support Vector Machines (SVM) emerged as a new solution to corporate
credit rating prediction because they showed better performance than traditional
statistical methods (Chen and Shih, 2006; Huang et al., 2004; Lee, 2007). All ML algorithms
build prediction models from a training dataset, and the more relevant, high-quality
information the training data contain, the better the resulting predictive model tends to
perform. Thus, the choice of what kinds of information to use as input to ML algorithms can
be crucial.
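The point that a model can only learn from what its training data contain can be made concrete with a deliberately simple multi-class classifier. The sketch below uses a nearest-centroid rule, not the ANN or SVM models cited above, purely because it fits in a few lines of standard-library Python; the two financial features, their values and the rating labels are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    # Average the feature vectors of each rating class in the training data
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    # Assign the rating class whose centroid is nearest (Euclidean distance)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical training data: [debt ratio, interest coverage] per firm
X_train = [[0.20, 12.0], [0.25, 10.0], [0.50, 5.0], [0.55, 4.0], [0.80, 1.0], [0.85, 0.5]]
y_train = ["AA", "AA", "BBB", "BBB", "B", "B"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [0.30, 9.0]))  # prints AA
```

Because interest coverage is on a much larger scale than the debt ratio, it dominates the distance in this toy example; in practice features would usually be standardized first, and any input the training data lack (such as information from MD&A text) simply cannot influence the prediction.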
It is generally known that the credit rating process takes into consideration both financial
risk (e.g. financial characteristics, capital structure and financial liquidity) and business
risk (e.g. industry characteristics, management integrity, a firm's strategic position and
competitiveness). Therefore, predicting corporate credit ratings using both qualitative
information representing business risk and quantitative information representing
financial risk would be meaningful for the following reasons. First, bond rating
agencies such as S&P in practice use ratings obtained from various information sources, such
as published reports and management interviews, as well as model-driven ratings based on
mathematical models (Standard and Poor's, 2018). Second, while credit ratings are
forward-looking, most quantitative financial data are backward-looking. As such, ratings based
only on financial data may need adjustment by domain experts, which involves subjective
judgement. Third, financial data consisting of accounting numbers generated under generally
accepted accounting principles (GAAP) may not fully reflect the economic circumstances that
firms face.
Nevertheless, earlier studies on identifying the determinants of corporate credit ratings
mainly focused on quantitative, structured information (hard facts), that is, financial values
from financial statements (Horrigan, 1966; Kaplan and Urwitz, 1979; West, 1970). Recently, a
few studies have attempted to consider qualitative information, so-called soft facts, which
refers to unstructured disclosures included in firms' annual reports or to non-financial factors
(Bonsall and Miller, 2017; Bozanic et al., 2018; Bozanic and Kraft, 2014; Lehmann, 2003).
However, those studies leave room for further research in that they only examined the
association between variables extracted from qualitative information and corporate credit
ratings. Therefore, exploring the effectiveness of qualitative information as an input feature
to prediction models of corporate credit rating would complement existing studies. Using
features based on soft facts when building prediction models might be more appropriate,
since such information is used when determining credit ratings in practice.
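One straightforward way to use soft facts as input features, shown here only as a sketch, is to append a document vector derived from the MD&A text to the standardized financial variables. This yields a benchmark feature set (financial variables only) and an augmented one for the same firms. The function names, the z-score standardization and all values are assumptions for illustration; this excerpt does not specify how the two kinds of input were combined.

```python
import statistics


def standardize_columns(rows):
    # z-score each financial variable so that ratios on different scales are comparable;
    # a constant column (population std of 0) is centered but left unscaled
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def build_feature_sets(financial_rows, text_vectors):
    # Benchmark input: financial variables only.
    # Augmented input: the same financial variables followed by the MD&A document vector.
    financial = standardize_columns(financial_rows)
    augmented = [f + list(t) for f, t in zip(financial, text_vectors)]
    return financial, augmented


# Hypothetical values for three firms: [debt ratio, interest coverage]
financial_rows = [[0.2, 12.0], [0.5, 5.0], [0.8, 1.0]]
# Hypothetical 2-dimensional MD&A vectors (e.g. from BOW, Word2Vec or Doc2Vec)
text_vectors = [[0.1, 0.3], [0.0, 0.5], [0.4, 0.2]]

benchmark, augmented = build_feature_sets(financial_rows, text_vectors)
```

Training the same classifier once on `benchmark` and once on `augmented` isolates the contribution of the textual features. In a real experiment the means and standard deviations should be computed on the training split only, to avoid leaking information from the test data.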
In this study, we propose a novel approach to predicting corporate credit ratings that takes
advantage of qualitative information extracted from firms' annual reports. Specifically, the
proposed method makes use of the Management's Discussion and Analysis (MD&A) section of
10-K financial reports required by the Securities and Exchange Commission (SEC). We employ
Bag-of-Words (BOW), Word to Vector (Word2Vec) and Document to Vector (Doc2Vec) to transform
unstructured textual data into numeric vectors. We examine the usefulness of qualitative
information extracted from MD&A documents in predicting corporate credit ratings and also
conduct several experiments under specific conditions to scrutinize whether using both
quantitative and qualitative information could enhance the performance of a prediction model.
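Unlike Bag-of-Words, Word2Vec produces one dense vector per word, so a document-level input has to be derived from the word vectors. A common approach, assumed here and not confirmed by this excerpt as the authors' procedure, is to average the vectors of the words in the document. The tiny three-dimensional `embeddings` table below is hypothetical and stands in for a trained Word2Vec model. Doc2Vec differs in that it learns a document vector directly during training, which this sketch does not attempt.

```python
def average_word_vectors(tokens, embeddings):
    # Document vector = mean of the word vectors of in-vocabulary tokens;
    # a document with no known tokens gets a zero vector
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical 3-dimensional word vectors standing in for a trained Word2Vec model
embeddings = {
    "revenue": [0.9, 0.1, 0.0],
    "increased": [0.5, 0.5, 0.0],
    "debt": [0.0, 0.2, 0.8],
}

# "sharply" is out of vocabulary and is skipped
doc_vec = average_word_vectors(["revenue", "increased", "sharply"], embeddings)
```

Averaging discards word order, much like BOW, but it yields a short dense vector instead of one dimension per vocabulary word. In gensim, for example, a trained `Word2Vec` model exposes its word vectors through the `wv` attribute, and `Doc2Vec` is a separate model class; the settings used in this study are not stated in this excerpt.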
The remainder of the paper is structured as follows. Section 2 reviews the previous
literature relevant to corporate credit rating prediction and the MD&A section of the 10-K report.
