An improved density-based approach to risk assessment on railway investment

DOI: https://doi.org/10.1108/DTA-11-2020-0291
Published date: 01 November 2021
Pages: 382-408
Subject Matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Author: Jingwei Guo, Ji Zhang, Yongxiang Zhang, Peijuan Xu, Lutian Li, Zhongqi Xie, Qinglin Li
Jingwei Guo and Ji Zhang
Henan Polytechnic University, Jiaozuo, China
Yongxiang Zhang
Southwest Jiaotong University, Chengdu, China
Peijuan Xu
Chang'an University, Xi'an, China
Lutian Li and Zhongqi Xie
Henan Polytechnic University, Jiaozuo, China, and
Qinglin Li
Southwest Jiaotong University, Chengdu, China
Abstract
Purpose: Density-based spatial clustering of applications with noise (DBSCAN) is the most commonly used
density-based clustering algorithm, but it cannot be directly applied to railway investment risk
assessment. To overcome the shortcomings of DBSCAN's calculation method and parameter limits, this paper
proposes a new algorithm called Improved Multiple Density-based Spatial Clustering of Applications with
Noise (IM-DBSCAN), based on DBSCAN and rough set theory.
Design/methodology/approach: First, the authors develop an improved affinity propagation (AP)
algorithm, which is then combined with DBSCAN (hereinafter AP-DBSCAN) to improve the parameter
setting and efficiency of DBSCAN. Second, the IM-DBSCAN algorithm, which consists of AP-DBSCAN and
a modified rough set, is designed to investigate railway investment risk. Finally, the IM-DBSCAN
algorithm is tested on the China-Laos railway investment risk assessment, and its performance is
compared with other related algorithms.
Findings: The IM-DBSCAN algorithm is implemented on the China-Laos railway investment risk assessment
and compared with other related algorithms. The clustering results validate that the AP-DBSCAN algorithm is
feasible and efficient in terms of clustering accuracy and operating time. In addition, the experimental results
indicate that the IM-DBSCAN algorithm can be used as an effective method for prospective risk
assessment in railway investment.
Originality/value: This study proposes the IM-DBSCAN algorithm, which consists of AP-DBSCAN and a
modified rough set, to study railway investment risk. Unlike existing clustering algorithms,
AP-DBSCAN puts forward a density calculation method to simplify the process of optimizing DBSCAN parameters.
Instead of the Euclidean distance approach, the cutoff distance method is introduced to improve the similarity
measure used for optimizing the parameters. The developed AP-DBSCAN is used to classify the China-Laos railway
investment risk indicators more accurately. Combined with a modified rough set, the IM-DBSCAN algorithm is
proposed to carry out the railway investment risk assessment. The contributions of this study can be summarized
as follows: (1) Based on AP and DBSCAN, an integrated methodology, AP-DBSCAN, which aims to improve
parameter setting and efficiency, is proposed to classify railway risk indicators. (2) As AP-DBSCAN is a risk
classification model rather than a risk calculation model, an IM-DBSCAN algorithm that consists of
AP-DBSCAN and a modified rough set is proposed to assess railway investment risk. (3) Taking the China-Laos
railway as a real-life case study, the effectiveness and superiority of the proposed IM-DBSCAN algorithm are
verified through a set of experiments comparing it with other state-of-the-art algorithms.
Keywords Density-based clustering, Affinity propagation algorithm, Rough set, Risk assessment,
Railway investment
Paper type Research paper
This work was supported by the National Natural Science Foundation of China (No. 61803147) and, in part,
by the Key Scientific and Technological Project of Henan Province (No. 182102310799).
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/2514-9288.htm
Received 26 November 2020
Revised 22 April 2021
27 August 2021
Accepted 2 October 2021
Data Technologies and
Applications
Vol. 56 No. 3, 2022
pp. 382-408
© Emerald Publishing Limited
2514-9288
DOI 10.1108/DTA-11-2020-0291
1. Introduction
With the Globalization Strategy of China's High-Speed Railway, there are more and more
opportunities for Chinese railway companies to participate in overseas railway construction
projects. However, railway construction projects involve large amounts of funds and
management, as well as many investment risks. Therefore, clustering methods are generally
adopted to analyze risks with uncertainty and instability characteristics, and classification of
the risk factors has also become a common technique.
As a typical data mining algorithm, clustering shows broad applications in data analysis.
Several different clustering approaches have been broadly introduced in the literature. For
instance, algorithms such as K-means (Hartigan and Wong, 1979) and Clustering Large
Applications based on Randomized Search (CLARANS) (Ng and Han, 2002) were designed
based on a partitioning approach; Gaussian mixture models (Fraley and Raftery, 2002) and
COBWEB (Fisher, 1987) belong to a model-based approach; Divisive Analysis (DIANA)
(Kaufman and Rousseeuw, 1990) and Balanced Iterative Reducing and Clustering using
Hierarchies (BIRCH) (Zhang et al., 1996) were developed based on a hierarchical approach;
Statistical Information Grid (STING) (Wang et al., 1997) and Clustering in Quest (CLIQUE)
(Agrawal et al., 1998) were introduced as a grid-based approach; and Density-Based Spatial
Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) and its variant Ordering
Points to Identify the Clustering Structure (OPTICS) (Ankerst et al., 1999) are examples of a
density-based approach. Because density-based clustering returns clusters of
arbitrary shape, is robust to noise and does not require prior knowledge of the number of
clusters, DBSCAN has been widely applied in numerous fields (Li et al., 2020; Sabo and
Scitovski, 2020a; Gan and Tao, 2018).
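To make the properties listed above concrete, the following is a minimal sketch of textbook DBSCAN in plain Python. It is not the AP-DBSCAN or IM-DBSCAN algorithm of this paper; the function names (dbscan, region_query), the parameter names (eps, min_pts) and the toy data are assumptions chosen for illustration only. The number of clusters is never passed in, and isolated points are labelled as noise.

```python
import math

NOISE = -1  # label used here for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    print(dbscan(toy, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Each call to region_query scans every point, so this sketch performs a quadratic number of distance computations, which is the cost that the faster variants discussed next try to reduce.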
However, DBSCAN does not work well for large datasets due to its high time complexity,
and it must calculate densities for most data points one by one, which limits its
performance. Therefore, many studies have been conducted to improve its performance, such
as Fast DBSCAN (Gunawan, 2013; Berg et al., 2019), Rough-DBSCAN (Viswanath and Babu,
2009), TI-DBSCAN (Kryszkiewicz and Lasek, 2010) and BLOCK-DBSCAN (Chen et al., 2021).
Specifically, Fast DBSCAN divides the data space into grid cells, an approach that only works in
two-dimensional space. Based on eigenvalue decomposition of the Laplacian matrix of the
weighted graph, Rough-DBSCAN finds the partition by distinguishing the edge weights of
different groups. Instead of spatial indices, TI-DBSCAN uses the triangle inequality property
to quickly reduce the neighborhood search space, and its performance significantly decreases
for both low- and high-dimensional data. Unlike the grid technique used in Fast DBSCAN and
ρ-approximate DBSCAN, BLOCK-DBSCAN uses a norm ball and a fast approximate algorithm
to accelerate the density computations. Despite their significant advantages in large-scale
data clustering, the variants of DBSCAN have very poor stability and adaptability
because all elements of the dataset are handled with an undefined
parameter setting and distance measurement. These methods also share the defects of
other clustering algorithms in risk classification, that is, the risk indicators are difficult to
classify or involve a high level of uncertainty. Risk indicators
now cover various aspects of the investment environment, political literacy, railway
operation and design and so on. The clustering performance, which depends on the specific
parameter settings, determines whether the risk indicator category system is reliable.
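The triangle-inequality idea mentioned above can be sketched as follows. This is only an illustration in the spirit of TI-DBSCAN, not the published algorithm: the function name ti_neighbors, the choice of reference point (the coordinate-wise minimum of the data) and the returned counter are assumptions. The pruning rests on the fact that |d(p, r) - d(q, r)| <= d(p, q) for any reference point r, so a pair whose reference distances differ by more than eps cannot be eps-neighbors and needs no distance computation.

```python
import math


def ti_neighbors(points, eps, ref=None):
    """Eps-neighborhoods (excluding the point itself) with triangle-inequality pruning.

    Returns (neighbors, checked), where checked counts the pairwise distance
    computations that were actually performed.
    """
    if ref is None:
        # Assumed reference point: coordinate-wise minimum of the data.
        ref = tuple(min(c) for c in zip(*points))
    ref_dist = [math.dist(p, ref) for p in points]
    # Sort point indices by their distance to the reference point.
    order = sorted(range(len(points)), key=ref_dist.__getitem__)
    neighbors = {i: [] for i in range(len(points))}
    checked = 0
    for pos, i in enumerate(order):
        for k in range(pos + 1, len(order)):
            j = order[k]
            # Once the gap in reference distance exceeds eps, no later point
            # in the sorted order can lie within eps of points[i].
            if ref_dist[j] - ref_dist[i] > eps:
                break
            checked += 1
            if math.dist(points[i], points[j]) <= eps:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors, checked
```

On data that is spread out relative to eps, most pairs are skipped by the break; when many points lie at similar distances from the reference point, little is pruned, and the result still depends entirely on the chosen eps.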
To compensate for the shortcomings of the existing research, this study proposes an
Improved Multiple Density-based Spatial Clustering of Applications with Noise (IM-DBSCAN)
to study railway investment risk. In this study, we develop a two-stage approach for
railway investment risk assessment, that is, risk indicator classification and risk calculation.
First, the AP-DBSCAN is put forward by combining DBSCAN with the affinity propagation (AP)
algorithm, and the risk indicator category system is designed to obtain optimal clustering
results. Subsequently, to overcome the challenge that the AP-DBSCAN may not be a complete
