Age-specific survival in prostate cancer using machine learning

Publication Date: 13 May 2020
Author: M.N. Doja, Ishleen Kaur, Tanvir Ahmad
Subject: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
M.N. Doja
Indian Institute of Information Technology, Sonepat, India, and
Ishleen Kaur and Tanvir Ahmad
Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
Purpose: The incidence of prostate cancer has been increasing over the past few decades. Various studies have tried
to determine the survival of patients, but metastatic prostate cancer has still not been extensively explored. The
survival rate of metastatic prostate cancer is much lower than that of the earlier stages. The study aims to
investigate the survivability of metastatic prostate cancer based on the age group to which a patient belongs,
and the difference in the significance of the attributes across age groups.
Design/methodology/approach: Data on metastatic prostate cancer patients were collected from a cancer
hospital in India. Two predictive models were built for the analysis: one for the complete dataset, and the other
for separate age groups. Machine learning was applied to both models, and their accuracies were compared
for the analysis. Information gain was also evaluated for each model to determine the significant
predictors for each age group.
Findings: The ensemble approach gave the best accuracy, 81.4%, for the complete dataset, and thus was used
for the age-specific models. The results showed that the age-specific model had a direct average accuracy
of 83.74% and a weighted average accuracy of 79.9%, with the highest accuracy for patients younger than 60.
Originality/value: The study developed a model that predicts survival in metastatic prostate cancer
based on age. The study can assist clinicians in determining the best course of treatment for each
patient based on ECOG, age and comorbidities.
Keywords: Prostate cancer, Metastasis, Medical, Machine learning, Survival prediction, Data mining, Ensemble
Paper type: Research paper
1. Introduction
Over the past decades, cancer has been one of the leading causes of death worldwide.
According to GLOBOCAN (2018), there were 18.1 million new cases of cancer and 9.6 million
deaths due to cancer in 2018. Prostate cancer is the second most frequently occurring cancer in men in
both developing and developed countries; it accounts for 1.3 million new cases, making it the
fourth most commonly diagnosed cancer overall (GLOBOCAN, 2018). It occurs more commonly
in developed regions such as North America, Australia, New Zealand and Western and
Northern Europe (Risk Factors, 2017). In developing countries such as India, the incidence
rate of prostate cancer has also been increasing in recent years, particularly in the urban
population (Hariharan and Padmanabha, 2016).
Machine learning techniques have been used widely in the past decades by researchers for
diagnosis (Alić and Subasi, 2015; Wang and Wang, 2017), survival prediction (Jajroudi
et al., 2014; Park et al., 2013; Cheng-Min Chao et al., 2014; Walczak and Velanovich, 2018) and
recurrence prediction (Andjelkovic Cirkovic et al., 2015; Tseng et al., 2017) in different cancers.
In this study, we focus only on the survivability of cancer. Survival is difficult to estimate
because of various environmental, genetic and biological factors. Estimating a patient's survival
after a cancer diagnosis is essential for helping the patient make an informed decision, as well
as for determining the best possible treatment. Because cancer treatment procedures are
expensive, survival prediction is essential. Earlier, clinicians used to estimate the
survivability of the patient based on their experience. This estimation may be biased because
that experience is limited. Predictive modeling techniques have been shown to perform better in
survival prediction than clinical opinions (Ross et al., 2002; Walz et al., 2007).
The authors would like to thank Rajiv Gandhi Cancer Institute and Research Center, New Delhi for
providing the opportunity and environment to carry out the data collection process on their premises. We
are also thankful to the urology team of RGCIRC for cooperating with us and giving valuable suggestions during
the process.
Received 20 October 2019
Revised 1 February 2020
26 March 2020
Accepted 8 April 2020
Data Technologies and
Vol. 54 No. 2, 2020
pp. 215-234
© Emerald Publishing Limited
DOI 10.1108/DTA-10-2019-0189
Data mining has played an essential role in various areas of medical research, and cancer
research is one such area. The availability of online databases has allowed
researchers to create models that assist clinicians in choosing better treatments for patients.
The SEER database (SEER database) is one such online database, and it covers the
largest number of cancers and patients. However, some studies have also used hospital-based
datasets to analyze and predict survival on a regional basis
(Tseng et al., 2015). Also, not all types of cancer are fully covered in online databases.
Bray et al. (2018) analyzed the incidence of thirty-six cancers in different countries, and
most of them remain unexplored by researchers owing to the nonavailability of data.
Many studies have analyzed survival in prostate cancer based on
various factors such as treatment (Teoh et al., 2018), Gleason score (Rusthoven et al., 2014) and
metastasis (Pond et al., 2014), but machine learning has not been explored much for prostate
cancer survival. The Kaplan-Meier and Cox proportional hazards regression methods are the
traditional statistical techniques for survival analysis, whereas machine learning techniques
give an insight into survival by considering the various attributes of the patients. A machine
learning approach first trains a model on one part of the dataset and then validates and tests the
classifier built from that training. Nezhad et al. (2019) applied deep learning combined with
active learning to survival prediction in prostate cancer, using data collected
from the SEER database. Their proposed approach gave better results than
other baseline machine learning algorithms.
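The train, validate and test sequence described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the "risk score" feature is hypothetical, and the one-feature threshold rule merely stands in for a real classifier. It is not the model used in this study.

```python
import random

def split_dataset(records, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the records and cut them into train, validation and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def accuracy(threshold, records):
    """Fraction of records where 'score < threshold' matches the survival flag."""
    hits = sum((score < threshold) == survived for score, survived in records)
    return hits / len(records)

def fit_threshold(train, candidates):
    """Training step: keep the candidate cut-off with the best training accuracy."""
    return max(candidates, key=lambda t: accuracy(t, train))

# Synthetic, noise-free toy data: (hypothetical risk score, survived flag).
records = [(i / 50, i < 100) for i in range(200)]
train, val, test = split_dataset(records)

# Two candidate "models" that differ only in the cut-offs they may choose from.
fine_grid = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
coarse_grid = [1.0, 3.0]
fitted = [fit_threshold(train, grid) for grid in (fine_grid, coarse_grid)]

# Validation step: pick the fitted model that does best on unseen validation data.
chosen = max(fitted, key=lambda t: accuracy(t, val))

# Test step: report accuracy once, on data used for neither fitting nor selection.
test_accuracy = accuracy(chosen, test)
print(f"chosen cut-off: {chosen}, test accuracy: {test_accuracy:.2f}")
```

The point of the three-way split is that the test accuracy is computed on records that influenced neither the fitted cut-off nor the choice between candidate models.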
This study attempts to predict survivability in metastatic prostate cancer, taking
into account the age group to which a patient belongs. The TNM stage is determined for a
patient after a diagnosis of prostate cancer. The metastatic stage is the advanced
stage of prostate cancer, in which the cancer cells have spread to distant organs. Although only a very
small percentage of the men diagnosed with prostate cancer are metastatic, survival estimates in the
metastatic stage are important for the patient, as well as for oncologists, because the survival rate is
low at this stage. The 5-year survival rate is almost 30% for Stage 4 prostate cancer
patients (SEER database). However, the SEER dataset covers only American patients, and the
survival rate is higher in Europe and North America and comparatively lower in Asian and
African countries (Bray et al., 2018).
Previous studies have shown that age plays a significant role in the survival of the
patient (Moreira et al., 2017). This study uses that finding to divide the dataset into
clusters of patients sharing the same age group. Oncologists also tend to consider age a
significant factor when deciding the appropriate treatment a patient can bear. Prostate cancer is
also more common in older patients. Not only the incidence but also the mortality rate depends on
age in almost every disease. Thus, we have used age as a deciding factor, instead of just an
attribute, for survival prediction. Oncologists can then evaluate the survival of a patient
based on the age group to which the patient belongs. This study can be further extended to other
ailments where age can play a crucial part in the survivability of patients.
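As a rough illustration of treating age as a deciding factor, the Python sketch below partitions patients into age groups and then combines per-group accuracies in two ways. It rests on assumptions: only the "younger than 60" boundary is taken from the abstract, the other cut-offs, the patient ages and the per-group accuracies are made up, and the two averaging formulas are one plausible reading of the "direct" and "weighted" averages reported in the abstract rather than the authors' stated definitions.

```python
from collections import defaultdict

def age_group(age):
    """Map an age in years to a hypothetical age-group label."""
    if age < 60:
        return "<60"
    if age < 70:
        return "60-69"
    return ">=70"

def group_by_age(patients):
    """Partition patient records (dicts with an 'age' key) by age group."""
    groups = defaultdict(list)
    for patient in patients:
        groups[age_group(patient["age"])].append(patient)
    return dict(groups)

def direct_average(accuracies):
    """Unweighted mean of the per-group accuracies."""
    return sum(accuracies.values()) / len(accuracies)

def weighted_average(accuracies, sizes):
    """Mean of the per-group accuracies, weighted by the number of patients."""
    total = sum(sizes.values())
    return sum(accuracies[g] * sizes[g] for g in accuracies) / total

# Made-up patients and made-up per-group accuracies, for illustration only.
patients = [{"age": a} for a in (52, 58, 61, 64, 67, 71, 75, 79, 83, 88)]
groups = group_by_age(patients)
sizes = {g: len(members) for g, members in groups.items()}
accuracies = {"<60": 0.90, "60-69": 0.80, ">=70": 0.70}

direct = direct_average(accuracies)
weighted = weighted_average(accuracies, sizes)
print(f"direct: {direct:.2f}, weighted: {weighted:.2f}")
```

With these toy numbers the smallest group has the highest accuracy, so the direct average (0.80) comes out above the weighted one (0.77). Under the same reading, a small but well-predicted group could account for the gap between the two averages reported in the abstract.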
The rest of the paper is organized as follows: Section 2 discusses the related work done by
other researchers. Section 3 explains the dataset and methodology used in the study, while the
results are presented in Section 4. Last, the conclusion of the study is given in Section 5.
2. Related work
As already discussed in the previous section, there have not been many studies exploring the
survival prediction in prostate cancer. The literature survey of the research thus covers two
aspects: survival prediction using machine learning and prostate cancer survival analysis.
