MD-LDA: a supervised LDA topic model for identifying mechanism of disease in TCM
| Date | 22 July 2024 |
| Pages | 1-18 |
| DOI | https://doi.org/10.1108/DTA-12-2023-0868 |
| Published date | 22 July 2024 |
| Author | Meiwen Li, Liye Xia, Qingtao Wu, Lin Wang, Junlong Zhu, Mingchuan Zhang |
Meiwen Li
School of Information Engineering, Henan University of Science and Technology,
Luoyang, China
Liye Xia
The First Affiliated Hospital, Henan University of Science and Technology,
Luoyang, China, and
Qingtao Wu, Lin Wang, Junlong Zhu and Mingchuan Zhang
School of Information Engineering, Henan University of Science and Technology,
Luoyang, China
Abstract
Purpose – In traditional Chinese medicine (TCM), the mechanism of disease (MD) constitutes an essential element of syndrome differentiation and treatment, elucidating the mechanisms underlying the occurrence, progression, alterations and outcomes of diseases. However, there is a dearth of research in the field of intelligent diagnosis concerning the analysis of MD.
Design/methodology/approach – In this paper, we propose a supervised Latent Dirichlet Allocation (LDA) topic model, termed MD-LDA, which elucidates the process of MD identification. We leverage the label information inherent in the data as prior knowledge and incorporate it into the model’s training. Additionally, we devise two parallel parameter estimation algorithms for efficient training. Furthermore, we introduce a benchmark MD identification dataset, named TMD, for training MD-LDA. Finally, we validate the performance of MD-LDA through comprehensive experiments.
Findings – The results show that MD-LDA is effective and efficient. Moreover, MD-LDA outperforms the state-of-the-art topic models on perplexity, Kullback–Leibler (KL) divergence and classification performance.
Originality/value – The proposed MD-LDA can be applied to MD discovery and analysis in TCM clinical diagnosis, so as to improve the interpretability and reliability of intelligent diagnosis and treatment.
Keywords Mechanism of disease, Syndrome differentiation, Prior knowledge, Topic model, Supervised LDA,
Traditional Chinese medicine
Paper type Research paper
1. Introduction
Traditional Chinese medicine (TCM) has received significant attention due to its miraculous curative effect (Cheung, 2011; Huang et al., 2021). For instance, it plays an important role in treating COVID-19 (Lyu et al., 2021; Sun et al., 2022). In TCM, syndrome differentiation and treatment (辨证论治)[1] is the most important characteristic. In this process, syndrome differentiation (辨证) means that a patient's current diagnosis is represented by various syndromes (证候), which describe the dynamic evolution of the disease. The disease is then treated on the basis of syndrome differentiation, according to its current condition (Li et al., 2022). To identify the syndrome correctly, a key factor is the identification of the mechanism of disease (MD, 病机), which reflects the individual specificity and dynamic evolution of syndromes. Furthermore, MD consists of the symptoms (症状) and signs (体征) gathered through the four diagnostic methods, which are uniformly referred to as the MD unit, or syndrome manifestation (SM) (Li et al., 2008). For this reason, information about MD can be obtained by identifying the MD unit. Identifying the MD unit from a patient's four-diagnostic information is therefore a critical problem in TCM.
This work was funded in part by the National Natural Science Foundation of China (NSFC) (No. 62002102), the Scientific and Technological Innovation Teams and Talents of Colleges and Universities in Henan Province of China (Nos. 24IRTSTHN022 and 22HASTIT014) and the Key Technologies R&D Program of Henan Province (Nos. 241111210700, 232102211008 and 232102210028).
Conflict of interest statement: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/2514-9288.htm
Received 24 December 2023
Revised 5 May 2024
Accepted 26 May 2024
Data Technologies and Applications, Vol. 59 No. 1, 2025, pp. 1-18, © Emerald Publishing Limited, 2514-9288, DOI 10.1108/DTA-12-2023-0868
Identifying the MD unit is complex, ambiguous and uncertain because diseases change dynamically in the human body. Such identification has therefore required experienced TCM experts, who can quickly and accurately identify the MD units and combine them to obtain the results of syndrome differentiation. Nevertheless, such competent TCM experts are very scarce. Inspired by the success of artificial intelligence (AI) in a variety of fields, researchers have introduced it to MD analysis and syndrome differentiation in recent years (Li, 2020; Li et al., 2021; Wen et al., 2021; Zhan et al., 2023). For instance, the relationship between symptoms and MDs has been found using traditional machine learning methods (Tang, 2021; Shi, 2021), which require hand-designed features of TCM clinical data. However, designing such features is a time-consuming, expensive, expert-oriented task. Moreover, a unified design standard is lacking in TCM. To address these issues, deep learning has been introduced for syndrome differentiation (Liang et al., 2020; Pang et al., 2020; Zhao et al., 2023), but it lacks reasoning ability and interpretability because the domain knowledge of TCM is not well integrated into the process of MD analysis, even though domain knowledge plays a pivotal role in syndrome differentiation. Besides, the above-mentioned efforts were made for a single disease, which leads to poor generalization (Zhang, 2023). Hence, how to use TCM clinical data together with domain knowledge to devise a general model for MD analysis and syndrome differentiation remains a challenging task.
Inspired by the superior performance of the Latent Dirichlet Allocation (LDA) topic model (David et al., 2003) in exploratory data analysis, we adopt it to design the MD analysis and syndrome differentiation model. LDA is a generative model based on the Bayesian framework, in which the topic distribution over words is extracted from a collection of documents (Jelodar et al., 2019). One advantage is that the LDA topic model ignores the order of words in documents and does not limit the length of each document, since it is a bag-of-words model (Uttam and Apurva, 2021). In TCM, all clinical records can be regarded as a “corpus”, a record as a “document”, each symptom in the medical record as a “word”, and MDs as “topics”. Despite the complexity of the TCM domain, the relations among disease cause (病因), location of disease (病位), MDs, diagnostic results (syndromes) and treatment plan can be expressed by LDA. Meanwhile, LDA can easily incorporate domain knowledge while retaining good interpretability. The LDA topic model can thus show how MDs are identified by inferring these relations. Moreover, prior knowledge can be incorporated into the LDA topic model to improve identification performance. For these reasons, LDA has been widely applied in TCM in recent years (Yao et al., 2018; Ma et al., 2022). Despite this progress, the aforementioned models fail to identify MDs, which are crucial in TCM. Therefore, how to incorporate potential prior knowledge into the LDA topic model for identifying MDs remains an open problem.
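The corpus/document/word analogy above can be made concrete with a small sketch. The snippet below is only an illustration, not the MD-LDA implementation: it assumes each clinical record has already been tokenized into symptom terms, and the symptom names, the `records` list and both helper functions are hypothetical. It builds the bag-of-words count vectors that a topic model such as LDA would take as input, and shows that two records with the same symptoms in a different order map to the same vector.

```python
from collections import Counter

# Hypothetical toy corpus: each record ("document") is a list of symptom
# tokens ("words"). The symptom names are illustrative, not from the dataset.
records = [
    ["fatigue", "pale_tongue", "weak_pulse"],
    ["weak_pulse", "fatigue", "pale_tongue"],         # same symptoms, new order
    ["fever", "red_tongue", "rapid_pulse", "fever"],  # a repeated symptom
]


def build_vocabulary(docs):
    """Assign each distinct symptom a column index, in sorted order."""
    distinct = sorted({word for doc in docs for word in doc})
    return {word: index for index, word in enumerate(distinct)}


def to_bag_of_words(doc, vocab):
    """Return the count vector of one record; word order is discarded."""
    counts = Counter(doc)
    vector = [0] * len(vocab)
    for word, count in counts.items():
        vector[vocab[word]] = count
    return vector


vocab = build_vocabulary(records)
matrix = [to_bag_of_words(doc, vocab) for doc in records]

for row in matrix:
    print(row)
```

In this representation the latent topics inferred over the columns would play the role of MDs. How label information enters as prior knowledge is specific to the proposed model and is not shown here.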
Aspiring to fill this gap, this paper develops an LDA topic model that characterizes the analysis process of MD, which is the basis for accurate syndrome differentiation. To improve the identification performance of MDs, prior knowledge is further incorporated into the proposed topic model. In addition, we construct a universal dataset for the study of intelligent TCM syndrome differentiation, called TSD. It contains more than 40,000 text entries drawn from TCM clinical records, which are labeled in detail with diseases, syndromes and MDs. TSD contains 1,870 syndromes, 30 MD units and more than 500 symptoms. These data can well support research in related fields of intelligent syndrome differentiation. To the best of our knowledge,