Title :
Using unsupervised learning to determine risk level for left ventricular diastolic dysfunction
Author :
Ma, Kaidi ; Canepa, Marco ; Strait, James B. ; Shatkay, Hagit
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Delaware, Newark, DE, USA
Abstract :
Left Ventricular Diastolic Dysfunction (LVDD) is a decompensatory change in the relaxation properties of the heart, the risk for which increases with age. Currently, physicians use a decision-tree-like algorithm to distinguish between discrete LVDD levels. Because this approach relies on cut-off thresholds, it can lose information and may lead to misdiagnosis. This paper explores an alternative diagnostic method for determining LVDD risk level, one that takes into account a wide variety of attributes available in patient records without pre-setting cut-off thresholds. Using a large dataset derived from the Baltimore Longitudinal Study of Aging (BLSA), and adjusting the data for age and gender, we employ the Chi Square test and the information gain criterion to identify attributes that correlate well with the physician-assigned grades; such attributes are referred to as distinguishing attributes. We then apply the expectation maximization (EM) algorithm, as well as K-Means, to cluster records represented by their distinguishing attributes. While the clusters produced by K-Means are not stable, the EM algorithm yields three stable, tightly formed clusters that roughly correspond to the physician-assigned categories. Based on the results from the EM algorithm, we can compute a patient's probability of having low, high or no risk for LVDD, and use this probability as a basis for defining a risk score to determine the patient's LVDD severity.
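The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single numeric attribute, a three-component Gaussian mixture, and made-up toy values, and the names `fit_em` and `posteriors` are hypothetical. It only shows how EM produces, for each record, a probability of belonging to each cluster.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, weights, means, variances):
    """E-step for one value: posterior probability of each cluster given x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    if total == 0.0:  # x is far from every cluster; fall back to a uniform answer
        return [1.0 / len(joint)] * len(joint)
    return [j / total for j in joint]

def fit_em(values, k=3, iterations=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with k components by expectation maximization."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialisation (an assumption of this sketch):
    # means at evenly spaced quantiles, shared variance scaled down by k^2.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(values) / n
    overall_var = sum((x - overall_mean) ** 2 for x in values) / n
    variances = [max(overall_var / k ** 2, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: soft assignment of every value to every cluster.
        resp = [posteriors(x, weights, means, variances) for x in values]
        # M-step: re-estimate each cluster from its weighted members.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(spread, min_var)
    return weights, means, variances

# Toy values for one hypothetical attribute (not BLSA data).
values = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8, 9.0, 9.1, 8.9, 9.2, 8.8]
weights, means, variances = fit_em(values)
probs = posteriors(1.0, weights, means, variances)
print("cluster means:", [round(m, 2) for m in means])
print("membership probabilities for x = 1.0:", [round(p, 3) for p in probs])
```

Real patient records carry many attributes, so a practical version would use multivariate components; which cluster corresponds to no, low or high risk would still have to be decided by comparing the clusters with the physician-assigned grades.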
Keywords :
cardiology; decision trees; diseases; electronic health records; expectation-maximisation algorithm; medical diagnostic computing; pattern clustering; probability; risk management; unsupervised learning; Baltimore Longitudinal Study of Aging; Chi Square test; K-Means; age data; alternative diagnostic method; cut-off thresholds; dataset; decision-tree-like algorithm; decompensatory change; discrete LVDD risk level; distinguishing attributes; expectation maximization algorithm; gender data; heart; information gain criterion; information loss; left ventricular diastolic dysfunction; patient probability; patient records; physician-assigned grades; record clustering; relaxation properties; risk score; tightly-formed clusters; Aging; Blood flow; Clustering algorithms; Doppler effect; Filling; Medical services; EM algorithm; clustering; left ventricle diastolic dysfunction
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
DOI :
10.1109/BIBM.2014.6999182