Abstract :
The purpose of this paper is twofold. First, it addresses the adequacy of several theoretical information criteria for finite mixture modelling (unsupervised learning) when discovering patterns in continuous data. Second, it applies these models, together with BIC, to discover patterns of coronary heart disease. The results encourage the use of both mixture models and the BIC information criterion. Despite the widespread application of finite mixture modelling, model selection for finite mixtures remains an important issue. To choose among several information criteria that may support selection of the correct number of clusters, we conduct a simulation study to determine which criteria are most appropriate for mixture model selection when the data sets contain only continuous clustering base variables. In this study, BIC performs best: it indicates the correct number of clusters in the simulated structures more often than the other criteria for mixtures of continuous clustering base variables. Applied to the coronary heart disease data, it also performs well, recovering the known pattern in the data.
Keywords :
cardiology; diseases; medical diagnostic computing; pattern clustering; unsupervised learning; coronary heart disease; data pattern; finite mixture modelling; Analytical models; Cardiac disease; Clustering algorithms; Computational modeling; Computer simulation; Humans; Information analysis; Probability distribution; Proposals; Finite Mixture Models; Patterns in Continuous Data; Quantitative Methods; Simulation Experiments; Theoretical Information Criteria
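
Illustrative sketch:
The kind of model selection summarised in the abstract (fit finite mixtures with different numbers of components to continuous data, then keep the number of components with the lowest BIC) can be illustrated with a small simulation. The sketch below is not the authors' experimental design. It assumes one-dimensional Gaussian components, a simulated data set with two well-separated clusters, and a plain EM fit. The names `fit_mixture`, `bic`, `bic_by_k` and `best_k` are hypothetical, and BIC is taken in its usual form, -2 log L + p log n, with p = 3k - 1 free parameters for k univariate Gaussian components.

```python
import math
import random


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in data:
        dens = sum(
            w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(max(dens, 1e-300))  # guard against log(0)
    return total


def fit_mixture(data, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: means at evenly spaced quantiles, pooled variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            s = max(sum(dens), 1e-300)
            resp.append([d / s for d in dens])
        # M-step: update weights, means and variances from the posteriors.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)  # floor avoids degenerate fits
    return log_likelihood(data, weights, means, variances)


def bic(loglik, k, n):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


# Simulated data (an assumption for illustration): two clusters, 150 points each.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(150)]
data += [rng.gauss(6.0, 1.0) for _ in range(150)]

# Fit mixtures with 1 to 4 components and keep the lowest-BIC candidate.
bic_by_k = {k: bic(fit_mixture(data, k), k, len(data)) for k in range(1, 5)}
best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k, best_k)
```

A simulation study in the spirit of the abstract would repeat this over many simulated data sets and cluster structures, and would record how often each information criterion selects the true number of clusters.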