Title :
A novel mixed values k-prototypes algorithm with application to health care databases mining
Author :
Najjar, Ahmed ; Gagne, Christian ; Reinharz, Daniel
Author_Institution :
Dept. of Electrical and Computer Engineering, Univ. Laval, Quebec City, QC, Canada
Abstract :
The current availability of large datasets composed of heterogeneous objects underscores the importance of large-scale clustering of mixed complex items. Several algorithms have been developed for mixed datasets composed of numerical and categorical variables. A well-known one is the k-prototypes algorithm, which is efficient for clustering large datasets given its linear complexity. However, many fields handle more complex data, for example variable-size sets of categorical values mixed with numerical and categorical values, which the k-prototypes algorithm cannot process as is. We propose a variation of the k-prototypes clustering algorithm that handles these complex entities by using a bag-of-words representation for the multivalued categorical variables. We evaluate our approach on a real-world application, the clustering of administrative health care databases in Quebec, with results illustrating the good performance of our method.
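The abstract does not state the paper's dissimilarity measure or its prototype update rule. As a rough illustration only, the Python sketch below assumes Huang's standard k-prototypes cost (squared Euclidean distance on numeric attributes plus a weight gamma times the number of categorical mismatches). It also assumes that each multivalued categorical variable is encoded as a bag-of-words count vector, treated as extra numeric dimensions whose prototype is the per-term mean. It uses a batch (Lloyd-style) update rather than Huang's incremental one. All function names, the parameter gamma's default, and the toy patient records are hypothetical; the authors' actual method may differ.

```python
from collections import Counter
import random


def bag_of_words(values, vocabulary):
    """Encode a variable-size set of categorical values as a count vector
    over a fixed vocabulary (assumed encoding for multivalued variables)."""
    counts = Counter(values)
    return [float(counts[term]) for term in vocabulary]


def dissimilarity(obj, proto, gamma):
    """Assumed mixed cost: squared Euclidean on numeric and bag-of-words
    parts, plus gamma times the number of categorical mismatches."""
    num_o, cat_o, bow_o = obj
    num_p, cat_p, bow_p = proto
    numeric = sum((a - b) ** 2 for a, b in zip(num_o, num_p))
    bow = sum((a - b) ** 2 for a, b in zip(bow_o, bow_p))
    mismatches = sum(1 for a, b in zip(cat_o, cat_p) if a != b)
    return numeric + bow + gamma * mismatches


def update_prototype(members):
    """Means for numeric and bag-of-words parts, modes for categorical parts."""
    n = len(members)
    num = [sum(col) / n for col in zip(*(m[0] for m in members))]
    cat = [Counter(col).most_common(1)[0][0] for col in zip(*(m[1] for m in members))]
    bow = [sum(col) / n for col in zip(*(m[2] for m in members))]
    return (num, cat, bow)


def k_prototypes(objects, k, gamma=1.0, max_iter=20, seed=0):
    """Batch k-prototypes: assign each object to its nearest prototype,
    then recompute prototypes, until assignments stop changing."""
    rng = random.Random(seed)
    prototypes = [(list(o[0]), list(o[1]), list(o[2])) for o in rng.sample(objects, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dissimilarity(o, prototypes[c], gamma))
            for o in objects
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [o for o, lab in zip(objects, labels) if lab == c]
            if members:  # keep the old prototype if a cluster empties
                prototypes[c] = update_prototype(members)
    return labels, prototypes


# Hypothetical toy records: (age), (sex), and a variable-size set of diagnosis codes.
vocab = ["A", "B", "C", "D"]
objects = [
    ([20.0], ["F"], bag_of_words(["A", "B"], vocab)),
    ([22.0], ["F"], bag_of_words(["A"], vocab)),
    ([70.0], ["M"], bag_of_words(["C", "D", "D"], vocab)),
    ([72.0], ["M"], bag_of_words(["C"], vocab)),
]
labels, prototypes = k_prototypes(objects, k=2, gamma=1.0)
```

On this toy data the two younger records and the two older records end up in separate clusters. In practice the numeric attributes would normally be standardized first, since otherwise an unscaled attribute such as age dominates both the mismatch term and the bag-of-words term.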
Keywords :
computational complexity; data mining; database management systems; medical information systems; pattern clustering; Quebec; bag-of-words representation; health care databases mining; heterogeneous objects; k-prototypes clustering algorithm; large-scale clustering; linear complexity; mixed complex items; mixed values k-prototypes algorithm; valued categorical variables; variable-size categorical values sets; Clustering algorithms; Complexity theory; Equations; Medical services;
Conference_Title :
Computational Intelligence in Healthcare and e-health (CICARE), 2014 IEEE Symposium on
Conference_Location :
Orlando, FL
DOI :
10.1109/CICARE.2014.7007849