Title :
A Novel Variable-order Markov Model for Clustering Categorical Sequences
Author :
Tengke Xiong; Shengrui Wang; Qingshan Jiang; Joshua Zhexue Huang
Author_Institution :
Shenzhen Inst. of Adv. Technol., Shenzhen, China
Abstract :
Clustering categorical sequences is an important and difficult data mining task. Despite recent efforts, the challenge remains, due to the lack of an inherently meaningful measure of pairwise similarity. In this paper, we propose a novel variable-order Markov framework, named weighted conditional probability distribution (WCPD), to model clusters of categorical sequences. We propose an efficient and effective approach to solve the challenging problem of model initialization. To initialize the WCPD model, we propose to use a first-order Markov model built on a weighted fuzzy indicator vector representation of categorical sequences, which we call the WFI Markov model. Based on a cascade optimization framework that combines the WCPD and WFI models, we design a new divisive hierarchical clustering algorithm for clustering categorical sequences. Experimental results on data sets from three different domains demonstrate the promising performance of our models and clustering algorithm.
Keywords :
Markov processes; data mining; fuzzy set theory; optimisation; pattern clustering; WCPD; WFI Markov model; cascade optimization framework; clustering algorithm; clustering categorical sequences; novel variable-order Markov model; weighted conditional probability distribution; weighted fuzzy indicator vector representation; Clustering algorithms; Data models; Hidden Markov models; Numerical models; Probability; Silicon; Clustering; Computing Methodologies; Database Applications; Database Management; Information Technology and Systems; Models; Pattern Recognition; Statistical; Statistical model; categorical sequence; similarity measure
Journal_Title :
IEEE Transactions on Knowledge and Data Engineering
DOI :
10.1109/TKDE.2013.104