DocumentCode :
837612
Title :
A supervised clustering and classification algorithm for mining data with mixed variables
Author :
Li, Xiangyang ; Ye, Nong
Author_Institution :
Dept. of Ind. & Manuf. Syst. Eng., Univ. of Michigan, Dearborn, MI, USA
Volume :
36
Issue :
2
fYear :
2006
fDate :
3/1/2006 12:00:00 AM
Firstpage :
396
Lastpage :
406
Abstract :
This paper presents a data mining algorithm based on supervised clustering that learns data patterns and uses these patterns for data classification. The algorithm enables scalable, incremental learning of patterns from data with both numeric and nominal variables. Two different methods of combining numeric and nominal variables in calculating the distance between clusters are investigated. In one method, separate distance measures are calculated for the numeric and nominal variables, respectively, and are then combined into an overall distance measure. In the other method, nominal variables are converted into numeric variables, and a distance measure is then calculated using all variables. We analyze the computational complexity, and thus the scalability, of the algorithm, and test its performance on a number of data sets from various application domains. The prediction accuracy and reliability of the algorithm are analyzed, tested, and compared with those of several other data mining algorithms.
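The two ways of combining numeric and nominal variables described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance for numeric variables, a simple-matching (mismatch count) distance for nominal variables, a weighted sum as the combination rule, and one-hot (0/1 indicator) encoding as the nominal-to-numeric conversion. These choices, and the names `w_numeric`, `w_nominal`, and `categories`, are illustrative assumptions, not the paper's exact formulas.

```python
import math
from typing import List, Sequence


def numeric_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance over numeric attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nominal_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Simple-matching distance: count of nominal attributes whose values differ."""
    return float(sum(1 for x, y in zip(a, b) if x != y))


def combined_distance(num_a, nom_a, num_b, nom_b,
                      w_numeric: float = 1.0, w_nominal: float = 1.0) -> float:
    # Method 1 (sketch): compute separate numeric and nominal distances,
    # then combine them with a weighted sum. The weights are hypothetical.
    return (w_numeric * numeric_distance(num_a, num_b)
            + w_nominal * nominal_distance(nom_a, nom_b))


def one_hot(values: Sequence[str], categories: Sequence[Sequence[str]]) -> List[float]:
    # Expand each nominal value into 0/1 indicator columns, one per category.
    # `categories` lists the possible values of each nominal attribute (assumed known).
    vec: List[float] = []
    for v, cats in zip(values, categories):
        vec.extend(1.0 if v == c else 0.0 for c in cats)
    return vec


def converted_distance(num_a, nom_a, num_b, nom_b,
                       categories: Sequence[Sequence[str]]) -> float:
    # Method 2 (sketch): convert nominal values to numeric indicator columns,
    # then compute a single Euclidean distance over all variables.
    a = list(num_a) + one_hot(nom_a, categories)
    b = list(num_b) + one_hot(nom_b, categories)
    return numeric_distance(a, b)


# Usage with two hypothetical records (numeric part, nominal part).
num_a, nom_a = [1.0, 2.0], ["tcp", "http"]
num_b, nom_b = [4.0, 6.0], ["udp", "http"]
cats = [["tcp", "udp", "icmp"], ["http", "ftp"]]

d1 = combined_distance(num_a, nom_a, num_b, nom_b)         # 5.0 + 1.0
d2 = converted_distance(num_a, nom_a, num_b, nom_b, cats)  # sqrt(25 + 2)
print(d1, d2)
```

Under one-hot encoding, a single nominal mismatch contributes 2 to the squared distance (one indicator turns off, another turns on), so the two methods weight nominal differences differently. In practice the numeric variables would usually be normalized first so that no single variable dominates either measure.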
Keywords :
computational complexity; data mining; learning (artificial intelligence); data classification algorithm; data mining algorithm; data pattern learning; mixed variables; nominal variables conversion; numeric variables; scalable incremental learning; separate distance measures; supervised clustering algorithm; Algorithm design and analysis; Application software; Classification algorithms; Clustering algorithms; Data mining; Intrusion detection; Military computing; Partitioning algorithms; Scalability; Testing; Classification; clustering; computer intrusion detection; dissimilarity measures
fLanguage :
English
Journal_Title :
Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
Publisher :
IEEE
ISSN :
1083-4427
Type :
jour
DOI :
10.1109/TSMCA.2005.853501
Filename :
1597409