DocumentCode :
2889159
Title :
Efficient Classification Method for Large Dataset
Author :
Jiang, Sheng-Yi
Author_Institution :
Sch. of Informatics, GuangDong Univ. of Foreign Studies, Guangzhou
fYear :
2006
fDate :
13-16 Aug. 2006
Firstpage :
1190
Lastpage :
1194
Abstract :
Clustering analysis is used to explore classification for large datasets, and the Canberra distance is generalized so that it can process data with categorical attributes. Based on the generalized Canberra distance definition, an instance of constraint-based clustering is introduced. Meanwhile, nearest neighbor classification is improved. Class-labeled clusters are regarded as classification models used for classifying data. The proposed classification method can discover data that differ greatly from the instances in the training data, which may indicate a new data type. Theoretical analysis shows that the learning process and the classifying process of the proposed classification method have nearly linear time complexity, which gives the method good scalability and makes it applicable to large datasets. The experimental results demonstrate that our method is effective and practicable, and has high prediction accuracy.
Keywords :
computational complexity; pattern classification; pattern clustering; very large databases; Canberra distance definition; classification method; clustering analysis; large dataset; learning process; linear time complexity; Accuracy; Animals; Bayesian methods; Classification tree analysis; Cybernetics; Data analysis; Informatics; Machine learning; Measurement units; Nearest neighbor searches; Scalability; Training data; Canberra Distance; Classification; Clustering; Improved Nearest Neighbor;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
Type :
conf
DOI :
10.1109/ICMLC.2006.258603
Filename :
4028244