Title :
A Synthesized Data Mining Algorithm Based on Clustering and Decision Tree
Author :
Dan, Ji ; Jianlin, Qiu ; Xiang, Gu ; Li, Chen ; Peng, He
Author_Institution :
Sch. of Comput. Sci. & Technol., Nantong Univ., Nantong, China
fDate :
June 29 2010-July 1 2010
Abstract :
With the development of information technology and computer science, large volumes of data have entered our daily lives. To help people analyze these data and extract useful information, the development and application of data mining technology have become highly significant. Clustering and decision trees are among the most widely used data mining methods: clustering can be used for description, and decision trees for analysis. By combining these two methods effectively, we can systematically reveal data characteristics and underlying rules. This paper presents a new synthesized data mining algorithm, named CA, which improves on the original CURE and C4.5 methods. CA introduces principal component analysis (PCA), grid partitioning, and parallel processing to achieve feature reduction and scale reduction for large-scale datasets. We apply the CA algorithm to maize seed breeding, and the experimental results show that our approach outperforms the original methods.
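The abstract names two reduction steps but does not give CA's exact procedure. The following Python sketch is a rough illustration only and is not the authors' CA algorithm. It projects records onto their leading principal components, computed here with power iteration and deflation (feature reduction). It then buckets the projected points into a uniform grid and keeps one weighted centroid per non-empty cell (scale reduction), producing a smaller input that a clustering step such as CURE could consume. The function names and the `cell_size` parameter are hypothetical. Because grid cells are independent, the per-cell work is also a natural candidate for the parallel processing the abstract mentions, although that part is not shown.

```python
import math
from collections import defaultdict


def mean_center(rows):
    """Subtract each column's mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centred):
    """Sample covariance matrix of already mean-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=200):
    """Leading k eigenvectors of a symmetric matrix via power iteration.

    Assumption: the start vector is not orthogonal to the wanted eigenvector.
    """
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for this component.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def pca_project(rows, k):
    """Feature reduction: project rows onto their top-k principal components."""
    centred = mean_center(rows)
    comps = top_components(covariance(centred), k)
    return [[sum(r[j] * c[j] for j in range(len(r))) for c in comps]
            for r in centred]


def grid_reduce(points, cell_size):
    """Scale reduction: one (centroid, count) pair per non-empty grid cell."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(p)
    reps = []
    for key in sorted(cells):
        members = cells[key]
        dim = len(members[0])
        centroid = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        reps.append((centroid, len(members)))
    return reps
```

For example, `pca_project([[t, 2 * t] for t in range(5)], 1)` reduces the two collinear features to a single coordinate, and `grid_reduce` on the result (or on any low-dimensional points) replaces each cell's members with one centroid weighted by the member count, so later clustering works on far fewer points.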
Keywords :
data mining; decision trees; pattern clustering; principal component analysis; C4.5; CA algorithm; CURE; clustering; decision tree; feature reduction; grid partition; large-scale datasets; maize seed breeding; parallel processing; scale reduction; synthesized data mining; Algorithm design and analysis; Classification algorithms; Classification tree analysis; Clustering algorithms; Ear; Partitioning algorithms;
Conference_Title :
Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on
Conference_Location :
Bradford
Print_ISBN :
978-1-4244-7547-6
DOI :
10.1109/CIT.2010.456