A Synthesized Data Mining Algorithm Based on Clustering and Decision Tree

Author

Dan, Ji ; Jianlin, Qiu ; Xiang, Gu ; Li, Chen ; Peng, He

Author_Institution

Sch. of Comput. Sci. & Technol., Nantong Univ., Nantong, China

fYear

2010

fDate

June 29 2010-July 1 2010

Firstpage

2722

Lastpage

2728

Abstract

With the development of information technology and computer science, large volumes of data now appear in everyday life. Data mining technology has become important because it helps people analyze these data and extract useful information. Clustering and decision trees are among the most widely used data mining methods: clustering can be used to describe data, and decision trees can be used to analyze it. Combining the two methods effectively makes it possible to reveal both the characteristics of the data and the rules latent in it. This paper presents a new synthesized data mining algorithm named CA, which improves on the original CURE and C4.5 methods. CA introduces principal component analysis (PCA), grid partition, and parallel processing to achieve feature reduction and scale reduction for large-scale datasets. The paper applies the CA algorithm to maize seed breeding, and the experimental results show that the approach performs better than the original methods.
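
The abstract names the ingredients of CA (PCA for feature reduction, grid partition for scale reduction, clustering to describe the data, a decision tree to analyze it) but not its details. The following is a minimal sketch of that general idea only, not the authors' algorithm. Everything in it is an assumption made for illustration: the function names, the toy data, and the simplifications. It keeps only the first principal component, merges adjacent occupied grid cells instead of running CURE, learns a one-level tree with plain information gain instead of full C4.5, and omits parallel processing.

```python
import math
from collections import Counter


def first_principal_component(rows, iters=200):
    """Return (mean, unit direction) of the leading principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # assumed starting vector; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def grid_cluster(scores, width):
    """Grid partition in 1-D, then merge adjacent occupied cells into clusters."""
    cells = [math.floor(s / width) for s in scores]
    occupied = sorted(set(cells))
    label_of, label = {}, 0
    for k, cell in enumerate(occupied):
        if k > 0 and cell - occupied[k - 1] > 1:  # an empty cell separates clusters
            label += 1
        label_of[cell] = label
    return [label_of[c] for c in cells]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels):
    """One-level decision tree: (feature, threshold, gain) with maximal information gain."""
    base, best = entropy(labels), (None, None, 0.0)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[j] <= t]
            right = [l for r, l in zip(rows, labels) if r[j] > t]
            rest = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if base - rest > best[2]:
                best = (j, t, base - rest)
    return best


def ca_sketch(rows, width=1.0):
    """Reduce features, cluster on a grid, then explain the clusters with a split."""
    mean, v = first_principal_component(rows)
    scores = [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]
    labels = grid_cluster(scores, width)
    return labels, best_split(rows, labels)


# Toy, made-up data: two well-separated groups in two dimensions.
toy_rows = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2)]
toy_labels, (toy_feature, toy_threshold, toy_gain) = ca_sketch(toy_rows)
print(toy_labels, toy_feature, toy_threshold, toy_gain)
```

In this sketch the cluster labels found in the reduced space become the class labels for the tree, which is one plausible way to read "clustering for describing, decision tree for analyzing"; the paper itself may combine the two steps differently.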

Keywords

data mining; decision trees; pattern clustering; principal component analysis; C4.5; CA algorithm; CURE; clustering; decision tree; feature reduction; grid partition; large-scale datasets; maize seed breeding; parallel processing; scale reduction; synthesized data mining; Algorithm design and analysis; Classification algorithms; Classification tree analysis; Clustering algorithms; Ear; Partitioning algorithms

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on

Conference_Location

Bradford

Print_ISBN

978-1-4244-7547-6

Type

conf

DOI

10.1109/CIT.2010.456

Filename

5578535