• DocumentCode
    2206069
  • Title

    A Synthesized Data Mining Algorithm Based on Clustering and Decision Tree

  • Author

    Dan, Ji ; Jianlin, Qiu ; Xiang, Gu ; Li, Chen ; Peng, He

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Nantong Univ., Nantong, China
  • fYear
    2010
  • fDate
    June 29 2010-July 1 2010
  • Firstpage
    2722
  • Lastpage
    2728
  • Abstract
    With the development of information technology and computer science, large volumes of data have become part of everyday life. Data mining technology has therefore become important for helping people analyze this data and extract useful information from it. Clustering and decision trees are the most commonly used data mining methods: clustering can be used for description, and decision trees can be applied to analysis. Combining the two methods effectively makes it possible to capture both the characteristics of the data and its potential rules. This paper presents a new synthesized data mining algorithm named CA, which improves on the original CURE and C4.5 methods. CA introduces principal component analysis (PCA), grid partitioning, and parallel processing, which achieve feature reduction and scale reduction for large-scale datasets. The paper applies the CA algorithm to maize seed breeding, and the experimental results show that the approach performs better than the original methods.
  • Keywords
    data mining; decision trees; pattern clustering; principal component analysis; C4.5; CA algorithm; CURE; clustering; decision tree; feature reduction; grid partition; large-scale datasets; maize seed breeding; parallel processing; principle component analysis; scale reduction; synthesized data mining; Algorithm design and analysis; Classification algorithms; Classification tree analysis; Clustering algorithms; Ear; Partitioning algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on
  • Conference_Location
    Bradford
  • Print_ISBN
    978-1-4244-7547-6
  • Type

    conf

  • DOI
    10.1109/CIT.2010.456
  • Filename
    5578535