• DocumentCode
    1611935
  • Title

    Miner for OACCR: Case of medical data analysis in knowledge discovery

  • Author

    Ali, Sufian H.

  • Author_Institution
    Dept. of Software, Univ. of Babylon, Hilla, Iraq
  • fYear
    2012
  • Firstpage
    962
  • Lastpage
    975
  • Abstract
    Modern scientific data consist of huge datasets which are gathered by a very large number of techniques and stored in highly diversified and often incompatible data repositories, such as those of bioinformatics, geoinformatics, astroinformatics and the Scientific World Wide Web. On the other hand, a lack of reference data is very often responsible for poor learning performance; one of the key problems in supervised learning is the insufficient size of the training dataset. Developing a theoretically and practically valid tool for analyzing small sample data therefore remains a critical and challenging issue for researchers. This paper presents a methodology for Obtaining Accurate and Comprehensible Classification Rules (OACCR) from both small and huge datasets using hybrid knowledge discovery techniques. The searching capability of a Genetic Programming Data Construction Method (GPDCM) is exploited to automatically create more visual samples from the original small dataset. In addition, the paper attempts to develop the Random Forest data mining algorithm to handle the missing value problem. A database described by its components is then built using Principal Component Analysis (PCA), after which an association rule algorithm, FP-Growth (FP-Tree), is applied. Finally, a TreeNet classifier is used to determine the class to which each association rule belongs. The proposed methodology provides fast, accurate and comprehensible classification rules. The methodology can also be used to compress a dataset in two dimensions (number of features and number of records).
  • Keywords
    data mining; genetic algorithms; medical administrative data processing; OACCR; TreeNet classifier; astroinformatics; bioinformatics; data mining algorithm; datasets; genetic programming data construction method; geoinformatics; hybrid techniques; knowledge discovery; medical data analysis; obtaining accurate and comprehensible classification rules; principal component analysis; scientific World Wide Web; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Data mining; Databases; Training; Vegetation; Adboosting; FP-Growth; GPDCM; PCA; Random Forest;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), 2012 6th International Conference on
  • Conference_Location
    Sousse
  • Print_ISBN
    978-1-4673-1657-6
  • Type

    conf

  • DOI
    10.1109/SETIT.2012.6482043
  • Filename
    6482043