• DocumentCode
    2914070
  • Title

    Determine the Critical dimension in data mining (experiments with bioinformatics datasets)

  • Author

    Suryakumar, Divya ; Sung, Andrew H. ; Liu, Qingzhong

  • Author_Institution
    Dept. of Comput. Sci. & Eng., New Mexico Tech, Socorro, NM, USA
  • fYear
    2011
  • fDate
    22-24 Nov. 2011
  • Firstpage
    481
  • Lastpage
    486
  • Abstract
    The "curse of dimensionality" problem, which occurs in many data mining applications such as biomedical informatics, digital forensics, and risk management, makes it difficult to develop accurate learning machine classifiers when the dataset includes too many irrelevant or insignificant features. Finding the smallest set of features necessary to obtain the most accurate classifier is therefore an issue of great theoretical and practical interest. As a step toward developing formal methods for finding the "critical dimension", this paper presents an empirical study of the minimum number of features required for a learning machine to perform accurately. The features of the dataset are first ranked; then, iteratively, the least important feature is removed and the performance is plotted against the number of features. The point at which the performance curve drops significantly and does not rise again gives the critical dimension, which is a unique number for each specific combination of learning machine and feature ranking method. The paper shows that the critical dimension phenomenon does exist for several of the bioinformatics datasets studied.
  • Keywords
    bioinformatics; data mining; formal verification; learning (artificial intelligence); pattern classification; bioinformatics datasets; biomedical informatics; critical dimension; curse of dimensionality problem; digital forensics; feature ranking method; formal methods; learning machine classifiers; risk management; Decision support systems; Intelligent systems; dimensionality reduction; feature or attribute reduction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on
  • Conference_Location
    Cordoba
  • ISSN
    2164-7143
  • Print_ISBN
    978-1-4577-1676-8
  • Type

    conf

  • DOI
    10.1109/ISDA.2011.6121702
  • Filename
    6121702
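
The procedure described in the abstract (rank the features, remove the least important one at a time, record performance against the number of features, then locate the drop) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the `evaluate` callback, the toy scores, and the tolerance `tol` used to decide what counts as a "significant" drop are all assumptions, since the record does not specify them.

```python
from typing import Callable, Dict, List, Sequence


def performance_curve(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Dict[int, float]:
    """Score a classifier on the top-k ranked features for k = n, n-1, ..., 1.

    ranked_features is ordered from most to least important, so dropping the
    last element removes the least important remaining feature.
    evaluate is a hypothetical callback that trains and scores a classifier
    on the given feature subset and returns its accuracy.
    """
    curve: Dict[int, float] = {}
    kept = list(ranked_features)
    while kept:
        curve[len(kept)] = evaluate(kept)
        kept.pop()  # remove the least important remaining feature
    return curve


def critical_dimension(curve: Dict[int, float], tol: float = 0.05) -> int:
    """Smallest feature count whose score is within tol of the full-set score.

    Assumed reading of "drops significantly and does not rise again": every
    smaller feature count scores below (full-set score - tol).
    """
    baseline = curve[max(curve)]  # score with all features kept
    threshold = baseline - tol
    return min(k for k, score in curve.items() if score >= threshold)


# Toy accuracies keyed by number of features kept (made-up values).
toy_scores = {5: 0.90, 4: 0.91, 3: 0.89, 2: 0.70, 1: 0.60}
ranked = ["f1", "f2", "f3", "f4", "f5"]  # hypothetical ranking, best first

curve = performance_curve(ranked, lambda feats: toy_scores[len(feats)])
cd = critical_dimension(curve, tol=0.05)
print(cd)
```

In practice `evaluate` would wrap a real learning machine with cross-validation, and the result would differ for each combination of classifier and ranking method, as the abstract notes.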