• DocumentCode
    2039635
  • Title
    Information theoretic feature selection for high dimensional metagenomic data
  • Author
    Ditzler, Gregory ; Rosen, Gail ; Polikar, Robi
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Drexel Univ., Philadelphia, PA, USA
  • fYear
    2012
  • fDate
    2-4 Dec. 2012
  • Firstpage
    143
  • Lastpage
    146
  • Abstract
    Extremely high dimensional data sets are common in genomic classification scenarios, but they are particularly prevalent in metagenomic studies that represent samples as abundances of taxonomic units. Furthermore, the data dimensionality is typically much larger than the number of observations collected, a phenomenon known as the curse of dimensionality, which is a particularly challenging problem for most machine learning algorithms. The biologists collecting and analyzing data need efficient methods to determine relationships between classes in a data set and the variables that are capable of differentiating between multiple groups in a study. The most common methods of metagenomic data analysis are those characterized by α- and β-diversity tests; however, neither of these tests allows scientists to identify the organisms that are most responsible for differentiating between categories in a study. In this paper, we present an analysis of information theoretic feature selection methods for improving classification accuracy with metagenomic data.
  • Keywords
    RNA; biology computing; feature extraction; genomics; information theory; learning (artificial intelligence); meta data; molecular biophysics; pattern classification; α-diversity test; β-diversity test; classification accuracy; dimensionality curse; genomic classification scenarios; high dimensional metagenomic data; information theoretic feature selection; machine learning algorithms; metagenomic data analysis; organism identification; taxonomic units;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Genomic Signal Processing and Statistics, (GENSIPS), 2012 IEEE International Workshop on
  • Conference_Location
    Washington, DC
  • ISSN
    2150-3001
  • Print_ISBN
    978-1-4673-5234-5
  • Type
    conf
  • DOI
    10.1109/GENSIPS.2012.6507749
  • Filename
    6507749