• DocumentCode
    2007304
  • Title

    Dimension Reduction via Unsupervised Learning Yields Significant Computational Improvements for Support Vector Machine Based Protein Family Classification

  • Author

    Robertson, Bobbie Jo M Webb ; Matzke, Melissa M. ; Oehmen, Christopher S.

  • fYear
    2008
  • fDate
    11-13 Dec. 2008
  • Firstpage
    457
  • Lastpage
    462
  • Abstract
    Reducing the dimension of vectors used in training support vector machines (SVMs) results in a proportional speedup in training time. For large-scale problems this can make the difference between tractable and intractable training tasks. However, it is critical that classifiers trained on reduced datasets perform as reliably as their counterparts trained on high-dimensional data. We assessed principal component analysis (PCA) and sequential projection pursuit (SPP) as dimension reduction strategies in the biology application of classifying proteins into well-defined functional 'families' (SVM-based protein family classification) by their impact on run-time, sensitivity and selectivity. Homology vectors of 4352 elements were reduced to approximately 2% of the original data size using PCA and SPP without significantly affecting accuracy, while leading to approximately a 28-fold speedup in run-time.
  • Keywords
    biology computing; data reduction; genetics; pattern classification; principal component analysis; proteins; support vector machines; unsupervised learning; PCA; automated genome sequencing technology; high-dimensional data training; homology vector dimension reduction; intractable training task; principal component analysis; protein family classification; sequential projection pursuit; support vector machine; unsupervised learning; Bioinformatics; Floods; Genomics; Principal component analysis; Proteins; Runtime; Sequences; Support vector machine classification; Support vector machines; Unsupervised learning; dimension reduction; machine learning; protein homology detection; support vector machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-0-7695-3495-4
  • Type

    conf

  • DOI
    10.1109/ICMLA.2008.120
  • Filename
    4725013