• DocumentCode
    2334079
  • Title
    Comparisons of classification methods for screening potential compounds
  • Author
    An, Aijun ; Wang, Yuanyuan
  • Author_Institution
    Dept. of Comput. Sci., York Univ., Toronto, Ont., Canada
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    11
  • Lastpage
    18
  • Abstract
    We compare a number of data mining and statistical methods on the drug design problem of modeling molecular structure-activity relationships. The relationships can be used to identify active compounds based on their chemical structures from a large inventory of chemical compounds. The data set of this application has a highly skewed class distribution, in which only 2% of the compounds are considered active. We apply a number of classification methods to this extremely imbalanced data set and propose to use different performance measures to evaluate these methods. We report our findings on the characteristics of the performance measures, the effect of using pruning techniques in this application, and a comparison of local learning methods with global techniques. We also investigate whether reducing the imbalance in the training data by up-sampling or down-sampling would improve the predictive performance.
  • Keywords
    chemistry computing; data mining; learning (artificial intelligence); pattern classification; pharmaceutical industry; active compounds; chemical compounds; chemical structures; classification methods; data mining; data set; down-sampling; drug design problem; global techniques; highly skewed class distribution; imbalanced data set; local learning methods; molecular structure-activity relationships; performance measures; potential compound screening; predictive performance; pruning techniques; statistical methods; training data; up-sampling; Chemical compounds; Computer science; Data mining; Drugs; High temperature superconductors; Human immunodeficiency virus; Protection; Statistics; Testing; Throughput
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    0-7695-1119-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2001.989495
  • Filename
    989495
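
The abstract mentions reducing class imbalance in the training data by up-sampling or down-sampling. As a rough illustration only, the following Python sketch shows one common way such resampling can be done on a data set where about 2% of the examples are active. The function name `resample`, its parameters, and the toy data are assumptions made for this example; the record does not describe the authors' actual procedure.

```python
import random

def resample(rows, labels, positive=1, mode="down", seed=0):
    """Balance a two-class data set (hypothetical helper, not from the paper).

    mode="down": randomly drop majority (inactive) examples until the classes match.
    mode="up":   randomly duplicate minority (active) examples until the classes match.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == positive]
    neg = [i for i, y in enumerate(labels) if y != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    if mode == "down":
        # Keep every active example and an equally sized random subset of inactives.
        neg = rng.sample(neg, min(len(neg), len(pos)))
    elif mode == "up":
        # Keep everything and add random copies of active examples.
        extra = max(0, len(neg) - len(pos))
        pos = pos + [rng.choice(pos) for _ in range(extra)]
    else:
        raise ValueError("mode must be 'down' or 'up'")

    idx = pos + neg
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]

# Toy data: 2 active compounds out of 100, mirroring the 2% figure in the abstract.
toy_labels = [1] * 2 + [0] * 98
toy_rows = list(range(100))

down_rows, down_labels = resample(toy_rows, toy_labels, mode="down")
up_rows, up_labels = resample(toy_rows, toy_labels, mode="up")
```

In this sketch only the training split would be resampled; evaluating on data that keeps the original class distribution is what makes the comparison of performance measures meaningful.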