DocumentCode
2334079
Title
Comparisons of classification methods for screening potential compounds
Author
An, Aijun ; Wang, Yuanyuan
Author_Institution
Dept. of Comput. Sci., York Univ., Toronto, Ont., Canada
fYear
2001
fDate
2001
Firstpage
11
Lastpage
18
Abstract
We compare a number of data mining and statistical methods on the drug design problem of modeling molecular structure-activity relationships. The relationships can be used to identify active compounds based on their chemical structures from a large inventory of chemical compounds. The data set of this application has a highly skewed class distribution, in which only 2% of the compounds are considered active. We apply a number of classification methods to this extremely imbalanced data set and propose to use different performance measures to evaluate these methods. We report our findings on the characteristics of the performance measures, the effect of using pruning techniques in this application and a comparison of local learning methods with global techniques. We also investigate whether reducing the imbalance in the training data by up-sampling or down-sampling would improve the predictive performance.
Keywords
chemistry computing; data mining; learning (artificial intelligence); pattern classification; pharmaceutical industry; active compounds; chemical compounds; chemical structures; classification methods; data set; down-sampling; drug design problem; global techniques; highly skewed class distribution; imbalanced data set; local learning methods; molecular structure-activity relationships; performance measures; potential compound screening; predictive performance; pruning techniques; statistical methods; training data; up-sampling; Chemical compounds; Computer science; Data mining; Drugs; High temperature superconductors; Human immunodeficiency virus; Protection; Statistics; Testing; Throughput
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location
San Jose, CA
Print_ISBN
0-7695-1119-8
Type
conf
DOI
10.1109/ICDM.2001.989495
Filename
989495