Title :
Strip mining for molecules
Author :
Embrechts, Mark J. ; Arciniegas, F. ; Ozdemir, Muhsin ; Momma, M. ; Breneman, Curt M. ; Lockwood, L. ; Bennett, K.P. ; Kewley, R.H.
Author_Institution :
Dept. of Decision Sci. & Eng. Syst., Rensselaer Polytech. Inst., Troy, NY, USA
fDate :
2002
Abstract :
Quantitative structure-activity relationship (QSAR) problems deal with "in-silico" chemical design for the virtual invention of novel pharmaceuticals. The goal of QSAR is to predict the bioactivities of molecules based on a set of descriptive features. QSAR problems are notoriously challenging for machine learning because a typical QSAR predictive data mining problem set is characterized by a large number of descriptive features (300-1000), often for a relatively small number of molecules (50-300). This paper introduces data strip mining for QSAR modeling. Strip mining is a general approach to feature selection and predictive modeling based on successive stages of feature elimination, each carried out by performing a sensitivity analysis on a predictive model.
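The staged elimination described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's method: the abstract does not name the predictive model, the sensitivity measure, or the elimination schedule, so the sketch uses a plain linear model fitted by gradient descent, a min-to-max output swing as the sensitivity, and a fixed drop fraction per stage. The names fit_linear, sensitivities, and strip_mine are hypothetical, and the data are synthetic.

```python
import random

def fit_linear(X, y, epochs=1000, lr=0.1):
    """Fit a linear model by batch gradient descent.
    Assumption: a stand-in for whatever predictive model is actually used."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, row)) - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return lambda row: b + sum(wj * xj for wj, xj in zip(w, row))

def sensitivities(model, X):
    """Absolute output change when one feature moves from its minimum to its
    maximum while all other features are held at their mean (an assumed measure)."""
    d = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(d)]
    result = []
    for j in range(d):
        col = [row[j] for row in X]
        lo, hi = list(means), list(means)
        lo[j], hi[j] = min(col), max(col)
        result.append(abs(model(hi) - model(lo)))
    return result

def strip_mine(X, y, keep, drop_fraction=0.5):
    """Refit on the surviving features and drop the least sensitive ones,
    stage by stage, until only `keep` features remain."""
    active = list(range(len(X[0])))
    while len(active) > keep:
        sub = [[row[j] for j in active] for row in X]
        sens = sensitivities(fit_linear(sub, y), sub)
        n_drop = max(1, min(len(active) - keep, int(len(active) * drop_fraction)))
        ranked = sorted(range(len(active)), key=lambda k: sens[k])
        dropped = set(ranked[:n_drop])
        active = [f for k, f in enumerate(active) if k not in dropped]
    return active

# Synthetic data: 40 "molecules", 8 descriptors, only 0 and 3 drive the response.
random.seed(0)
X = [[random.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(40)]
y = [3.0 * row[0] - 2.0 * row[3] + random.gauss(0.0, 0.05) for row in X]
selected = strip_mine(X, y, keep=2)
print(selected)
```

Refitting the model at every stage, rather than ranking all features once, lets the sensitivities be re-estimated after uninformative descriptors have been removed. This matters when features greatly outnumber molecules, as in the QSAR setting the abstract describes.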
Keywords :
chemical engineering computing; data mining; molecular biophysics; pharmaceutical industry; sensitivity analysis; QSAR; biomolecular activities; chemical design; data strip mining; feature selection; pharmaceuticals; predictive data mining; predictive model; quantitative structure activity relationship; Chemicals; Data mining; Design engineering; Drugs; Human immunodeficiency virus; Pharmaceuticals; Predictive models; Principal component analysis; Sensitivity analysis; Strips
Conference_Title :
Neural Networks, 2002. IJCNN '02. Proceedings of the 2002 International Joint Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
0-7803-7278-6
DOI :
10.1109/IJCNN.2002.1005488