Title :
Rough Sets for Selection of Molecular Descriptors to Predict Biological Activity of Molecules
Author :
Maji, Pradipta ; Paul, Sushmita
Author_Institution :
Machine Intell. Unit, Indian Stat. Inst., Kolkata, India
Abstract :
Quantitative structure activity relationship (QSAR) is an important discipline of computer-aided drug design that deals with the predictive modeling of the properties of a molecule. In general, each QSAR dataset is small in size with a large number of features or descriptors. Among the large number of descriptors present in a QSAR dataset, only a small fraction is effective for the predictive modeling task. In this paper, a new feature selection algorithm based on rough set theory is presented to select a set of effective molecular descriptors from a given QSAR dataset. The proposed algorithm selects the set of molecular descriptors by maximizing both the relevance and the significance of the descriptors. An important finding is that the proposed algorithm is effective in selecting relevant and significant molecular descriptors from a QSAR dataset for predictive modeling. The performance of the proposed algorithm is studied using the R2 statistic of the support vector regression method. The effectiveness of the proposed algorithm, along with a comparison with existing algorithms, is demonstrated on three QSAR datasets.
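The idea of selecting descriptors by relevance and significance can be illustrated with standard rough-set quantities. The abstract does not give the paper's exact criterion, so the sketch below is an assumption, not the authors' algorithm: relevance is taken as the rough-set dependency of the activity on a single descriptor, significance as the gain in dependency when a descriptor joins the already selected set, and the two are simply summed in a greedy search. The function names, the toy descriptors (logP, mass, rings), and the discretized activity values are all hypothetical. Evaluation by support vector regression is not shown.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Rough-set dependency: fraction of objects in the positive region,
    i.e. objects whose equivalence class under `attrs` has one decision value."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)


def significance(rows, candidate, selected, decision):
    """Gain in dependency when `candidate` is added to `selected`."""
    with_candidate = dependency(rows, selected + [candidate], decision)
    return with_candidate - dependency(rows, selected, decision)


def select_descriptors(rows, candidates, decision, k):
    """Greedy selection; the score (relevance + significance) is an
    assumed combination, chosen only for illustration."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(a):
            relevance = dependency(rows, [a], decision)
            return relevance + significance(rows, a, selected, decision)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: descriptors and activity already discretized to 0/1.
rows = [
    {"logP": 0, "mass": 0, "rings": 0, "activity": 0},
    {"logP": 0, "mass": 1, "rings": 1, "activity": 0},
    {"logP": 1, "mass": 0, "rings": 0, "activity": 1},
    {"logP": 1, "mass": 1, "rings": 0, "activity": 1},
]
chosen = select_descriptors(rows, ["logP", "mass", "rings"], "activity", 2)
```

Real QSAR descriptors and activities are typically continuous, so a sketch like this would need a discretization step (or a fuzzy or tolerance-based rough-set variant) before it could be applied.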
Keywords :
pharmaceutical industry; regression analysis; rough set theory; support vector machines; computer aided drug design; molecular descriptors; molecules biological activity prediction; predictive modeling; quantitative structure activity relationship; rough sets; support vector regression method; Biological system modeling; Biology computing; Chemicals; Costs; Design automation; Drugs; Predictive models; Rough sets; Set theory; Support vector machines; Drug design; feature selection; quantitative structure activity relationship (QSAR); rough set; support vector machine (SVM);
Journal_Title :
Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on
DOI :
10.1109/TSMCC.2010.2047943