DocumentCode :
1791707
Title :
Predicting a biological response of molecules from their chemical properties using diverse and optimized ensembles of stochastic gradient boosting machine
Author :
Abdunabi, Tarek ; Basir, Otman
Author_Institution :
Univ. of Waterloo, Waterloo, ON, Canada
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
10
Lastpage :
17
Abstract :
The development of a new drug largely depends on trial and error. It typically involves synthesizing thousands of compounds before one finally becomes a drug. This process is extremely expensive and slow. Therefore, the ability to accurately predict the biological activity of molecules, and to understand the rationale behind those predictions, would be of great value to the pharmaceutical industry. Gradient Boosting Machines (GBMs) are powerful ensemble learning techniques that have been successfully applied to several low-dimensional applications. Despite their high accuracy, GBMs suffer from major drawbacks such as high memory consumption. In this paper, using a real, high-dimensional (i.e., 1776 predictors) molecular dataset, we demonstrate that by using different feature selection/reduction techniques, the computational costs of building and tuning GBMs can be substantially reduced at the price of a slight drop in prediction accuracy. In addition, by fusing the decisions made by the ensembles using two fusion techniques, namely a majority vote and an optimized feedforward neural network, we obtain a prediction accuracy higher than that of any individual ensemble.
Keywords :
feature selection; feedforward neural nets; learning (artificial intelligence); pharmaceutical industry; GBMs; biological activity prediction; biological response prediction; chemical properties; compound synthesis; drug development; ensemble learning techniques; feature reduction; feature selection; feedforward neural network; fusion techniques; gradient boosting machines; majority vote; molecules; pharmaceutical industry; stochastic gradient boosting machine; Accuracy; Boosting; Brain modeling; Buildings; Predictive models; Principal component analysis; Tuning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004386
Filename :
7004386