Title :
TVscreen: Trend Vector Virtual SCREENing of Large Commercial Compounds Collections
Author :
Plewczynski, Dariusz
Author_Institution :
Interdiscipl. Centre for Math. & Comput. Modeling, Univ. of Warsaw, Warsaw
fDate :
June 29 2008-July 5 2008
Abstract :
We present a trend-vector-based method for identifying inhibitors of a given protein target. When some initial information about active compounds is already available, the approach reduces the number of compounds that must be tested experimentally in costly validation studies. The machine learning method is trained on compounds from the MDL Drug Data Report (MDDR) of Elsevier Molecular Design Ltd (MDL) Information Systems for five diverse protein targets: cyclooxygenase-2, dihydrofolate reductase, thrombin, HIV reverse transcriptase, and the estrogen receptor (antagonists). Each classified ligand is represented using an optimized set of two-dimensional topological descriptors. Trend vectors are then used to divide the whole set of ligands into two groups: 1) molecules predicted to be active, and 2) those predicted to be inactive. Both training and predicted activities are treated as binary. The accuracy of the method is comparable to that of other existing prediction tools (such as support vector machines or random forests), while it offers significantly higher speed and portability. The precision of prediction reaches 60% on heterogeneous source data. As a consequence, the method can be easily applied to large commercial compound collections.
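The abstract does not give the trend vector formula, so the following is only a minimal sketch under a common textbook formulation, which is an assumption here: the trend vector is the sum of mean-centered descriptor vectors weighted by mean-centered binary activity, and a compound is predicted active when its projection onto that vector exceeds a threshold. The function names, the toy descriptor values, and the threshold of 0.0 are all hypothetical and are not taken from the paper.

```python
def fit_trend_vector(descriptors, labels):
    """Fit a trend vector from binary-labelled descriptor rows.

    Assumed formulation (not confirmed by the source):
    T_j = sum_i (y_i - mean(y)) * (x_ij - mean_j(x)).
    Returns the trend vector and the per-descriptor training means.
    """
    n = len(descriptors)
    dim = len(descriptors[0])
    mean_label = sum(labels) / n
    col_means = [sum(row[j] for row in descriptors) / n for j in range(dim)]
    trend = [
        sum(
            (labels[i] - mean_label) * (descriptors[i][j] - col_means[j])
            for i in range(n)
        )
        for j in range(dim)
    ]
    return trend, col_means


def trend_score(trend, col_means, compound):
    """Project a mean-centered descriptor vector onto the trend vector."""
    return sum(t * (v - m) for t, v, m in zip(trend, compound, col_means))


def predict_active(trend, col_means, compound, threshold=0.0):
    """Binary prediction: True means 'predicted active' (hypothetical cutoff)."""
    return trend_score(trend, col_means, compound) > threshold


# Toy data: each row is a made-up vector of topological descriptor counts.
train_descriptors = [[3, 0, 1], [2, 1, 0], [0, 3, 1], [1, 2, 2]]
train_labels = [1, 1, 0, 0]  # 1 = active, 0 = inactive

trend, means = fit_trend_vector(train_descriptors, train_labels)
library = [[3, 0, 0], [0, 3, 2]]
predictions = [predict_active(trend, means, c) for c in library]
```

Fitting takes one pass over the training set and scoring a compound is a single dot product, which is consistent with the speed and portability the abstract claims for screening large collections.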
Keywords :
biology computing; enzymes; learning (artificial intelligence); molecular biophysics; HIV-reverse transcriptase; TVscreen; cyclooxygenase-2; dihydrofolatereductase; drug data report; estrogen receptor; large commercial compounds collections; ligand; machine learning; protein; thrombin; trend vector virtual screening; Bioinformatics; Databases; Drugs; Inhibitors; Learning systems; Machine learning algorithms; Proteins; Support vector machine classification; Support vector machines; Testing; Chemical Descriptors; Compound Identification; MDL Drug Data Report; Machine-learning methods; Trend Vectors; Virtual High-Throughput Screening;
Conference_Titel :
Biocomputation, Bioinformatics, and Biomedical Technologies, 2008. BIOTECHNO '08. International Conference on
Conference_Location :
Bucharest
Print_ISBN :
978-0-7695-3191-5
Electronic_ISBN :
978-0-7695-3191-5
DOI :
10.1109/BIOTECHNO.2008.15