DocumentCode :
637311
Title :
An ensemble method using small training sets for imbalanced data sets: Application to drugs used for kinases
Author :
Rani, T. Sobha ; Soujanya, P.V.
Author_Institution :
Comput. Intell. Lab., Univ. of Hyderabad, Hyderabad, India
fYear :
2013
fDate :
8-10 Aug. 2013
Firstpage :
516
Lastpage :
521
Abstract :
Nearly all aspects of cell life and death are controlled by the phosphorylation of proteins, which is catalyzed by kinases. Malfunctioning kinases result in cell disorders that cause cancers and other diseases. The present study deals with identifying the predominant features of the inhibitors that target these enzymes and with classifying kinase and non-kinase inhibitors using machine learning algorithms. The work addresses two challenges. The first is the classification of unbalanced data sets, that is, data sets in which the classes that constitute them differ in size. The second is concept complexity (closely related minority and majority data sets in the feature space). Our approach deals with the binary classification of approved human inhibitors present in the DrugBank database into kinase and non-kinase inhibitors. The inhibitors are clustered and then classified using an ensemble consisting of several classification models. Classification is done in two levels, and weighted voting is used after each level. An overall accuracy of 80% is obtained after the two levels of classification. Thus we have established a new type of approach for the classification of unbalanced data sets and of data sets in which there is an overlap between instances belonging to different classes. Finally, we established a signature specific to kinase inhibitors.
Keywords :
biochemistry; biology computing; cancer; catalysis; cellular biophysics; drugs; enzymes; learning (artificial intelligence); pattern classification; binary classification; cancers; catalysis; cell disorders; cell life; classification models; diseases; drug bank database; drugs; ensemble method; enzymes; human inhibitors; imbalanced data sets; kinase classification; kinase malfunction; machine learning algorithms; nonkinase inhibitors; phosphorylation; proteins; unbalanced data sets; weighted voting; Accuracy; Data models; Databases; Drugs; Inhibitors; Proteins; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Contemporary Computing (IC3), 2013 Sixth International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-0190-6
Type :
conf
DOI :
10.1109/IC3.2013.6612250
Filename :
6612250