DocumentCode
2379415
Title
Prediction of protein translation initiation site from the perspective of imbalanced classes
Author
de Souza Teixeira, Felipe Carvalho ; Nobre, Cristiane Neri ; Silva, Lívia Márcia ; Zárate, Luis Enrique
Author_Institution
Dept. de Cienc. da Comput., Univ. Fed. de São João del-Rei, São João del-Rei, Brazil
fYear
2011
fDate
9-12 Oct. 2011
Firstpage
1313
Lastpage
1318
Abstract
The correct prediction of the protein translation initiation site from messenger RNA (mRNA) is an important activity for genomic annotation. This problem is known to be highly imbalanced, since each molecule has a single translation initiation site and several other AUG codons that are not initiators. In this context, the present work focuses on undersampling methods that balance the class distribution for this problem. We present the results obtained with the Condensed Nearest Neighbor Rule (CNN), Tomek links, and random sampling methods already known in the literature, and introduce a new balancing method, C-Blocks, which is based on clustering. Using the concept of Bagging [1], the results of these classifiers are combined to obtain a final ranking of the sequence.
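The undersampling baselines named in the abstract can be illustrated with a minimal sketch, assuming numeric feature vectors, Euclidean distance, and a 0/1 label convention where 0 (non-initiator AUG) is the majority class. It shows Tomek-link removal, random undersampling, and the construction of several balanced subsets as a bagging-style ensemble would use them. It does not implement CNN or C-Blocks (the latter is not described here), and every function name is hypothetical rather than taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(i: int, X: Sequence[Point]) -> int:
    """Index of the Euclidean nearest neighbour of X[i], excluding i itself."""
    best, best_d = -1, math.inf
    for j, x in enumerate(X):
        if j != i:
            d = math.dist(X[i], x)
            if d < best_d:  # strict '<': the first of tied neighbours wins
                best, best_d = j, d
    return best


def tomek_links(X: Sequence[Point], y: Sequence[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j), i < j, of opposite-class points that are mutual nearest neighbours."""
    nn = [_nearest(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_majority(X, y, majority=0):
    """Drop the majority-class member of every Tomek link (hypothetical helper)."""
    drop = {i if y[i] == majority else j for i, j in tomek_links(X, y)}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


def random_undersample(X, y, majority=0, seed=0):
    """Keep every minority example plus an equal-sized random draw of the majority."""
    rng = random.Random(seed)
    minority_idx = [k for k, c in enumerate(y) if c != majority]
    majority_idx = [k for k, c in enumerate(y) if c == majority]
    chosen = rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    idx = sorted(minority_idx + chosen)
    return [X[k] for k in idx], [y[k] for k in idx]


def balanced_bags(X, y, n_bags, majority=0):
    """Several balanced subsets (one per seed); one classifier would be trained per bag."""
    return [random_undersample(X, y, majority, seed=s) for s in range(n_bags)]
```

In a bagging setup, each bag would train its own classifier (the keywords suggest support vector machines), and their scores for a candidate AUG would be combined, for example by averaging, to rank candidates; that combination rule is an assumption of this sketch.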
Keywords
biology computing; genomics; learning (artificial intelligence); sampling methods; Tomek method; bagging concept; c-blocks method; condensed nearest neighbor rule method; genomic annotation; imbalanced classes perspective; messenger RNA; protein translation initiation; random sampling method; undersampling method; Accuracy; Bagging; Context; Encoding; Sensitivity; Support vector machines; Training; Imbalanced Classes; Protein Translation Initiation Site; Support Vector Machine;
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on
Conference_Location
Anchorage, AK
ISSN
1062-922X
Print_ISBN
978-1-4577-0652-3
Type
conf
DOI
10.1109/ICSMC.2011.6083841
Filename
6083841