DocumentCode :
2379415
Title :
Prediction of protein translation initiation site from the perspective of imbalanced classes
Author :
de Souza Teixeira, Felipe Carvalho ; Nobre, Cristiane Neri ; Silva, Lívia Márcia ; Zárate, Luis Enrique
Author_Institution :
Dept. de Cienc. da Comput., Univ. Fed. de Sao Joao del-Rei, São João del-Rei, Brazil
fYear :
2011
fDate :
9-12 Oct. 2011
Firstpage :
1313
Lastpage :
1318
Abstract :
The correct prediction of the protein translation initiation site in messenger RNA (mRNA) is an important task in genomic annotation. The problem is known to be highly imbalanced, since each mRNA molecule has a single translation initiation site and several other AUG codons that are not initiators. In this context, the present work focuses on undersampling methods that balance the class distribution to address this problem. We present the results obtained with the Condensed Nearest Neighbor Rule (CNN), Tomek, and random sampling methods already known in the literature, and introduce a new clustering-based balancing method named C-Blocks. Using the concept of Bagging [1], the results of these classifiers are combined to obtain a final ranking of the sequence.
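The pipeline outlined in the abstract (balance the classes by undersampling the majority class, train several classifiers, combine their outputs into a ranking) can be illustrated with the minimal, stdlib-only sketch below. It is not the authors' implementation, and it does not reproduce C-Blocks, whose details are not given in this record. Assumptions: all function names are hypothetical; candidate sites are already encoded as numeric feature vectors; a nearest-centroid scorer stands in for the Support Vector Machine listed in the keywords; and "combining" is taken to mean averaging the scores of models trained on independent random undersamples.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def random_undersample(pos, neg, rng):
    """Keep every minority (positive) example and draw an equal number of
    majority (negative) examples without replacement."""
    return pos, rng.sample(neg, min(len(pos), len(neg)))


def tomek_link_cleanup(pos, neg):
    """Drop majority examples that form a Tomek link: an opposite-label pair
    whose members are each other's nearest neighbour in the pooled data."""
    data = [(x, 1) for x in pos] + [(x, 0) for x in neg]

    def nearest(i):
        return min((j for j in range(len(data)) if j != i),
                   key=lambda j: euclidean(data[i][0], data[j][0]))

    nn = [nearest(i) for i in range(len(data))]
    drop = {i for i, j in enumerate(nn)
            if nn[j] == i and data[i][1] != data[j][1] and data[i][1] == 0}
    kept_neg = [x for i, (x, label) in enumerate(data)
                if label == 0 and i not in drop]
    return pos, kept_neg


def centroid(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def train_centroid_scorer(pos, neg):
    """Hypothetical stand-in for an SVM: a higher score means the vector is
    closer to the positive centroid than to the negative one."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: euclidean(x, cn) - euclidean(x, cp)


def bagged_ranking(pos, neg, candidates, n_models=5, seed=0):
    """Train one scorer per random balanced subset, average the scores,
    and return candidate indices ordered from most to least likely site."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        p, n = random_undersample(pos, neg, rng)
        models.append(train_centroid_scorer(p, n))
    scores = [sum(m(x) for m in models) / n_models for x in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    positives = [(1.0, 1.0), (1.1, 0.9)]
    negatives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (0.2, 0.1), (0.1, 0.1)]
    print(bagged_ranking(positives, negatives, [(0.0, 0.0), (1.0, 1.0)]))
```

Averaging independent undersamples lets every majority example contribute to some model while each individual model still trains on balanced data; Tomek-link cleanup is shown as a separate step because it removes only borderline majority examples rather than enforcing a fixed class ratio.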
Keywords :
biology computing; genomics; learning (artificial intelligence); sampling methods; Tomek method; bagging concept; c-blocks method; condensed nearest neighbor rule method; genomic annotation; imbalanced classes perspective; messenger RNA; protein translation initiation; random sampling method; undersampling method; Accuracy; Bagging; Context; Encoding; Sensitivity; Support vector machines; Training; Imbalanced Classes; Protein Translation Initiation Site; Support Vector Machine;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on
Conference_Location :
Anchorage, AK
ISSN :
1062-922X
Print_ISBN :
978-1-4577-0652-3
Type :
conf
DOI :
10.1109/ICSMC.2011.6083841
Filename :
6083841