Title :
Prediction of protein translation initiation site from the perspective of imbalanced classes
Author :
de Souza Teixeira, Felipe Carvalho ; Nobre, Cristiane Neri ; Silva, Lívia Márcia ; Zárate, Luis Enrique
Author_Institution :
Dept. de Cienc. da Comput., Univ. Fed. de São João del-Rei, São João del-Rei, Brazil
Abstract :
The correct prediction of the protein translation initiation site in messenger RNA (mRNA) is an important task in genomic annotation. The problem is known to be highly imbalanced, since each molecule has a single translation initiation site and several other AUGs that are not initiators. In this context, the present work focuses on undersampling methods for balancing the class distribution. We present the results obtained with the Condensed Nearest Neighbor Rule (CNN), Tomek, and random sampling methods, already known in the literature, and introduce a new balancing method based on clustering, namely C-Blocks. Using the concept of Bagging [1], the results of the classifiers are combined to obtain a final ranking of the sequence.
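The class imbalance and the random sampling baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sequence, the position of the true initiation site, and the helper names `candidate_aug_positions`, `random_undersample`, and `majority_vote` are all assumptions made for illustration. The abstract does not say how the bagged classifiers are combined, so the majority vote here is only one plausible choice.

```python
import random


def candidate_aug_positions(mrna):
    """Return the 0-based index of every AUG triplet, in any reading frame."""
    return [i for i in range(len(mrna) - 2) if mrna[i:i + 3] == "AUG"]


def random_undersample(positives, negatives, seed=0):
    """Keep all positives and draw an equally sized random subset of negatives."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return list(positives), rng.sample(list(negatives), k)


def majority_vote(predictions):
    """Combine 0/1 predictions from several classifiers (assumed combination rule)."""
    return 1 if 2 * sum(predictions) > len(predictions) else 0


# Hypothetical toy sequence; we ASSUME the true initiation site is the AUG at index 2.
mrna = "GGAUGAUGGCCAUGGCAUGUAA"
true_site = 2

candidates = candidate_aug_positions(mrna)
positives = [p for p in candidates if p == true_site]
negatives = [p for p in candidates if p != true_site]

# One positive against several negatives: the imbalance described in the abstract.
balanced_pos, balanced_neg = random_undersample(positives, negatives, seed=0)
```

Repeating `random_undersample` with different seeds would give several balanced training sets, one per bagged classifier, whose 0/1 outputs could then be passed to `majority_vote`.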
Keywords :
biology computing; genomics; learning (artificial intelligence); sampling methods; Tomek method; bagging concept; c-blocks method; condensed nearest neighbor rule method; genomic annotation; imbalanced classes perspective; messenger RNA; protein translation initiation; random sampling method; undersampling method; Accuracy; Bagging; Context; Encoding; Sensitivity; Support vector machines; Training; Imbalanced Classes; Protein Translation Initiation Site; Support Vector Machine;
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4577-0652-3
DOI :
10.1109/ICSMC.2011.6083841