• DocumentCode
    2379415
  • Title
    Prediction of protein translation initiation site from the perspective of imbalanced classes
  • Author
    de Souza Teixeira, Felipe Carvalho ; Nobre, Cristiane Neri ; Silva, Lívia Márcia ; Zárate, Luis Enrique
  • Author_Institution
    Dept. de Cienc. da Comput., Univ. Fed. de Sao Joao del-Rei, São João del-Rei, Brazil
  • fYear
    2011
  • fDate
    9-12 Oct. 2011
  • Firstpage
    1313
  • Lastpage
    1318
  • Abstract
    The correct prediction of protein translation initiation sites in messenger RNA (mRNA) is an important task in genomic annotation. The problem is known to be highly imbalanced, since each mRNA molecule has a single translation initiation site and several other AUGs that are not initiators. In this context, the present work focuses on undersampling methods for balancing the class distribution. We present the results obtained with the Condensed Nearest Neighbor Rule (CNN), Tomek, and random sampling methods, which are already known in the literature, and introduce a new clustering-based balancing method, C-Blocks. Using the concept of Bagging [1], the results of these classifiers are combined to obtain a final ranking of the sequence.
  • Keywords
    biology computing; genomics; learning (artificial intelligence); sampling methods; Tomek method; bagging concept; c-blocks method; condensed nearest neighbor rule method; genomic annotation; imbalanced classes perspective; messenger RNA; protein translation initiation; random sampling method; undersampling method; Accuracy; Bagging; Context; Encoding; Sensitivity; Support vector machines; Training; Imbalanced Classes; Protein Translation Initiation Site; Support Vector Machine
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on
  • Conference_Location
    Anchorage, AK
  • ISSN
    1062-922X
  • Print_ISBN
    978-1-4577-0652-3
  • Type
    conf
  • DOI
    10.1109/ICSMC.2011.6083841
  • Filename
    6083841