• DocumentCode
    2469734
  • Title
    Parallel and distributed kmeans to identify the translation initiation site of proteins
  • Author
    Rodrigues, Laerte M. ; Zárate, Luis E. ; Nobre, Cristiane N. ; Freitas, Henrique C.
  • Author_Institution
    Dept. of Comput. Sci., Pontificia Univ. Catolica de Minas Gerais, Belo Horizonte, Brazil
  • fYear
    2012
  • fDate
    14-17 Oct. 2012
  • Firstpage
    1639
  • Lastpage
    1645
  • Abstract
    Prediction of the translation initiation site is of vital importance in bioinformatics, since through this process it is possible to understand the organic formation and metabolic behavior of living organisms. Sequential algorithms are not always a viable solution because mRNA databases are normally very large, resulting in long processing times. Applying parallel and distributed computing resources to such databases could help reduce this time. The objective of this article is to present a class balancing solution for the translation initiation site process using parallel and distributed computing resources in a hybrid model. The results reveal a speedup of up to 23 times compared to sequential methods and performance rates for accuracy, precision, sensitivity, specificity and adjusted accuracy of 91.15%, 39.83%, 89.11%, 88.93% and 89.02%, respectively, for the Homo sapiens database. For the Drosophila melanogaster database, the speedup was 18.33 times and accuracy, precision, sensitivity, specificity and adjusted accuracy were 95.22%, 43.01%, 90.83%, 90.47% and 90.64%, respectively. Both sets of results are considered important. Thus, the solution presented in this article proved viable for the problem in question.
  • Keywords
    bioinformatics; message passing; molecular biophysics; parallel algorithms; parallel programming; pattern clustering; proteins; very large databases; Drosophila melanogaster database; Homo sapiens database; accuracy performance; adjusted accuracy performance; class balancing solution; distributed computing; distributed k-means clustering; living organism; mRNA database; parallel computing; parallel k-means clustering; precision performance; protein translation initiation site; sensitivity performance; specificity performance; translation initiation site processing; Accuracy; Clustering algorithms; Databases; Equations; Libraries; Organisms; Sensitivity; Bioinformatics; Clustering; Parallel and distributed Systems;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4673-1713-9
  • Electronic_ISBN
    978-1-4673-1712-2
  • Type
    conf
  • DOI
    10.1109/ICSMC.2012.6377972
  • Filename
    6377972