• DocumentCode
    2468144
  • Title
    Prediction of protein subcellular localization based on variable-length motifs detection and dissimilarity based classification
  • Author
    Arango-Argoty, G.A. ; Jaramillo-Garzón, J.A. ; Röthlisberger, S. ; Castellanos-Dominguez, C.G.
  • Author_Institution
    Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 vía al Magdalena, Colombia
  • fYear
    2011
  • fDate
    Aug. 30 2011-Sept. 3 2011
  • Firstpage
    945
  • Lastpage
    948
  • Abstract
    Predicting the function of unknown proteins is one of the principal goals of computational biology. Knowing the subcellular localization of a protein helps in understanding its structure and molecular function. Numerous prediction techniques have been developed, usually focusing on global information about the protein. However, predictions can also be made by identifying functional subsequence patterns known as motifs. Many motif discovery methods require aligned sequences and a window size that is fixed in advance. To address these problems, we propose a method based on variable-length motif characterization and detection using the continuous wavelet transform (CWT) and a dissimilarity space representation. To analyze the motifs generated by our approach, we divide the entire dataset into training (60%) and validation (40%) sets. A Support Vector Machine (SVM) classifier is used as the predictor on the validation set. The highest sensitivity (Sn = 82.58%) and specificity (Sp = 92.86%), across 10-fold cross-validation, are obtained for endosome proteins. The average results, Sn = 74% and Sp = 75.58%, are comparable to the current state of the art. For data sets with low sequence identity (< 40%), motif characterization and localization based on the CWT performs well and yields interpretable subsequences for each subcellular localization.
  • Keywords
    Amino acids; Continuous wavelet transforms; Proteins; Prototypes; Support vector machines; Motifs; hydropathy scale; subcellular localization; support vector machine; wavelet transform; Algorithms; Amino Acid Sequence; Gene Expression Profiling; Molecular Sequence Data; Pattern Recognition, Automated; Proteins; Sequence Analysis, Protein; Software; Structure-Activity Relationship; Subcellular Fractions; Support Vector Machines
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE
  • Conference_Location
    Boston, MA
  • ISSN
    1557-170X
  • Print_ISBN
    978-1-4244-4121-1
  • Electronic_ISBN
    1557-170X
  • Type
    conf
  • DOI
    10.1109/IEMBS.2011.6090213
  • Filename
    6090213