• DocumentCode
    74167
  • Title

    Discovering Variable-Length Patterns in Protein Sequences for Protein-Protein Interaction Prediction

  • Author

    Hu, Lun ; Chan, Keith C. C.

  • Author_Institution
    Hong Kong Polytech. Univ., Hong Kong, China
  • Volume
    14
  • Issue
    4
  • fYear
    2015
  • fDate
    Jun-15
  • Firstpage
    409
  • Lastpage
    416
  • Abstract
    Computational approaches have recently been used to predict protein-protein interactions (PPIs). Among them, sequence-based approaches are often preferred because they do not require prior knowledge about the proteins. However, in deciding whether two proteins may interact, existing sequence-based approaches consider only fixed-length segments. We believe that interactions between proteins can be predicted more accurately if variable-length segments are also considered. To this end, we have developed an algorithm called VLASPD. Given a database of protein sequences, VLASPD proceeds in several steps. The protein database is first searched to identify frequent sequence segments (FSSs) of different lengths. Different combinations of the presence and absence of these FSSs are then used to form associative sequential patterns (ASPs). Based on a statistical measure, the ASPs that occur significantly frequently among proteins in the training set are identified as significant associative sequential patterns (SASPs). If an SASP is found in a protein pair, it can be considered as providing some evidence to support or refute the existence of an interaction between the two proteins. The amount of evidence provided is then quantified with an information-theoretic measure. How likely two proteins are to interact is then decided by the total amount of evidence provided by the SASPs found in the protein pair. To test the effectiveness of VLASPD, we used several sets of real data. The experimental results show that VLASPD can be a promising approach for PPI prediction. VLASPD is available for use and testing at http://www.comp.polyu.edu.hk/~cslhu/resources/vlaspd/.
  • Keywords
    bioinformatics; proteomics; statistical analysis; VLASPD algorithm; protein sequence database; protein-protein interaction prediction; statistical measure; variable-length patterns; Databases; Kernel; Manganese; Nanobioscience; Prediction algorithms; Protein sequence; Protein-protein interaction; prediction; sequence information; variable-length pattern;
  • fLanguage
    English
  • Journal_Title
    NanoBioscience, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1536-1241
  • Type

    jour

  • DOI
    10.1109/TNB.2015.2429672
  • Filename
    7111341
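
The first and last steps described in the abstract (finding frequent segments of different lengths, then scoring a pattern as evidence for or against an interaction) can be illustrated with a minimal sketch. This is not the authors' VLASPD implementation: the function names, the length bounds, the support threshold, and the smoothed log-ratio used as the evidence score are all assumptions made for illustration, since the abstract does not specify the statistical or information-theoretic measures.

```python
from collections import Counter
from math import log2


def frequent_segments(sequences, min_len=2, max_len=4, min_support=2):
    """Return segments of length min_len..max_len that occur in at least
    min_support sequences (hypothetical helper, not from the paper)."""
    support = Counter()
    for seq in sequences:
        seen = set()
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        support.update(seen)  # each segment counts once per sequence
    return {seg: n for seg, n in support.items() if n >= min_support}


def log_odds_evidence(seg_a, seg_b, interacting, non_interacting):
    """Assumed evidence score: log2 ratio of how often seg_a appears in the
    first protein and seg_b in the second, among interacting versus
    non-interacting pairs. Add-one smoothing avoids division by zero."""
    def rate(pairs):
        hits = sum(1 for a, b in pairs if seg_a in a and seg_b in b)
        return (hits + 1) / (len(pairs) + 2)
    return log2(rate(interacting) / rate(non_interacting))
```

A positive score suggests the pattern supports an interaction and a negative score suggests it refutes one; summing the scores over all patterns found in a pair mirrors the "total amount of evidence" idea in the abstract, under the assumptions stated above.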