• DocumentCode
    583267
  • Title
    Incorporating semantic similarity into clustering process for identifying protein complexes from Affinity Purification/Mass Spectrometry data
  • Author
    Cai, Bingjing ; Wang, Haiying ; Zheng, Huiru ; Wang, Hui
  • Author_Institution
    Sch. of Comput. & Math., Univ. of Ulster, Newtownabbey, UK
  • fYear
    2012
  • fDate
    4-7 Oct. 2012
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    This paper presents a framework for incorporating semantic similarities into the detection of protein complexes from Affinity Purification/Mass Spectrometry (AP-MS) data. AP-MS data are modeled as a bipartite network in which one set of nodes consists of bait proteins and the other of prey proteins. Pair-wise similarities between bait proteins are computed by combining similarities based on topological features with functional semantic similarities. A hierarchical clustering algorithm is then applied to obtain 'seed clusters' consisting of bait proteins. Starting from these seed clusters, an expansion process recruits prey proteins that are significantly associated with the bait proteins, producing the final set of identified protein complexes. In applications to real AP-MS datasets, the biological significance of the predicted protein complexes is validated against curated protein complexes using six statistical metrics. Results show that integrating semantic similarities into the clustering process greatly improves the accuracy of complex identification, and that the clustering results obtained with the proposed framework are better than those of several existing clustering methods.
  • Keywords
    association; biochemistry; bioinformatics; mass spectroscopy; molecular biophysics; proteins; statistical analysis; AP-MS datasets; affinity purification-mass spectrometry data; association; bait proteins; biological significance; bipartite network; clustering process; curated protein complexes; functional semantic similarities; hierarchical clustering algorithm; pair-wise similarities; predicted protein complexes; prey proteins; protein complex detection; seed clusters; statistical metrics; topological features; Accuracy; Clustering algorithms; Codecs; Educational institutions; Protein engineering; Proteins; Semantics; Affinity purification/mass spectrometry (AP-MS); Gene Ontology; Protein complexes; Protein-protein interactions; Semantic Similarity;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-4673-2559-2
  • Electronic_ISBN
    978-1-4673-2558-5
  • Type
    conf
  • DOI
    10.1109/BIBM.2012.6392718
  • Filename
    6392718
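
The pipeline outlined in the Abstract (bipartite bait/prey data, combined bait similarity, hierarchical clustering into seed clusters, prey recruitment) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not say which topological measure, combination rule, linkage, or significance test the paper uses. Here Jaccard overlap of prey sets stands in for the topological similarity, a weighted average with a hypothetical `alpha` combines it with precomputed semantic scores, average linkage with a hypothetical `threshold` forms the seed clusters, and a simple coverage fraction (`min_fraction`, also hypothetical) replaces the significance test. The toy proteins and scores are invented.

```python
from itertools import combinations

# Toy AP-MS data (invented): each bait maps to the set of preys it pulled down.
baits = {
    "A": {"p1", "p2", "p3"},
    "B": {"p2", "p3", "p4"},
    "C": {"p7", "p8"},
}
# Hypothetical precomputed functional semantic similarities between baits.
semantic = {
    frozenset(("A", "B")): 0.8,
    frozenset(("A", "C")): 0.1,
    frozenset(("B", "C")): 0.1,
}

def jaccard(a, b):
    """Assumed topological similarity: overlap of two baits' prey sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def combined_similarity(baits, semantic, alpha=0.5):
    """Weighted average of topological and semantic similarity (assumed rule)."""
    sim = {}
    for u, v in combinations(sorted(baits), 2):
        pair = frozenset((u, v))
        topo = jaccard(baits[u], baits[v])
        sim[pair] = alpha * topo + (1 - alpha) * semantic.get(pair, 0.0)
    return sim

def average_linkage(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(sim[frozenset((u, v))] for u in c1 for v in c2)
    return total / (len(c1) * len(c2))

def seed_clusters(baits, sim, threshold=0.4):
    """Agglomerative clustering of baits; stop when the best merge is too weak."""
    clusters = [frozenset([b]) for b in sorted(baits)]
    while len(clusters) > 1:
        first, second = max(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], sim),
        )
        if average_linkage(first, second, sim) < threshold:
            break
        clusters = [c for c in clusters if c not in (first, second)]
        clusters.append(first | second)
    return clusters

def expand(cluster, baits, min_fraction=0.6):
    """Recruit preys seen with at least min_fraction of the cluster's baits.

    A stand-in for the significance test mentioned in the abstract.
    """
    counts = {}
    for bait in cluster:
        for prey in baits[bait]:
            counts[prey] = counts.get(prey, 0) + 1
    recruited = {p for p, n in counts.items() if n / len(cluster) >= min_fraction}
    return set(cluster) | recruited

sim = combined_similarity(baits, semantic)
seeds = seed_clusters(baits, sim)
complexes = [expand(seed, baits) for seed in seeds]
```

With the toy data, baits A and B share two of four preys and have a high semantic score, so they merge into one seed cluster, while C stays on its own; the expansion step then keeps only the preys observed with both A and B. Changing `alpha` to 1.0 drops the semantic term and gives a purely topological baseline for comparison.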