• DocumentCode
    2770734
  • Title
    A Language Modeling Text Mining Approach to the Annotation of Protein Community
  • Author
    Zhang, Xiaodan ; Wu, Daniel D. ; Zhou, Xiaohua ; Hu, Xiaohua
  • Author_Institution
    Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA
  • fYear
    2006
  • fDate
    16-18 Oct. 2006
  • Firstpage
    12
  • Lastpage
    19
  • Abstract
    This paper discusses an ontology-based language modeling text mining approach to the annotation of protein communities. Communities appear to play an important role in the functional properties of complex networks. Being able to annotate the community structure identified in a biological network can help us better understand the structure and dynamics of biological systems. Traditional resources such as the Gene Ontology (GO) provide information about the functionality of gene products, but they are not sufficient for annotating communities, because the database covers only a limited number of proteins, only limited protein properties are available for annotation, and a group of gene products cannot be annotated as a whole. We therefore present an ontology-based mixture language model approach to annotating protein communities. Compared with traditional methods, our approach has three advantages. First, mining the biomedical literature yields much richer information than existing gene databases. Second, the mixture language model can help "purify" the documents by eliminating some background noise. Third, using a domain ontology, we extract biological concepts and concept pairs from abstracts. Biological concepts are more meaningful than words or multi-word phrases. Moreover, concept pairs deliver much more information and serve as evidence for the annotation results. We test our approach on four communities, SAGA-SRB, CCR-NOT, RFC and ARP2/3, detected from the dataset of interactions for Saccharomyces cerevisiae from the General Repository for Interaction Datasets (GRID). The annotation results provide a very coherent indication of the functionality of each community.
  • Keywords
    biological techniques; biology computing; data mining; genetics; molecular biophysics; ontologies (artificial intelligence); proteins; ARP2/3; CCR-NOT; RFC; SAGA-SRB; Saccharomyces cerevisiae interactions; background noise elimination; biological network functional properties; biological system dynamics; biological system structure; biomedical literature mining; domain ontology; gene databases; gene ontology; gene products; general repository for interaction datasets; language modeling text mining approach; mixture language model; protein community annotation; protein properties; Abstracts; Background noise; Biological system modeling; Biological systems; Complex networks; Data mining; Databases; Ontologies; Proteins; Text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    BioInformatics and BioEngineering, 2006. BIBE 2006. Sixth IEEE Symposium on
  • Conference_Location
    Arlington, VA
  • Print_ISBN
    0-7695-2727-2
  • Type
    conf
  • DOI
    10.1109/BIBE.2006.253310
  • Filename
    4019635