• DocumentCode
    3243103
  • Title
    TMAC: An automated text mining tool for construction of an annotated corpus to support protein-protein interaction information extraction
  • Author
    Azzem, Rania Ahmed Abdel ; Seoud, Abul
  • Author_Institution
    Dept. of Electr. Eng., El Fayoum Univ., Fayoum, Egypt
  • fYear
    2010
  • fDate
    2-4 Nov. 2010
  • Firstpage
    75
  • Lastpage
    79
  • Abstract
    Extracting protein-protein interactions (PPIs) from the biomedical literature is an important topic in protein science. Annotated corpora are essential to the development and evaluation of PPI extraction systems, so a text mining tool is needed that can annotate any corpus with protein names and interaction events in order to identify interactions among proteins. In this paper we present a Java package called the TMAC system. TMAC tags protein names and interaction events in biomedical literature using a combination of carefully designed rules and a dictionary of protein names. TMAC normalizes the protein mentions and interaction events it finds by supplying the appropriate database reference. TMAC is divided into two modules. The first is the named entity identification and normalization module. The second is the interaction event tagger, which identifies words that indicate the occurrence of an interaction. TMAC achieved an average of 85.2% precision and 76.7% recall for protein identification, and an average of 88.2% precision and 71.8% recall for protein-protein interaction event identification. TMAC is a flexible system: it can be used as a standalone application or incorporated into the workflow of a more general text mining system.
  • Keywords
    biology computing; data mining; proteins; text analysis; Java package; TMAC system; annotated corpora; annotated corpus; automated text mining; biomedical literatures; protein identification; protein science; protein-protein interaction extraction systems; protein-protein interaction information extraction; text mining system; Abstracts; Databases; Dictionaries; Protein engineering; Proteins; Text mining; named entity recognition; protein normalization; text-mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Technology and Development (ICCTD), 2010 2nd International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-8844-5
  • Electronic_ISBN
    978-1-4244-8845-2
  • Type
    conf
  • DOI
    10.1109/ICCTD.2010.5646069
  • Filename
    5646069