• DocumentCode
    3633648
  • Title
    Prediction of protein-protein interaction relevance of articles using references
  • Author
    Cagatay Calli
  • Author_Institution
    Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
  • fYear
    2009
  • Firstpage
    189
  • Lastpage
    192
  • Abstract
    Classifying documents as protein-protein interaction (PPI) relevant or not is the first step towards extracting meaningful PPI data from article content. Currently, this classification step is handled manually by expert curators. A number of text-mining methods have been proposed to tackle this problem, using abstracts without references. We propose that article references contain important information that can be used to enhance these previous techniques. To test the effect of references, we trained an SVM classifier based solely on reference links extracted from BioCreative II data. Our approach includes a feature selection method based on reference count imbalance between positive and negative examples. Classification results on the BioCreative II test and BioCreative II.5 training datasets show that even simple referential information extracted from papers can be effective for predicting protein interaction relevance.
  • Keywords
    "Data mining","Databases","Testing","Protein engineering","Abstracts","Natural language processing","Machine learning","Tagging","Training data","Support vector machines"
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Sciences, 2009. ISCIS 2009. 24th International Symposium on
  • Print_ISBN
    978-1-4244-5021-3
  • Type
    conf
  • DOI
    10.1109/ISCIS.2009.5291842
  • Filename
    5291842
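
The abstract mentions a feature selection step based on reference count imbalance between positive and negative examples, but does not give the scoring rule. The following is a minimal sketch of one way such a step might look, not the paper's actual method: the smoothed count ratio, the `min_ratio` threshold, and all function names are assumptions made for illustration. Each article is represented as a set of cited reference identifiers, and only references cited disproportionately by one class are kept as binary features.

```python
from collections import Counter


def select_imbalanced_references(docs, labels, min_ratio=2.0, smoothing=1.0):
    """Return reference IDs cited disproportionately by one class.

    docs      -- list of sets of reference identifiers, one set per article
    labels    -- list of bools, True if the article is PPI relevant
    min_ratio -- hypothetical threshold on the smoothed count ratio
    smoothing -- additive smoothing so unseen counts do not divide by zero
    """
    pos_counts = Counter()
    neg_counts = Counter()
    for refs, is_relevant in zip(docs, labels):
        # Count each reference at most once per article.
        (pos_counts if is_relevant else neg_counts).update(set(refs))

    selected = []
    for ref in sorted(set(pos_counts) | set(neg_counts)):
        p = pos_counts[ref] + smoothing
        n = neg_counts[ref] + smoothing
        # Imbalance is symmetric: a reference may favor either class.
        if max(p, n) / min(p, n) >= min_ratio:
            selected.append(ref)
    return selected


def to_feature_vector(refs, selected):
    """Encode one article as a binary vector over the selected references."""
    return [1 if ref in refs else 0 for ref in selected]


# Toy data with hypothetical reference IDs.
train_docs = [{"r1", "r2"}, {"r1", "r3"}, {"r2", "r4"}, {"r2", "r4"}]
train_labels = [True, True, False, False]

selected_refs = select_imbalanced_references(train_docs, train_labels)
vector = to_feature_vector({"r1", "r4"}, selected_refs)
```

The resulting binary vectors could then be passed to any SVM implementation for training; that step is omitted here to keep the sketch dependency-free.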