  • DocumentCode
    78418
  • Title
    Gene Name Disambiguation Using Multi-Scope Species Detection
  • Author
    Jui-Chen Hsiao ; Chih-Hsuan Wei ; Hung-Yu Kao
  • Author_Institution
    Inst. of Med. Inf., Nat. Cheng Kung Univ., Tainan, Taiwan
  • Volume
    11
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan.-Feb. 2014
  • Firstpage
    55
  • Lastpage
    62
  • Abstract
    Species detection is an important topic in text mining. Because subtasks such as species assignment to genes and detection of a document's focus species are important in their own right, several studies have addressed them individually. However, no study to date has treated species detection as a general problem. We therefore developed a multi-scope species detection model that identifies the focus species at four scopes: gene mention, sentence, paragraph, and the global scope of the entire article. Species assignment is one of the bottlenecks of gene name disambiguation, and in our evaluation, recognizing the focus species of a gene mention at these four scopes improved gene name disambiguation. We used species cue words extracted from articles to estimate the relevance between an article and a species. The relevance score was calculated with our proposed entities frequency-augmented invert species frequency (EF-AISF) formula, which represents the importance of an entity to a species. We also defined a relation guide factor (RGF) to normalize the relevance score. Our method achieved better performance than previous methods and can also handle articles that do not explicitly mention a species. On the DECA corpus, it obtained an accuracy of 88.22 percent, outperforming previous studies.
  • Keywords
    data mining; genetics; text analysis; DECA corpus; EF-AISF formula; entity frequency-augmented invert species frequency formula; gene mention; gene name disambiguation; multiscope species detection; relation guide factor; text mining field; Frequency estimation; Mice; Proteins; Standards; Taxonomy; Text mining; Tin; Biomedical text mining; focus species detection; gene name disambiguation;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2013.139
  • Filename
    6654152