• DocumentCode
    3256166
  • Title

    Discriminative Optimization of String Similarity and Its Application to Biomedical Abbreviation Clustering

  • Author

    Yamaguchi, Atsuko ; Yamamoto, Yasunori ; Kim, Jin-Dong ; Takagi, Toshihisa ; Yonezawa, Akinori

  • Author_Institution
    Database Center for Life Sci., Tokyo, Japan
  • Volume
    2
  • fYear
    2011
  • fDate
    18-21 Dec. 2011
  • Firstpage
    72
  • Lastpage
    77
  • Abstract
    Many string similarity measures have been developed to deal with the variety of expressions in natural language texts. With the abundance of such measures, we should consider the choice of measure and its parameters to maximize the performance for a given task. During our preliminary experiment to find the best measure and its parameters for the task of clustering terms to improve our abbreviation dictionary in life science, we found that chemical names had different characteristics in their character sequences compared to other terms. Based on this observation, we experimented with four string similarity measures to test the hypothesis, "chemical names have a different morphology, and thus their similarity should be computed differently from that of other terms." The experimental results show that the edit distance is the best for chemical names, and that the discriminative application of string similarity methods to chemical and non-chemical names may be a simple but effective way to improve the performance of term clustering.
  • Keywords
    bioinformatics; natural language processing; optimisation; pattern clustering; text analysis; abbreviation dictionary; biomedical abbreviation clustering; chemical names; discriminative optimization; edit distance; life science; natural language texts; nonchemical names; string similarity measures; term clustering; Benchmark testing; Biomedical measurements; Chemicals; Databases; Length measurement; Manuals; Unified modeling language; String Similarity Measure; Term Clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    978-1-4577-2134-2
  • Type

    conf

  • DOI
    10.1109/ICMLA.2011.58
  • Filename
    6147051