DocumentCode :
3256166
Title :
Discriminative Optimization of String Similarity and Its Application to Biomedical Abbreviation Clustering
Author :
Yamaguchi, Atsuko ; Yamamoto, Yasunori ; Kim, Jin-Dong ; Takagi, Toshihisa ; Yonezawa, Akinori
Author_Institution :
Database Center for Life Sci., Tokyo, Japan
Volume :
2
fYear :
2011
fDate :
18-21 Dec. 2011
Firstpage :
72
Lastpage :
77
Abstract :
Many string similarity measures have been developed to deal with the variety of expressions in natural language texts. With the abundance of such measures, we should consider the choice of a measure and its parameters to maximize performance for a given task. During a preliminary experiment to find the best measure and parameters for clustering terms to improve our life-science abbreviation dictionary, we found that the character sequences of chemical names had different characteristics from those of other terms. Based on this observation, we experimented with four string similarity measures to test the hypothesis that "chemical names have a different morphology, thus computation of their similarity should differ from that of other terms." The experimental results show that the edit distance is the best for chemical names, and that the discriminative application of string similarity methods to chemical and non-chemical names may be a simple but effective way to improve the performance of term clustering.
Keywords :
bioinformatics; natural language processing; optimisation; pattern clustering; text analysis; abbreviation dictionary; biomedical abbreviation clustering; chemical names; discriminative optimization; edit distance; life science; natural language texts; nonchemical names; string similarity measures; term clustering; Benchmark testing; Biomedical measurements; Chemicals; Databases; Length measurement; Manuals; Unified modeling language; String Similarity Measure; Term Clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
978-1-4577-2134-2
Type :
conf
DOI :
10.1109/ICMLA.2011.58
Filename :
6147051