DocumentCode :
2456440
Title :
Query Expansion for UMLS Metathesaurus Disambiguation Based on Automatic Corpus Extraction
Author :
Jimeno-Yepes, Antonio ; Aronson, Alan R.
Author_Institution :
Nat. Libr. of Med., Bethesda, MD, USA
fYear :
2010
fDate :
12-14 Dec. 2010
Firstpage :
965
Lastpage :
968
Abstract :
Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction that attempts to select the proper sense of ambiguous terms. In the biomedical domain, general WSD has received little attention compared with the disambiguation of specific categories of entities such as proteins and genes or diseases. Statistical learning approaches have achieved better performance than other methods. However, manually annotated data is limited, and covering all the ambiguous cases of a large resource like the UMLS is infeasible. Knowledge-based approaches using the UMLS and MEDLINE citations have achieved good performance, but below that of statistical learning approaches. Our best knowledge-based result has been obtained by training a Naïve Bayes algorithm on an automatically extracted MEDLINE corpus. In this work, we extend previous methods to enhance the quality of an automatically extracted corpus using related terms obtained from MEDLINE, without manually annotated training data. We focus on the extraction of collocations that might be used in combination with one of the senses of the ambiguous terms. We find that left-side collocations yield the largest improvement in accuracy, at 4%. In addition, combining different types of collocations with post-filtering of retrieved citations achieves an improvement of almost 9% in accuracy.
Keywords :
learning (artificial intelligence); query processing; text analysis; thesauri; UMLS metathesaurus disambiguation; automatic corpus extraction; information extraction; information retrieval; knowledge-based approaches; naïve Bayes algorithm; query expansion; statistical learning; word sense disambiguation; Accuracy; Data mining; Feature extraction; Information retrieval; Knowledge based systems; Semantics; Unified modeling language; Collocation extraction; Combination of approaches; Text categorization; UMLS; Word Sense Disambiguation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4244-9211-4
Type :
conf
DOI :
10.1109/ICMLA.2010.154
Filename :
5708977