DocumentCode
3334752
Title
Word sense disambiguation in information retrieval using query expansion
Author
Paskalis, F. B. Dian ; Khodra, Masayu Leylia
Author_Institution
Inf. Eng., Bandung Inst. of Technol., Bandung, Indonesia
fYear
2011
fDate
17-19 July 2011
Firstpage
1
Lastpage
6
Abstract
Using words to represent document contents and queries in information retrieval raises two problems: ambiguity, and different words that represent the same concept. These problems can be addressed with query expansion. We analyse how query expansion, word sense disambiguation (WSD), iterated relevance feedback, and several retrieval variations affect retrieval performance. In this paper, WSD is implemented in Lucene using query expansion with a thesaurus and relevance feedback. The Extended Lesk algorithm was re-implemented to disambiguate the query using WordNet. Expansion terms were limited to at most 20 words, chosen from candidates drawn from the disambiguated query's sense information, co-occurring terms, and the most frequent terms, using Kullback-Leibler distance. We iterated the process to find the best number of expansion iterations. We found that applying WSD to the query can make the search process up to 161 times longer in the worst case. Query expansion using disambiguated sense information had little effect on performance, whereas using information from relevance feedback did affect it. This experiment provides a better understanding of the role of WSD in information retrieval system performance.
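As an illustration of the term-selection step described in the abstract, the following is a minimal sketch of ranking expansion candidates with a Kullback-Leibler score and keeping at most 20 of them. The abstract does not state the scoring formula, so the per-term score used here, p_fb * log(p_fb / p_coll), is an assumption based on a common formulation. The function name `kld_expansion_terms` and the tokenized-list inputs are hypothetical, and this is not the authors' Lucene implementation.

```python
import math
from collections import Counter


def kld_expansion_terms(feedback_docs, collection_docs, query_terms, max_terms=20):
    """Rank candidate expansion terms by an assumed KLD-style score.

    feedback_docs and collection_docs are lists of token lists (hypothetical
    input format). Terms already in the query are skipped.
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())
    if fb_total == 0 or coll_total == 0:
        return []

    scores = {}
    for term, count in fb_counts.items():
        if term in query_terms:
            continue
        p_coll = coll_counts.get(term, 0) / coll_total
        if p_coll == 0:
            # Term absent from the collection statistics: skip rather than divide by zero.
            continue
        p_fb = count / fb_total
        # Assumed score: the term's contribution to KL(feedback || collection).
        scores[term] = p_fb * math.log(p_fb / p_coll)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:max_terms]
```

The cap of 20 terms mirrors the limit given in the abstract. In the described system the candidates would also include sense information from WordNet and co-occurring terms, which this sketch leaves out.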
Keywords
natural language processing; query processing; relevance feedback; Kullback-Leibler distance; Lesk algorithm; Lucene; WordNet; ambiguity problem; different word problem; information retrieval; iterated relevance feedback; query expansion; word sense disambiguation; Context; Indexing; Informatics; Joints; System performance; Thesauri; information retrieval system
fLanguage
English
Publisher
IEEE
Conference_Titel
Electrical Engineering and Informatics (ICEEI), 2011 International Conference on
Conference_Location
Bandung
ISSN
2155-6822
Print_ISBN
978-1-4577-0753-7
Type
conf
DOI
10.1109/ICEEI.2011.6021532
Filename
6021532