Title of article :
Managing misspelled queries in IR applications
Author/Authors :
Jesús Vilares, Manuel Vilares, Juan Otero
Issue Information :
Bimonthly with serial number, year 2011
Pages :
24
From page :
263
To page :
286
Abstract :
Our work concerns the design of robust information retrieval environments that can successfully handle queries containing misspelled words. Our aim is to perform a comparative analysis of the efficacy of two possible strategies that can be adopted. A first strategy involves those approaches based on correcting the misspelled query, thus requiring the integration of linguistic information in the system. This solution has been studied from complementary standpoints, according to whether contextual information of a linguistic nature is integrated in the process or not, the former implying a higher degree of complexity. A second strategy involves the use of character n-grams as the basic indexing unit, which guarantees the robustness of the information retrieval process whilst at the same time eliminating the need for a specific query correction stage. This is a knowledge-light and language-independent solution which requires no linguistic information for its application. Both strategies have been subjected to experimental testing, with Spanish being used as the case in point. This is a language which, unlike English, has a great variety of morphological processes, making it particularly sensitive to spelling errors. The results obtained demonstrate that stemming-based approaches are highly sensitive to misspelled queries, particularly with short queries. However, such a negative impact can be effectively reduced by the use of correction mechanisms during querying, particularly in the case of context-based correction, since more classical approaches introduce too much noise when query length is increased. On the other hand, our n-gram based strategy shows a remarkable robustness, with average performance losses appreciably smaller than those for stemming.
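The robustness claimed for character n-grams can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `char_ngrams` and `dice`, the choice of n = 3, the Dice coefficient as overlap measure, and the English example word are all assumptions made for illustration only.

```python
def char_ngrams(text, n=3):
    """Return the set of overlapping character n-grams of each word in text.

    Words no longer than n are kept whole. The value n=3 is an
    illustrative assumption, not a setting taken from the article.
    """
    grams = set()
    for word in text.lower().split():
        if len(word) <= n:
            grams.add(word)
        else:
            for i in range(len(word) - n + 1):
                grams.add(word[i:i + n])
    return grams


def dice(a, b):
    """Dice coefficient between two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


correct = char_ngrams("retrieval")
misspelled = char_ngrams("retreival")  # transposition error

# As whole words the two forms share nothing, but some n-grams survive the error.
shared = correct & misspelled
similarity = dice(correct, misspelled)
```

Because a spelling error only disturbs the n-grams that overlap it, the misspelled form still shares part of its indexing units with the correct one, whereas an index built on whole words or stems would find no match without a prior correction step.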
Keywords :
Misspelled queries, Character n-grams, Evaluation methodology, Spelling correction, Information retrieval
Journal title :
Information Processing and Management
Serial Year :
2011
Record number :
1229111