Title of article :
Evaluation of n-gram conflation approaches for Arabic text retrieval
Author/Authors :
Farag Ahmed, Andreas Nürnberger
Issue Information :
Monthly, with serial issue number, year 2009
Pages :
18
From page :
1448
To page :
1465
Abstract :
In this paper we present a language-independent approach to conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method enhances the pure n-gram model: it groups related words using various string-similarity measures while restricting the search to specific locations in the target word by taking the order of n-grams into account. We show that the method achieves high similarity scores for all word-form variations and reduces ambiguity, i.e., it obtains higher precision and recall than pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited to conflation in Arabic, since Arabic is a highly inflectional language. We therefore also present an adaptive user interface for Arabic text retrieval called “araSearch”, which serves as a metasearch interface to existing search engines. The system can extend a query using the proposed conflation approach so that additional results for relevant subwords are found automatically.
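The contrast the abstract draws between a pure n-gram model and one that takes n-gram order into account can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the Dice coefficient, the bigram size, the `window` parameter, and the function names `dice_pure` and `dice_positional` are all assumptions chosen for illustration only.

```python
def ngrams(word, n=2):
    """Return (position, n-gram) pairs for a word."""
    return [(i, word[i:i + n]) for i in range(len(word) - n + 1)]


def dice_pure(a, b, n=2):
    """Dice coefficient over n-gram sets, ignoring where the n-grams occur."""
    ga = {g for _, g in ngrams(a, n)}
    gb = {g for _, g in ngrams(b, n)}
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def dice_positional(a, b, n=2, window=1):
    """Dice-style score that counts an n-gram match only when the two
    occurrences lie within `window` positions of each other.
    (Hypothetical restriction, assumed here for illustration.)"""
    pa, pb = ngrams(a, n), ngrams(b, n)
    if not pa or not pb:
        return 0.0
    used = set()   # positions in b that have already been matched
    matches = 0
    for i, g in pa:
        for j, h in pb:
            if j not in used and g == h and abs(i - j) <= window:
                used.add(j)
                matches += 1
                break
    return 2 * matches / (len(pa) + len(pb))


if __name__ == "__main__":
    # Related word forms score the same under both measures.
    print(dice_pure("retrieval", "retrieve"), dice_positional("retrieval", "retrieve"))
    # Unrelated strings that merely share n-grams in different places
    # are penalised only by the position-aware variant.
    print(dice_pure("abcd", "cdab"), dice_positional("abcd", "cdab"))
```

In this sketch the position check is what reduces spurious matches: two strings that share bigrams at distant offsets still look similar to the pure measure but not to the position-aware one.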
Journal title :
Journal of the American Society for Information Science and Technology
Serial Year :
2009
Record number :
994008