Title :
Building a Test Collection for Sorani Kurdish
Author :
Esmaili, Kyumars Sheykh ; Eliassi, Donya ; Salavati, Shahin ; Aliabadi, Purya ; Mohammadi, Arash ; Yosefi, Somayeh ; Hakimi, S.
Author_Institution :
Nanyang Technol. Univ., Singapore, Singapore
Abstract :
Despite having a large number of speakers, Sorani, one of the two principal branches of the Kurdish language, is among the less-resourced languages. This paper reports on the outcomes of a project aimed at providing the essential resources for processing Sorani texts. The primary output of this project is Pewan, the first standard test collection for evaluating Sorani Information Retrieval (IR) systems. The other language resources constructed in this project are: (i) a light stemmer, (ii) a list of affixes, and (iii) a list of stopwords. We also used these newly built resources to study the effectiveness of basic IR strategies on Sorani documents. Our experimental results show that normalization and, to a lesser extent, stemming can greatly improve the performance of Sorani IR systems.
Keywords :
information retrieval systems; natural language processing; project management; text analysis; Pewan; Sorani Kurdish language; Sorani information retrieval system evaluation; Sorani text processing; affixes; less-resourced languages; light-stemmer; standard test collection; stopwords; Buildings; Educational institutions; Information retrieval; Morphology; Reliability; Standards; Writing;
Conference_Title :
Computer Systems and Applications (AICCSA), 2013 ACS International Conference on
Conference_Location :
Ifrane
DOI :
10.1109/AICCSA.2013.6616470