DocumentCode :
3289849
Title :
Building a Test Collection for Sorani Kurdish
Author :
Esmaili, Kyumars Sheykh ; Eliassi, Donya ; Salavati, Shahin ; Aliabadi, Purya ; Mohammadi, Arash ; Yosefi, Somayeh ; Hakimi, S.
Author_Institution :
Nanyang Technol. Univ., Singapore, Singapore
fYear :
2013
fDate :
27-30 May 2013
Firstpage :
1
Lastpage :
7
Abstract :
Despite having a large number of speakers, Sorani - one of the two principal branches of the Kurdish language - is among the less-resourced languages. This paper reports on the outcomes of a project aimed at providing the essential resources for processing Sorani texts. The primary output of this project is Pewan, the first standard Test Collection to evaluate Sorani Information Retrieval systems. The other language resources that we have constructed in this project are: (i) a light-stemmer, (ii) a list of affixes, and (iii) a list of stopwords. We also used these newly built resources to study the effectiveness of basic IR strategies on Sorani documents. Our experimental results show that normalization and, to a lesser extent, stemming can greatly improve the performance of Sorani IR systems.
Keywords :
information retrieval systems; natural language processing; project management; text analysis; Pewan; Sorani Kurdish language; Sorani information retrieval system evaluation; Sorani text processing; affixes; less-resourced languages; light-stemmer; standard test collection; stopwords; Buildings; Educational institutions; Information retrieval; Morphology; Reliability; Standards; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Systems and Applications (AICCSA), 2013 ACS International Conference on
Conference_Location :
Ifrane
ISSN :
2161-5322
Type :
conf
DOI :
10.1109/AICCSA.2013.6616470
Filename :
6616470