DocumentCode
172571
Title
Automatic acquisition of morphological resources for Melanau language
Author
Saee, Suhaila ; Lay-Ki Soon ; Tek-Yong Lim ; Ranaivo-Malancon, Bali ; Juk, Jovianna ; Tang, Enya Kong
Author_Institution
Fac. of Comput. & Inf., Multimedia Univ., Cyberjaya, Malaysia
fYear
2014
fDate
20-22 Oct. 2014
Firstpage
203
Lastpage
206
Abstract
Computational morphological resources are a crucial component in providing the morphological information needed to create a morphological analyser. Acquiring these resources manually requires two main components, preprocessing and morphology induction, which raise two issues from the perspective of under-resourced languages: i) the process is time-consuming and ii) managing the resources is ambiguous. We propose a tool for the automatic acquisition of morphological resources, an extension of the manual approach, to overcome these issues. The proposed tool has three main modules: i) tokenization, which tokenises raw text and generates a wordlist; ii) conversion, which converts a softcopy of morphological resources into the required formats; and iii) integration of segmentation tools, which integrates two established segmentation tools, Linguistica and Morfessor, to obtain morphological information from the generated wordlist. Two testing methods were conducted: component testing and integration testing. The results show that the proposed tool has been devised and its effectiveness demonstrated, allowing linguists to obtain their wordlists and segmented data easily. We believe the proposed tool will assist other researchers in constructing computational morphological resources for under-resourced languages in an automated way.
Keywords
natural language processing; resource allocation; Linguistica tool; Melanau language; Morfessor tool; automatic morphological resource acquisition; component testing; computational morphological resources; conversion module; integration module; integration testing; morphological analyser; morphological information; tokenization module; under-resourced languages perspective; wordlist generation; Computer science; Conferences; Educational institutions; Manuals; Morphology; Testing; pre-processing; under-resourced language
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2014 International Conference on
Conference_Location
Kuching
Type
conf
DOI
10.1109/IALP.2014.6973523
Filename
6973523