DocumentCode :
3585821
Title :
Morphological analyzer and generator for Tamil Language
Author :
Lushanthan, S. ; Weerasinghe, A.R. ; Herath, D.L.
Author_Institution :
Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka
fYear :
2014
Firstpage :
190
Lastpage :
196
Abstract :
Morphological analysis is an essential component of Natural Language Processing (NLP) applications ranging from spell checkers to machine translation. Morphological analysis segments a word into morphemes and analyzes how these morphemes attach to one another. Word formation in English is considerably less complex than in Indic languages, and Tamil accordingly poses its own complexities when building an NLP application. The morphemes of the language, the rules by which these morphemes are connected, and the changes that occur when they attach to one another are the important factors to consider when building a morphological analyzer for any language. Our “Morphological Analyzer and Generator for Tamil Language” generates the word forms of a stem/root given a particular context and, conversely, analyzes a Tamil surface form into its proper context. The model covers only the nouns and verbs of the Tamil language. This paper illustrates how the lexicon and the orthographic rules of Tamil have been written as regular expressions using only finite state operations, and how this approach has been implemented to develop a morphological analyzer/generator. The model is built using the Xerox toolkit, which uses “Two-level Morphology”; almost 2000 noun stems and 96 verb stems have been incorporated into the network. A noun stem now produces about 40 different forms and a verb stem produces up to 240 forms. We have also defined our own transliteration scheme for this purpose.
Keywords :
finite state machines; language translation; natural language processing; NLP; Tamil language; Xerox toolkit; finite state operations; lexicon; machine translation; morphemes; morphological analyzer; morphological generator; natural language processing; orthographic rules; regular expressions; spell checker; transliteration scheme; two-level morphology; word segmentation; Finite State Transducer; Morphology; Regular Expressions; Tamil Morphological Analyzer and Generator;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advances in ICT for Emerging Regions (ICTer), 2014 International Conference on
Print_ISBN :
978-1-4799-7731-4
Type :
conf
DOI :
10.1109/ICTER.2014.7083900
Filename :
7083900