Title :
A lexicon pool augmented Naive Bayes Classifier for Nepali Text
Author :
Thakur, S.K. ; Singh, V.K.
Author_Institution :
Dept. of Comput. Sci., South Asian Univ., New Delhi, India
Abstract :
This paper presents our experimental work on machine classification of Nepali texts. We implemented a Naive Bayes classifier for the task and then augmented it with multinomial lexicon pooling. The lexicon-pooled Naive Bayes classifier obtains better results on the classification task than a normal Naive Bayes implementation. This hybrid approach also helps in dealing with the unavailability of linguistic resources for Nepali (such as a stemmer, a stop word list, and an accurate POS tagger). The proposed lexicon-pooled Naive Bayes approach is evaluated on a sufficiently large dataset of Nepali news stories. The experimental results demonstrate the higher classification accuracy and usefulness of the method for Nepali text classification. The paper also contributes resources to Nepali language processing, in the form of a Nepali news stories corpus and a domain-specific lexicon for Nepali news stories.
Keywords :
Bayes methods; computational linguistics; natural language processing; pattern classification; text analysis; Nepali language processing; Nepali news stories corpus; Nepali text classification; domain specific lexicon; lexicon pool augmented naive Bayes classifier; linguistic resources; machine classification; multinomial lexicon pooling; normal naive Bayes implementation; Accuracy; Pragmatics; Probability; Text categorization; Training; Training data; Vocabulary; Multinomial Lexicon Pooling; Naive Bayes; Nepali Text Corpus; Text Classification;
Conference_Title :
Contemporary Computing (IC3), 2014 Seventh International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-5172-7
DOI :
10.1109/IC3.2014.6897231