Regional Information Center for Science and Technology (RICeST) - Text Categorization for Vietnamese Documents

DocumentCode :

1844060

Title :

Text Categorization for Vietnamese Documents

Author :

Nguyen, Giang-Son ; Gao, Xiaoying ; Andreae, Peter

Volume :

fYear :

2009

fDate :

15-18 Sept. 2009

Firstpage :

466

Lastpage :

469

Abstract :

Many machine learning methods have been proposed for text categorization, but most research has applied them to English documents. Vietnamese is a different language with different features, and it is not clear whether the standard methods will work for categorizing Vietnamese documents. This paper describes morphological-level document representations that are appropriate for Vietnamese text documents and investigates the effectiveness of several standard learning algorithms, including Naïve Bayes, K-Nearest Neighbour (KNN), and Support Vector Machine (SVM) with four different kernel functions. The results show that it is possible to build effective and efficient classifiers for Vietnamese text categorization using our representations and the standard algorithms, and demonstrate that performance can be improved by using information gain (infogain) for feature selection and an external dictionary for filtering the vocabulary.
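
The abstract mentions information gain ("infogain") for feature selection and an external dictionary for filtering the vocabulary. The following is a minimal sketch of how those two steps could be combined, not the authors' implementation: the function names (`entropy`, `information_gain`, `select_features`), the representation of each document as a set of tokens, and the four toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Reduction in label entropy from splitting docs on presence of `term`.

    `docs` is a list of token sets (an assumed representation).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_features(docs, labels, dictionary=None, top_k=2):
    """Rank vocabulary terms by information gain and keep the best `top_k`.

    If `dictionary` (a set of accepted tokens) is given, terms outside it
    are discarded first; this stands in for the dictionary filtering step.
    """
    vocab = set().union(*docs)
    if dictionary is not None:
        vocab &= dictionary
    # Sort by descending gain; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:top_k]


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"bóng", "đá", "và"},
    {"bóng", "rổ", "và"},
    {"cổ", "phiếu", "và"},
    {"ngân", "hàng", "và"},
]
labels = ["sport", "sport", "finance", "finance"]

if __name__ == "__main__":
    print(select_features(docs, labels, top_k=1))
    print(select_features(docs, labels, dictionary={"đá", "và"}, top_k=1))
```

In this toy corpus, a token that occurs in every document (such as "và") has zero information gain, while a token that occurs only in one class scores highest. The selected terms could then be used as features for any of the classifiers named in the abstract.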

Keywords :

Dictionaries; Filtering algorithms; Kernel; Learning systems; Machine learning; Natural languages; Support vector machine classification; Support vector machines; Text categorization; Vocabulary; Vietnamese language processing; Classification

fLanguage :

English

Publisher :

iet

Conference_Title :

Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on

Conference_Location :

Milan, Italy

Print_ISBN :

978-0-7695-3801-3

Electronic_ISBN :

978-1-4244-5331-3

Type :

conf

DOI :

10.1109/WI-IAT.2009.327

Filename :

5285049

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1844060