DocumentCode :
539313
Title :
Applying machine learning algorithms for automatic Persian text classification
Author :
Farhoodi, Mojgan ; Yari, Alireza
Author_Institution :
Iran Telecommun. Res. Center, Iran
fYear :
2010
fDate :
Nov. 30 2010-Dec. 2 2010
Firstpage :
318
Lastpage :
323
Abstract :
Automatic document classification is an important topic in computer science because of its many applications in data mining and information technology. Classification plays a vital role in many information management and retrieval tasks. Document classification, also known as document categorization, is the process of assigning a document to one or more predefined categories. It is often posed as a supervised learning problem in which a set of labeled data is used to train a classifier, which is then applied to label future examples. Document classification comprises several stages, including text processing, feature extraction, feature vector construction, and final classification, so an improvement in any stage should lead to better classification results. In this paper, we apply machine learning methods to automatic Persian news classification. We first apply language preprocessing to the Hamshahri dataset, and then extract a feature vector for each news text using feature weighting and feature selection algorithms. We then train classifiers using the support vector machine (SVM) and K-nearest neighbor (KNN) algorithms. In our experiments, both algorithms show acceptable results for Persian text classification, but KNN performs better than SVM.
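The stages named in the abstract (preprocessing, feature selection, feature weighting, KNN classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which weighting or selection schemes were used, so TF-IDF weighting, a document-frequency threshold, and cosine similarity are assumptions made here. The toy English sentences stand in for the Hamshahri news texts, and all function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Placeholder preprocessing (assumption): lowercase and split on whitespace.
    return text.lower().split()

def select_features(docs, min_df=2):
    # Feature selection (assumption): keep terms found in at least min_df documents.
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: count for term, count in df.items() if count >= min_df}

def tfidf_vector(text, df, n_docs):
    # Feature weighting (assumption): term frequency times smoothed inverse document frequency.
    tf = Counter(t for t in tokenize(text) if t in df)
    return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def knn_predict(text, train_vectors, train_labels, df, k=3):
    # Majority vote among the k training documents most similar to the query.
    query = tfidf_vector(text, df, len(train_vectors))
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus standing in for labeled news texts.
train_docs = [
    "team wins football match",
    "football team loses final match",
    "coach praises team after match",
    "bank raises interest rates",
    "central bank cuts interest rates",
    "market reacts to bank rates",
]
train_labels = ["sport", "sport", "sport", "economy", "economy", "economy"]

df = select_features(train_docs)
train_vectors = [tfidf_vector(doc, df, len(train_docs)) for doc in train_docs]

print(knn_predict("football match tonight", train_vectors, train_labels, df))
print(knn_predict("bank interest rates rise", train_vectors, train_labels, df))
```

On real Persian text the whitespace tokenizer would need to be replaced with language-specific normalization and stemming. An SVM could be trained on the same vectors for the comparison the abstract reports.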
Keywords :
data mining; feature extraction; information retrieval; learning (artificial intelligence); natural language processing; pattern classification; support vector machines; text analysis; Hamshahri dataset; K-nearest neighbor algorithm; automatic Persian text classification; automatic document classification; document categorization; document classification; feature selection algorithm; feature vector construction; feature weighting; information management; information technology; machine learning algorithm; supervised learning problem; support vector machine; text processing; Classification algorithms; Kernel; Machine learning algorithms; Support vector machine classification; Text categorization; Hamshahri; KNN; SVM; feature selection; machine learning; text classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Information Management and Service (IMS), 2010 6th International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-8599-4
Electronic_ISBN :
978-89-88678-32-9
Type :
conf
Filename :
5713467