Title of article :
Classification of Persian News Articles using Machine Learning Techniques
Author/Authors :
Mostafavi, Sareh Department of Computational Linguistics - Regional Information Center for Science and Technology (RICeST) - Shiraz - Fars, Iran; Pahlevanzadeh, Bahareh Department of Design and System Operations - Regional Information Center for Science and Technology (RICeST) - Shiraz - Fars, Iran; Falahati Qadimi Fumani, Mohammad Reza Department of Computational Linguistics - Regional Information Center for Science and Technology (RICeST) - Shiraz - Fars, Iran
Pages :
9
From page :
73
To page :
81
Abstract :
Automatic text classification, defined as the process of automatically assigning texts to predefined categories, has many applications in everyday life, and it has recently gained much attention due to the increased number of text documents available in electronic form. Classifying news articles is one application of text classification. Automatic classification is a machine learning task in which a classifier is built by learning from a set of pre-classified documents. Naïve Bayes and k-Nearest Neighbor are among the most common machine learning algorithms for text classification. In this paper, we suggest a way to improve the performance of a text classifier using the mutual information (MI) and chi-square feature selection algorithms. We have observed that the MI feature selection method can improve the accuracy of the Naïve Bayes classifier by up to 10%. The empirical results show that the proposed model achieves an average accuracy of 80% and an average F1-measure of 80%.
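The two feature selection scores named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook formulation based on a 2x2 term/class contingency table, which is an assumption and may differ from the variant used in the paper. The function names and the toy English tokens are hypothetical placeholders, not the authors' code or data.

```python
import math

def contingency(docs, labels, term, cls):
    """Count documents by (term present?, document in class?)."""
    n11 = n10 = n01 = n00 = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cls = label == cls
        if has_term and in_cls:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_cls:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def mutual_information(n11, n10, n01, n00):
    """MI (in bits) between term occurrence and class membership."""
    n = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),  # term present, in class
        (n10, n11 + n10, n10 + n00),  # term present, not in class
        (n01, n01 + n00, n11 + n01),  # term absent, in class
        (n00, n01 + n00, n10 + n00),  # term absent, not in class
    )
    total = 0.0
    for n_tc, n_t, n_c in cells:
        if n_tc > 0:  # empty cells contribute nothing
            total += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return total

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for independence of term and class."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def top_terms(docs, labels, cls, score, k):
    """Return the k highest-scoring terms for one class (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scored = {t: score(*contingency(docs, labels, t, cls)) for t in vocab}
    return sorted(scored, key=lambda t: (-scored[t], t))[:k]

# Toy corpus: each document is a set of tokens (placeholder data).
docs = [
    {"goal", "match", "team"},
    {"team", "coach", "goal"},
    {"market", "stock", "bank"},
    {"bank", "rate", "market"},
]
labels = ["sport", "sport", "economy", "economy"]

counts = contingency(docs, labels, "goal", "sport")
mi_goal = mutual_information(*counts)
chi_goal = chi_square(*counts)
```

In a pipeline of this kind, only the top-scoring terms per class would be kept as features before training a classifier such as Naïve Bayes or k-Nearest Neighbor. Note that chi-square scores a term highly whether it is associated with the class or with its complement.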
Keywords :
Automatic Persian text classification, K-Nearest Neighbor, Naïve Bayes, News text classification, Text categorization, Text mining
Journal title :
Journal of Computer and Knowledge Engineering
Serial Year :
2020
Record number :
2686244