Title of article :
A New Hybrid Model of K-Means and Naïve Bayes Algorithms for Feature Selection in Text Documents Categorization
Author/Authors :
Allahverdipour, Ali Department of Computer Engineering - Urmia Branch Islamic Azad University, Urmia, Iran , Soleimanian Gharehchopogh, Farhad Department of Computer Engineering - Urmia Branch Islamic Azad University, Urmia, Iran
Pages :
14
From page :
73
To page :
86
Abstract :
As the volume of information and documents on the Web grows rapidly, the need to classify them into categories and clusters becomes pressing. Clustering tries to find related structures in datasets that have not yet been categorized. To address this need, this paper presents a new approach to text document categorization that comprises three phases: document pre-processing and feature selection, K-Means clustering, and Naïve Bayes (NB) optimization. The proposed model combines the two algorithms. K-Means finds the minimum distances between features and the cluster centers, NB computes the probability of each feature in the documents, and each result is used separately to cluster the features. The model improves the performance of K-Means by exploiting NB properties during clustering. It therefore addresses the challenge of labeling different documents, as well as the inherently unsupervised nature of K-Means when it is applied to text document categorization. Finally, the proposed algorithm and K-Means are evaluated using evaluation measures and compared on validated datasets.
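The abstract does not specify exactly how the K-Means and NB components are combined. The following is a minimal Python sketch under one possible reading, not the authors' implementation. Each term is represented by its Laplace-smoothed NB estimate P(term | document) across the collection, and these per-term vectors are then clustered with plain K-Means (Lloyd's algorithm, seeded deterministically with the first k points). The function names `nb_term_probs`, `kmeans` and `cluster_features`, the add-one smoothing, and the seeding strategy are all illustrative assumptions.

```python
from collections import Counter


def nb_term_probs(docs, vocab):
    """Add-one (Laplace) smoothed P(term | doc) for every term and document.

    Assumption: this NB-style estimate stands in for the paper's NB step.
    Returns {term: [P(term | doc_0), P(term | doc_1), ...]}.
    """
    v = len(vocab)
    probs = {t: [] for t in vocab}
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        for t in vocab:
            probs[t].append((counts[t] + 1) / (total + v))
    return probs


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-Means; centroids are seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the centroid at minimum distance.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_features(docs, k):
    """Hypothetical hybrid: cluster terms by their NB probability vectors."""
    vocab = sorted({t for doc in docs for t in doc})
    probs = nb_term_probs(docs, vocab)
    labels, _ = kmeans([probs[t] for t in vocab], k)
    return dict(zip(vocab, labels))


if __name__ == "__main__":
    # Toy, pre-processed (tokenized, stop-word-free) documents.
    docs = [
        ["ball", "goal", "team"],
        ["goal", "team", "match"],
        ["stock", "market", "price"],
        ["market", "price", "trade"],
    ]
    print(cluster_features(docs, k=2))
```

On this toy collection, the sports-related terms and the finance-related terms end up in separate clusters. Seeding with the first k points keeps the sketch deterministic. A practical system would more likely use random restarts or k-means++ initialization.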
Keywords :
Text Categorization , Machine Learning , Feature Selection , K-Means Algorithm , Naïve Bayes Algorithm
Journal title :
Journal of Advances in Computer Research
Serial Year :
2017
Record number :
2497500