Title of article

A New Hybrid Model of K-Means and Naïve Bayes Algorithms for Feature Selection in Text Documents Categorization

Author/Authors

Allahverdipour, Ali (Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran); Soleimanian Gharehchopogh, Farhad (Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran)

Abstract

With the rapid growth of information and documents on the Web, the need to classify them into categories and clusters is increasingly felt. Clustering tries to find related structures in datasets that have not yet been categorized. To address this need, this paper presents a new approach to text document categorization that consists of three phases: document pre-processing and feature selection, K-Means clustering, and Naïve Bayes (NB) optimization. The proposed model combines the K-Means and NB algorithms: K-Means finds the minimum distances between features and the cluster centers, while NB computes the probability of each feature in the documents, and each is used separately to cluster the features. The proposed model improves the performance of the K-Means algorithm by using NB properties in clustering. The model thereby overcomes the challenge of labeling different documents and the limitation that K-Means, by origin, categorizes text documents as an unsupervised model. Finally, the experimental results of the proposed algorithm and the K-Means algorithm are evaluated using evaluation methods and compared on validated datasets.
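The three phases named in the abstract can be illustrated with a minimal sketch. This is one possible reading, not the authors' implementation: the abstract does not specify the feature-selection rule, the distance measure, or how the NB probabilities feed back into the clustering. The sketch assumes a document-frequency threshold for feature selection, squared Euclidean distance on term counts, and a multinomial NB model trained on the K-Means labels and used to reassign documents. All function names (`tokenize`, `vectorize`, `kmeans`, `nb_refine`) and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Phase 1 (pre-processing): lowercase and keep alphabetic tokens longer than 2 characters.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [w for w in cleaned.split() if len(w) > 2]

def vectorize(docs, min_df=2):
    # Phase 1 (feature selection, assumed rule): keep terms found in at least min_df documents.
    tokens = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokens for t in set(toks))
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for toks in tokens:
        v = [0] * len(vocab)
        for t in toks:
            if t in index:
                v[index[t]] += 1
        vectors.append(v)
    return vocab, vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    # Phase 2: plain K-Means with a deterministic farthest-point initialization.
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def nb_refine(vectors, labels, k, alpha=1.0):
    # Phase 3 (assumed): fit multinomial NB on the cluster labels with Laplace
    # smoothing, then reassign each document to its most probable cluster.
    n_features = len(vectors[0])
    counts = [[alpha] * n_features for _ in range(k)]
    sizes = [0] * k
    for v, l in zip(vectors, labels):
        sizes[l] += 1
        for i, x in enumerate(v):
            counts[l][i] += x
    refined = []
    for v in vectors:
        scores = []
        for j in range(k):
            total = sum(counts[j])
            log_prior = math.log((sizes[j] + alpha) / (len(vectors) + alpha * k))
            log_lik = sum(x * math.log(counts[j][i] / total) for i, x in enumerate(v) if x)
            scores.append(log_prior + log_lik)
        refined.append(max(range(k), key=lambda j: scores[j]))
    return refined

# Hypothetical toy corpus with two topics.
docs = [
    "football match goal team",
    "team goal football player",
    "player match football goal",
    "stock market price bank",
    "bank price stock investor",
    "investor market stock price",
]
vocab, vectors = vectorize(docs)
km_labels = kmeans(vectors, k=2)
nb_labels = nb_refine(vectors, km_labels, k=2)
print(km_labels, nb_labels)
```

On well-separated toy data the NB step leaves the K-Means assignment unchanged; any benefit would only show up on noisier data, where the refinement can move borderline documents to the cluster whose term distribution explains them better.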

Keywords

Text Categorization, Machine Learning, Feature Selection, K-Means Algorithm, Naïve Bayes Algorithm

Journal title

Journal of Advances in Computer Research

Serial Year

2017

Record number

2497500

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=2497500