• DocumentCode
    3065666
  • Title
    A Novel Text Classification Algorithm Based on Naïve Bayes and KL-Divergence
  • Author
    Baoyi Wang ; Shaomin Zhang
  • Author_Institution
    North China Electric Power University, China
  • fYear
    2005
  • fDate
    5-8 Dec. 2005
  • Firstpage
    913
  • Lastpage
    915
  • Abstract
    The Naïve Bayes classifier is a popular machine learning method for text classification because it is fast, easy to implement, and performs well. Its strong assumption that each feature word in a document is independent of the others makes high efficiency possible but also degrades the quality of its results, because some feature words are interrelated. To improve text classification performance, this paper proposes solutions to some of the problems with Naïve Bayes classifiers. Starting from the original Naïve Bayes algorithm, we take feature weight into account as a factor and incorporate the KL-divergence (relative entropy) between words to improve the Naïve Bayes classifier. The improved Naïve Bayes classification algorithm is called INBA. Theoretical analysis and experiments show that INBA retains the advantages of the Naïve Bayes classifier while achieving higher classification accuracy, and that the proposed solutions are feasible, practical, and effective.
  • Keywords
    Algorithm design and analysis; Classification algorithms; Decision trees; Entropy; Learning systems; Neural networks; Niobium; Support vector machine classification; Support vector machines; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Computing, Applications and Technologies, 2005. PDCAT 2005. Sixth International Conference on
  • Conference_Location
    Dalian, China
  • Print_ISBN
    0-7695-2405-2
  • Type
    conf
  • DOI
    10.1109/PDCAT.2005.36
  • Filename
    1579062
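
The abstract describes weighting feature words in a Naïve Bayes classifier with the help of KL-divergence, but it does not give the INBA formula. The sketch below is therefore not the authors' method. It is a minimal illustration of the general idea, under an assumption of our own: each word w gets the weight KL(P(c | w) || P(c)), so words that shift the class distribution away from the prior count more in the log-likelihood sum. The function names `train` and `classify`, the Laplace smoothing, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit a weighted multinomial Naive Bayes model.

    docs: list of (tokens, label) pairs.
    The KL-based weight below is an illustrative assumption,
    not the INBA formula from the paper.
    """
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # label -> word -> count
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    prior = {c: n / len(docs) for c, n in class_docs.items()}

    # Laplace-smoothed P(w | c), so every probability is strictly positive.
    likelihood = {}
    for c in prior:
        total = sum(word_counts[c].values()) + len(vocab)
        likelihood[c] = {w: (word_counts[c][w] + 1) / total for w in vocab}

    # weight(w) = KL( P(c | w) || P(c) ): how far seeing w moves the
    # class distribution away from the prior.
    weight = {}
    for w in vocab:
        joint = {c: prior[c] * likelihood[c][w] for c in prior}
        z = sum(joint.values())
        weight[w] = sum(
            (p / z) * math.log((p / z) / prior[c]) for c, p in joint.items()
        )
    return {"prior": prior, "likelihood": likelihood, "weight": weight}


def classify(model, tokens):
    """Return the class with the highest weighted log-posterior score."""
    scores = {}
    for c, p in model["prior"].items():
        score = math.log(p)
        for w in tokens:
            if w in model["weight"]:  # words unseen in training are skipped
                score += model["weight"][w] * math.log(model["likelihood"][c][w])
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
toy_docs = [
    (["goal", "match", "team"], "sport"),
    (["team", "win", "goal"], "sport"),
    (["stock", "market", "price"], "finance"),
    (["market", "trade", "price"], "finance"),
]
toy_model = train(toy_docs)
print(classify(toy_model, ["goal", "team"]))
```

With all weights set to 1 this reduces to ordinary multinomial Naïve Bayes, which makes it easy to compare the weighted and unweighted variants on the same data.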