• DocumentCode
    3339395
  • Title

    Building naive bayes document classifier using word clusters based on bootstrap averaging

  • Author

    Wang Yuanzhe ; Zhang Qiang ; Bai Liyuan

  • Author_Institution
    Inst. of Inf. Eng., Wuhan Univ. of Technol., Wuhan, China
  • Volume
    1
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    202
  • Lastpage
    207
  • Abstract
    To address the low classification accuracy caused by poor distribution estimation when a naive Bayes document classifier is trained on word clusters, we build an ordered word list based on the mutual information between words and their semantic cluster labels. We then construct, through bootstrap sampling, a sample set of the same size as the word list, and use the average of the corresponding parameters estimated from the sample set as the final parameters for classifying unknown documents. Experimental results on benchmark document data sets show that the proposed strategy achieves higher classification accuracy than naive Bayes document classifiers trained on word clusters or on words.
  • Keywords
    Bayes methods; document handling; bootstrap averaging; bootstrap sampling; distribution estimation; naive Bayes document classifier; semantic cluster labels; word clusters; Classification algorithms; Clustering algorithms; Data mining; Machine learning; Mutual information; Parameter estimation; Probability distribution; Sampling methods; Sorting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IT in Medicine & Education, 2009. ITIME '09. IEEE International Symposium on
  • Conference_Location
    Jinan
  • Print_ISBN
    978-1-4244-3928-7
  • Electronic_ISBN
    978-1-4244-3930-0
  • Type

    conf

  • DOI
    10.1109/ITIME.2009.5236431
  • Filename
    5236431
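
The bootstrap-averaging step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the Laplace smoothing constant `alpha`, the number of bootstrap rounds, and the toy data are all assumptions, and the mutual-information ordering of the word list is taken as given rather than computed.

```python
import random


def estimate_cluster_probs(words, word_cluster, word_class_counts,
                           classes, clusters, alpha=1.0):
    """Laplace-smoothed estimate of P(cluster | class) from a list of words.

    Hypothetical helper: a word drawn several times by the bootstrap
    contributes its counts once per draw.
    """
    counts = {c: {k: 0.0 for k in clusters} for c in classes}
    for w in words:
        k = word_cluster[w]
        for c in classes:
            counts[c][k] += word_class_counts[w].get(c, 0)
    probs = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(clusters)
        probs[c] = {k: (counts[c][k] + alpha) / total for k in clusters}
    return probs


def bootstrap_average(word_list, word_cluster, word_class_counts,
                      classes, clusters, n_rounds=50, seed=0):
    """Average P(cluster | class) over bootstrap resamples of the word list.

    Each resample has the same size as the word list and is drawn with
    replacement; n_rounds and seed are assumed values.
    """
    rng = random.Random(seed)
    avg = {c: {k: 0.0 for k in clusters} for c in classes}
    for _ in range(n_rounds):
        sample = rng.choices(word_list, k=len(word_list))
        probs = estimate_cluster_probs(sample, word_cluster,
                                       word_class_counts, classes, clusters)
        for c in classes:
            for k in clusters:
                avg[c][k] += probs[c][k] / n_rounds
    return avg


# Toy data (invented for illustration only).
word_list = ["goal", "match", "vote", "party"]
word_cluster = {"goal": 0, "match": 0, "vote": 1, "party": 1}
word_class_counts = {
    "goal": {"sport": 8, "politics": 1},
    "match": {"sport": 6, "politics": 0},
    "vote": {"sport": 0, "politics": 7},
    "party": {"sport": 1, "politics": 9},
}
classes = ["sport", "politics"]
clusters = [0, 1]

avg = bootstrap_average(word_list, word_cluster, word_class_counts,
                        classes, clusters)
```

Because every resample yields a normalized distribution per class, the averaged parameters in `avg` also sum to 1 for each class and can be used directly as the cluster likelihoods of a naive Bayes classifier.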