DocumentCode
3339395
Title
Building naive bayes document classifier using word clusters based on bootstrap averaging
Author
Wang Yuanzhe ; Zhang Qiang ; Bai Liyuan
Author_Institution
Inst. of Inf. Eng., Wuhan Univ. of Technol., Wuhan, China
Volume
1
fYear
2009
fDate
14-16 Aug. 2009
Firstpage
202
Lastpage
207
Abstract
To address the low classification accuracy caused by poor distribution estimation when a naive Bayes document classifier is trained on word clusters, we build an ordered word list based on the mutual information between words and their semantic cluster labels. We then construct a sample set of the same size as the word list through bootstrap sampling, and use the average of the corresponding parameters estimated from the sample set as the final parameters for classifying unknown documents. Experimental results on benchmark document data sets show that the proposed strategy achieves higher classification accuracy than naive Bayes document classifiers trained on word clusters or on words.
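The bootstrap-averaging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the estimator, the number of bootstrap rounds, or how the mutual-information ordering is used, so the sketch assumes Laplace-smoothed estimates of P(cluster | class), resamples the word list with replacement, and omits the mutual-information step entirely. All function names, the `rounds` parameter, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def estimate_params(docs, word_to_cluster, word_weights, clusters, classes):
    """Laplace-smoothed P(cluster | class). Each word occurrence is counted
    word_weights[word] times (its multiplicity in one bootstrap sample)."""
    counts = {c: Counter() for c in classes}
    for tokens, label in docs:
        for w in tokens:
            k = word_weights.get(w, 0)
            if k:
                counts[label][word_to_cluster[w]] += k
    params = {}
    for c in classes:
        total = sum(counts[c].values()) + len(clusters)
        params[c] = {g: (counts[c][g] + 1) / total for g in clusters}
    return params


def bootstrap_average(docs, word_to_cluster, rounds=50, seed=0):
    """Average the per-class cluster distributions over bootstrap samples
    of the word list (assumed scheme; each sample has the list's size)."""
    rng = random.Random(seed)
    words = sorted(word_to_cluster)
    clusters = sorted(set(word_to_cluster.values()))
    classes = sorted({label for _, label in docs})
    avg = {c: {g: 0.0 for g in clusters} for c in classes}
    for _ in range(rounds):
        sample = Counter(rng.choices(words, k=len(words)))
        params = estimate_params(docs, word_to_cluster, sample, clusters, classes)
        for c in classes:
            for g in clusters:
                avg[c][g] += params[c][g] / rounds
    label_counts = Counter(label for _, label in docs)
    log_prior = {c: math.log(label_counts[c] / len(docs)) for c in classes}
    return avg, log_prior


def classify(tokens, avg, log_prior, word_to_cluster):
    """Pick the class with the highest naive Bayes log-score; words without
    a cluster are ignored."""
    scores = {}
    for c in avg:
        score = log_prior[c]
        for w in tokens:
            if w in word_to_cluster:
                score += math.log(avg[c][word_to_cluster[w]])
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
toy_clusters = {"goal": "sport", "match": "sport", "team": "sport",
                "stock": "finance", "bank": "finance", "market": "finance"}
toy_docs = [(["goal", "match", "team"], "sports"),
            (["team", "goal", "goal"], "sports"),
            (["stock", "bank", "market"], "business"),
            (["market", "stock", "stock"], "business")]
toy_avg, toy_log_prior = bootstrap_average(toy_docs, toy_clusters)
print(classify(["goal", "team"], toy_avg, toy_log_prior, toy_clusters))
```

Averaging over resampled word lists smooths the cluster-level estimates, which is the effect the abstract attributes to the method; the exact sampling unit and estimator in the paper may differ from this sketch.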
Keywords
Bayes methods; document handling; bootstrap averaging; bootstrap sampling; distribution estimation; naive Bayes document classifier; semantic cluster labels; word clusters; Classification algorithms; Clustering algorithms; Data mining; Machine learning; Mutual information; Parameter estimation; Probability distribution; Sampling methods; Sorting;
fLanguage
English
Publisher
IEEE
Conference_Titel
IT in Medicine & Education, 2009. ITIME '09. IEEE International Symposium on
Conference_Location
Jinan
Print_ISBN
978-1-4244-3928-7
Electronic_ISBN
978-1-4244-3930-0
Type
conf
DOI
10.1109/ITIME.2009.5236431
Filename
5236431