Title :
The Application Research of Topic Word List In Text Automatic Classification
Author :
Huang, Huan ; Liu, Qingtang ; Wu, Linjing ; Huang, Tao ; Yuan, Shuai
Author_Institution :
Eng. & Res. Center for Inf. Technol. on Educ., Huazhong Normal Univ., Wuhan, China
fDate :
Nov. 30 2009-Dec. 1 2009
Abstract :
When traditional text classification techniques are applied to academic dissertations, the extracted feature terms have high dimensionality and cannot represent the theme of a thesis, which makes classification inefficient and inaccurate. Topic words are few in number and reflect the theme of a thesis well. Accordingly, this paper proposes extracting topic words with a topic word list and using them as feature terms, then classifying large collections of texts with the Bayesian classification method. Experiments show that the Bayesian classification method using topic words as feature terms greatly reduces the feature dimension and improves classification efficiency. When the dimension of the feature terms is equivalent, its accuracy is also higher than that of traditional Bayesian text classification methods.
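The approach summarized in the abstract can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes model with Laplace smoothing, whitespace tokenization, and a tiny hypothetical topic word list (`TOPIC_WORDS`); the record does not specify the authors' actual word list, tokenizer, or Bayesian variant, so every name and value below is illustrative only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical topic word list; the paper's actual list is not given in this record.
TOPIC_WORDS = {"classification", "bayesian", "protein", "genome", "sequence", "kernel"}


def extract_topic_words(text, topic_words=TOPIC_WORDS):
    """Keep only the tokens that appear in the topic word list (the feature terms)."""
    return [tok for tok in text.lower().split() if tok in topic_words]


def train(docs, topic_words=TOPIC_WORDS):
    """docs is a list of (text, label) pairs.

    Returns per-class document counts and per-class topic word counts.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in docs:
        class_counts[label] += 1
        word_counts[label].update(extract_topic_words(text, topic_words))
    return class_counts, word_counts


def classify(text, class_counts, word_counts, topic_words=TOPIC_WORDS):
    """Pick the class with the highest log posterior under naive Bayes.

    Laplace (add-one) smoothing is applied over the topic word vocabulary,
    so the feature dimension equals the size of the topic word list.
    """
    total_docs = sum(class_counts.values())
    vocab_size = len(topic_words)
    features = extract_topic_words(text, topic_words)
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in features:
            score += math.log((word_counts[label][tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration.
training = [
    ("bayesian classification of text with kernel methods", "cs"),
    ("kernel classification for large text collections", "cs"),
    ("protein sequence analysis across the genome", "bio"),
    ("genome sequence assembly and protein folding", "bio"),
]
class_counts, word_counts = train(training)
label = classify("a study of genome and protein data", class_counts, word_counts)
```

Because only words on the topic word list survive `extract_topic_words`, the model's vocabulary is bounded by the size of that list rather than by the full set of terms in the corpus, which is the dimension reduction the abstract describes.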
Keywords :
text analysis; word processing; Bayesian classification method; Bayesian text classification methods; academic dissertations; feature terms extraction; text automatic classification; topic word list; Bayesian methods; Classification algorithms; Data mining; Feature extraction; Information technology; Machine learning algorithms; Support vector machine classification; Support vector machines; Terminology; Text categorization; Bayes Classification; Text Classification; topic word;
Conference_Title :
Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3888-4
DOI :
10.1109/KAM.2009.268