• DocumentCode
    507638
  • Title

    The Application Research of Topic Word List In Text Automatic Classification

  • Author

    Huang, Huan ; Liu, Qingtang ; Wu, Linjing ; Huang, Tao ; Yuan, Shuai

  • Author_Institution
    Eng. & Res. Center for Inf. Technol. on Educ., Huazhong Normal Univ., Wuhan, China
  • Volume
    2
  • fYear
    2009
  • fDate
    Nov. 30 2009-Dec. 1 2009
  • Firstpage
    111
  • Lastpage
    114
  • Abstract
    When traditional text classification techniques are applied to academic dissertations, the dimension of the extracted feature terms is high, and the terms cannot represent the theme of the thesis. As a result, efficiency is very low and the accuracy rate is not high. Topic words are small in quantity and reflect the theme of a thesis well. Accordingly, the paper proposes extracting topic words with a topic word list and using them as feature terms, then classifying large volumes of text with the Bayesian classification method. The experiments show that the Bayesian classification method using topic words as feature terms can greatly reduce the dimension and improve the efficiency of classification. When the dimension of feature terms is equivalent, its accuracy is also higher than that of traditional Bayesian text classification methods.
  • Keywords
    text analysis; word processing; Bayesian classification method; Bayesian text classification methods; academic dissertations; feature terms extraction; text automatic classification; topic word list; Bayesian methods; Classification algorithms; Data mining; Feature extraction; Information technology; Machine learning algorithms; Support vector machine classification; Support vector machines; Terminology; Text categorization; Bayes Classification; Text Classification; topic word
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-0-7695-3888-4
  • Type

    conf

  • DOI
    10.1109/KAM.2009.268
  • Filename
    5362266