• DocumentCode
    479746
  • Title
    Text Feature Extraction Based on the Extension of Topic Words and Fuzzy Set
  • Author
    Hu Jinzhu ; Shu Jiangbo ; Huang Yuying
  • Author_Institution
    Dept. of Comput. Sci., Central China Normal Univ., Wuhan
  • Volume
    1
  • fYear
    2008
  • fDate
    12-14 Dec. 2008
  • Firstpage
    219
  • Lastpage
    222
  • Abstract
    Text feature extraction is one of the foundations of natural language processing. The traditional TF-IDF weight calculation method considers only frequency characteristics, yet feature items at different positions contribute differently to text classification. By considering frequency, position, and mutual relations, an improved weight calculation method, TF-IDF-Rel, is proposed based on the extension of topic words; on this basis, fuzzy set theory is embedded to discretize the weight values. Experiments show that this method classifies better than the traditional TF-IDF method, improving both the recall rate and the accuracy rate.
  • Keywords
    classification; feature extraction; fuzzy set theory; natural language processing; text analysis; TF-IDF weight value discretization calculation method; fuzzy set theory; natural language processing; text classification feature extraction; topic word extension; Computer science; Data mining; Feature extraction; Frequency; Fuzzy set theory; Fuzzy sets; Information processing; Machinery; Space technology; Text categorization; extension of topic words; feature words selection; fuzzy set;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Software Engineering, 2008 International Conference on
  • Conference_Location
    Wuhan, Hubei
  • Print_ISBN
    978-0-7695-3336-0
  • Type
    conf
  • DOI
    10.1109/CSSE.2008.1189
  • Filename
    4721730