Title :
Text Feature Extraction Based on the Extension of Topic Words and Fuzzy Set
Author :
Hu Jinzhu ; Shu Jiangbo ; Huang Yuying
Author_Institution :
Dept. of Comput. Sci., Central China Normal Univ., Wuhan
Abstract :
Text feature extraction is one of the foundations of natural language processing. The traditional TF-IDF weight calculation method considers only term frequency, yet feature items in different positions contribute differently to text classification. By considering frequency, position, and the mutual relations between terms, an improved weight calculation method, TF-IDF-Rel, is proposed based on the extension of topic words; on this basis, fuzzy set theory is embedded to discretize the weight values. Experiments show that this method classifies better than the traditional TF-IDF method, with improved recall and accuracy.
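The record does not give the TF-IDF-Rel formula, so the following is only a minimal sketch of the general ideas the abstract names, not the authors' method. It computes classic TF-IDF, applies a hypothetical position factor (terms that also occur in the title are multiplied by an assumed boost of 2.0), and then discretizes each document's weights into fuzzy levels using assumed triangular membership functions peaking at 0, 0.5, and 1. The names `position_boost`, `fuzzy_level`, and `discretize`, the boost value, and the breakpoints are all illustrative assumptions. The topic-word extension and term-relation component is not modeled.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Classic TF-IDF: (count / doc length) * log(N / df). docs: list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        result.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return result

def position_boost(weights, title_terms, boost=2.0):
    """HYPOTHETICAL position factor: multiply the weight of terms that also
    appear in the title. The default boost of 2.0 is an illustrative assumption."""
    return {t: w * boost if t in title_terms else w for t, w in weights.items()}

def fuzzy_level(x):
    """Map a normalized weight x in [0, 1] to the label with the highest
    triangular membership (peaks at 0, 0.5, 1; breakpoints are assumptions).
    On a tie, the first label in the dict wins."""
    memberships = {
        "low": max(0.0, 1 - x / 0.5),
        "medium": max(0.0, 1 - abs(x - 0.5) / 0.5),
        "high": max(0.0, (x - 0.5) / 0.5),
    }
    return max(memberships, key=memberships.get)

def discretize(weights):
    """Normalize by the document's largest weight, then assign fuzzy labels."""
    top = max(weights.values(), default=0.0)
    if top <= 0:
        return {t: "low" for t in weights}
    return {t: fuzzy_level(w / top) for t, w in weights.items()}

# Usage on a toy corpus of three tokenized documents.
docs = [["fuzzy", "set", "text"], ["text", "feature", "extraction"], ["fuzzy", "text", "mining"]]
w = tf_idf(docs)
print(discretize(position_boost(w[0], {"fuzzy"})))
```

A term such as "text" that occurs in every document gets an IDF of log(1) = 0, which is why frequency alone can miss informative terms and why the abstract motivates adding position and relation information before discretizing.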
Keywords :
classification; feature extraction; fuzzy set theory; natural language processing; text analysis; TF-IDF weight value discretization calculation method; text classification feature extraction; topic word extension; Computer science; Data mining; Feature extraction; Frequency; Fuzzy set theory; Fuzzy sets; Information processing; Machinery; Space technology; Text categorization; extension of topic words; feature words selection; fuzzy set;
Conference_Title :
Computer Science and Software Engineering, 2008 International Conference on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3336-0
DOI :
10.1109/CSSE.2008.1189