DocumentCode
479746
Title
Text Feature Extraction Based on the Extension of Topic Words and Fuzzy Set
Author
Hu Jinzhu ; Shu Jiangbo ; Huang Yuying
Author_Institution
Dept. of Comput. Sci., Central China Normal Univ., Wuhan
Volume
1
fYear
2008
fDate
12-14 Dec. 2008
Firstpage
219
Lastpage
222
Abstract
Text feature extraction is one of the foundations of natural language processing. The traditional TF-IDF weight calculation method considers only term frequency, but feature items at different positions contribute differently to text classification. By considering frequency, position, and the mutual relations among feature items, an improved weight calculation method, TF-IDF-Rel, is proposed based on the extension of topic words; on this basis, fuzzy set theory is embedded to discretize the weight values. Experiments show that this method classifies better than the traditional TF-IDF method, with improvements in both the recall rate and the accuracy rate.
Keywords
classification; feature extraction; fuzzy set theory; natural language processing; text analysis; TF-IDF weight value discretization calculation method; fuzzy set theory; natural language processing; text classification feature extraction; topic word extension; Computer science; Data mining; Feature extraction; Frequency; Fuzzy set theory; Fuzzy sets; Information processing; Machinery; Space technology; Text categorization; extension of topic words; feature words selection; fuzzy set;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Software Engineering, 2008 International Conference on
Conference_Location
Wuhan, Hubei
Print_ISBN
978-0-7695-3336-0
Type
conf
DOI
10.1109/CSSE.2008.1189
Filename
4721730