Title :
KD-KNN Text Categorization Method Based On Improvement TFI-DF
Author :
Li, LiangJun ; Zhang, Bin ; Che, Yuanyuan ; Yang, Ming
Author_Institution :
Sch. of Inf. Sci. & Eng., Northeast Univ., Shenyang, China
Abstract :
Text categorization automatically assigns natural-language texts to one or more categories of a predefined classification system according to their content. To improve categorization performance, this paper proposes an improved TFI-DF feature extraction method and a kernel-based distance-weighted KNN algorithm. The improved TFI-DF feature selection method is used to preprocess the data source and build the vector space model, which provides a convenient data structure for text categorization and improves the precision of feature selection. The kernel-based distance-weighted KNN algorithm addresses the multi-peak distribution and boundary overlap problems of samples, as well as the problem of making precise classification decisions. Results from the experimental system demonstrate the effectiveness of the proposed method.
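The pipeline described in the abstract (term weighting into a vector space model, followed by a distance-weighted KNN vote) can be illustrated with a minimal sketch. The abstract does not specify the paper's TFI-DF improvement or its kernel, so this sketch assumes standard smoothed TF-IDF weighting and a Gaussian kernel; the function names, the `sigma` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def vectorize(tokens, idf):
    """Map a token list to an L2-normalized sparse TF-IDF vector (dict)."""
    tf = Counter(t for t in tokens if t in idf)  # unseen terms are dropped
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def tfidf_vectors(docs):
    """Build smoothed IDF weights from tokenized docs and vectorize them."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: standard smoothed IDF, not the paper's improved weighting.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [vectorize(doc, idf) for doc in docs], idf

def euclidean(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def kernel_knn_predict(query, train_vecs, train_labels, k=3, sigma=1.0):
    """Classify by a Gaussian-kernel-weighted vote over the k nearest neighbours."""
    scored = sorted(
        ((euclidean(query, v), label) for v, label in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in scored:
        # Assumption: Gaussian kernel; closer neighbours get larger weights.
        votes[label] += math.exp(-(dist ** 2) / (2 * sigma ** 2))
    return max(votes, key=votes.get)

# Toy data, invented purely for illustration.
train_docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
]
train_labels = ["sports", "sports", "finance", "finance"]

train_vecs, idf = tfidf_vectors(train_docs)
sports_pred = kernel_knn_predict(vectorize(["goal", "team", "win"], idf),
                                 train_vecs, train_labels)
finance_pred = kernel_knn_predict(vectorize(["price", "market"], idf),
                                  train_vecs, train_labels)
print(sports_pred, finance_pred)
```

Weighting each neighbour's vote by a kernel of its distance, rather than counting votes equally, is one common way to reduce the influence of distant neighbours near overlapping class boundaries, which is the kind of problem the abstract says the method targets.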
Keywords :
data mining; feature extraction; learning (artificial intelligence); text analysis; TFI-DF feature extraction method; boundary overlap problem; k-nearest neighbor algorithm; kernel-based distance-weighted KNN algorithm; multipeak distribution problem; natural languages; text categorization; Boolean functions; Channel hot electron injection; Data preprocessing; Electronic mail; Feature extraction; Frequency; Information science; Natural languages; Text categorization; Text processing;
Conference_Title :
Information Engineering and Computer Science, 2009. ICIECS 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4994-1
DOI :
10.1109/ICIECS.2009.5364998