DocumentCode :
2348053
Title :
Use relative weight to improve the kNN for unbalanced text category
Author :
Liu, Xiaodong ; Ren, Fuji ; Yuan, Caixia
Author_Institution :
Dept. of Inf. Sci. & Intell. Syst., Univ. of Tokushima, Tokushima, Japan
fYear :
2010
fDate :
21-23 Aug. 2010
Firstpage :
1
Lastpage :
5
Abstract :
Text categorization is widely used in natural language processing. As one of the best text categorization algorithms, kNN is popular in many applications. Traditional kNN assumes that the training data are evenly distributed; however, this is not the case in many situations. When we used kNN in our Topic Detection and Tracking (TDT) system, it did not perform well because of bias in the training data set. To overcome the obstacle caused by this data bias, this paper proposes an approach that uses relative weight to adjust the weights of kNN (RWKNN). When evaluated on the TDT2 and TDT3 Chinese corpora, RWKNN proves to be robust on unbalanced data and yields better performance than traditional kNN.
Keywords :
learning (artificial intelligence); natural language processing; pattern classification; text analysis; k-nearest neighbor; natural language processing; relative weight approach; topic detection-and-tracking system; unbalanced text category; Feature extraction; Telecommunications; k-nearest neighbor; relative weight; text category;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
Type :
conf
DOI :
10.1109/NLPKE.2010.5587799
Filename :
5587799