Title :
Novel Text Classification Based on K-Nearest Neighbor
Author :
Yu, Xiao-Peng ; Yu, Xiao-Gao
Author_Institution :
Wuhan Univ., Wuhan
Abstract :
The k-nearest neighbors classifier (KNNC) is widely used because of its simplicity and efficiency. It consists of two steps: k-nearest neighbors search (KNNS) and classification. Existing centralized KNNS does not scale to large volumes of data, and the classification step still suffers from inductive biases that result from its assumptions, such as the presumption that training data are evenly distributed. This paper proposes a method (P2PKNNC) that improves the performance of kNN-based text classification in the P2P communication paradigm. P2PKNNC adaptively executes k-nearest-neighbor queries in a distributed metric structure based on generalized hyperplane partitioning. It then selects the influential subset of these neighbors and, for uneven text sets, classifies the input document in terms of the degree of disturbance it brings to the kernel densities of those influential neighbors. The experimental results indicate that our algorithm achieves a significant improvement in classification performance on imbalanced corpora.
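Generalized hyperplane partitioning, which the abstract names as the basis of the distributed metric structure, can be illustrated with a minimal single-machine sketch. This is not the paper's P2PKNNC: the function names `build_gh_tree` and `knn_search`, the `leaf_size` parameter, and the Euclidean default distance are all assumptions made for illustration, and the disturbance-degree classification step is not sketched because the abstract does not specify it. The sketch only shows how two pivots split a metric space and how the triangle inequality lets a k-nearest-neighbor query skip one side of a split.

```python
import heapq
import math
import random


def build_gh_tree(points, dist=math.dist, leaf_size=4, rng=None):
    """Split points recursively by which of two randomly chosen pivots is closer."""
    rng = rng or random.Random(0)
    if len(points) <= max(leaf_size, 1):
        return {"leaf": list(points)}
    i, j = rng.sample(range(len(points)), 2)
    p1, p2 = points[i], points[j]
    rest = [p for n, p in enumerate(points) if n not in (i, j)]
    left = [p for p in rest if dist(p, p1) <= dist(p, p2)]   # closer to p1
    right = [p for p in rest if dist(p, p1) > dist(p, p2)]   # closer to p2
    return {
        "p1": p1,
        "p2": p2,
        "left": build_gh_tree(left, dist, leaf_size, rng),
        "right": build_gh_tree(right, dist, leaf_size, rng),
    }


def knn_search(tree, query, k, dist=math.dist):
    """Return the k nearest points as (distance, point) pairs, nearest first."""
    best = []      # max-heap of the current k best, stored as (-distance, tiebreak, point)
    seen = [0]     # running counter so equal distances never compare points

    def consider(point, d):
        if len(best) < k:
            heapq.heappush(best, (-d, seen[0], point))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, seen[0], point))
        seen[0] += 1

    def radius():
        # Distance to the current k-th neighbor, or infinity until k are found.
        return -best[0][0] if len(best) == k else math.inf

    def visit(node):
        if "leaf" in node:
            for p in node["leaf"]:
                consider(p, dist(query, p))
            return
        d1, d2 = dist(query, node["p1"]), dist(query, node["p2"])
        consider(node["p1"], d1)
        consider(node["p2"], d2)
        # By the triangle inequality, every point on the p1 side is at least
        # (d1 - d2) / 2 away from the query, and symmetrically for the p2 side.
        sides = sorted([((d1 - d2) / 2, "left"), ((d2 - d1) / 2, "right")])
        for lower_bound, side in sides:
            if lower_bound <= radius():
                visit(node[side])

    visit(tree)
    return sorted(((-neg_d, p) for neg_d, _, p in best), key=lambda t: t[0])
```

Calling `knn_search(build_gh_tree(vectors), query_vector, k=5)` on a list of equal-length numeric tuples returns the five closest vectors. Any function satisfying the metric axioms can be passed as `dist`, since the pruning bound relies only on the triangle inequality.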
Keywords :
pattern classification; peer-to-peer computing; text analysis; distributed metric structure; generalized hyperplane partitioning; imbalanced corpora; input document classification; k-nearest neighbors search; peer-to-peer K-nearest neighbors classification; text classification; Algorithm design and analysis; Costs; Cybernetics; Economic forecasting; Kernel; Machine learning; Nearest neighbor searches; Testing; Text categorization; Training data; K-nearest neighbor; Kernel density estimation; P2P; Text classification;
Conference_Title :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
DOI :
10.1109/ICMLC.2007.4370740