Title :
A Fast KNN Algorithm for Text Categorization
Author :
Wang, Yu ; Wang, Zheng-Ou
Author_Institution :
Hebei Univ., Baoding
Abstract :
The k-nearest-neighbor (KNN) algorithm is a simple, effective, non-parametric method for text categorization. Its major drawback is the cost of similarity computation: traditional KNN compares each query against every training sample, which makes it impractical for text categorization with high-dimensional features and very large sample sets. This paper presents a method called TFKNN (Tree-Fast-K-Nearest-Neighbor), which finds the exact k nearest neighbors quickly. The method builds an SSR-tree for k-nearest-neighbor search, in which the child nodes of each non-leaf node are ranked by the distance between their central points and the central point of their parent. The search scope is then reduced using the tree, which substantially decreases the time spent on similarity computation.
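The abstract does not give the SSR-tree construction or the TFKNN search procedure, so the following is only a minimal sketch of the general idea it describes: rank children by their distance to the parent's central point, then use that ranking to skip similarity computations while still returning the exact k nearest neighbors. It assumes a single parent node, Euclidean distance, and the triangle inequality as the pruning rule. The names `RankedNode`, `dist`, and `knn` are hypothetical and are not taken from the paper.

```python
import bisect
import heapq
import math


def dist(a, b):
    """Euclidean distance (an assumption; the paper may use another measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class RankedNode:
    """One parent node whose children are ranked by distance to its center."""

    def __init__(self, points):
        dim = len(points[0])
        n = len(points)
        self.center = tuple(sum(p[i] for p in points) / n for i in range(dim))
        ranked = sorted((dist(p, self.center), p) for p in points)
        self.radii = [r for r, _ in ranked]    # ascending distances to center
        self.points = [p for _, p in ranked]   # children in the same order

    def knn(self, query, k):
        """Return (sorted [(distance, point)], number of distances computed)."""
        dq = dist(query, self.center)
        # By the triangle inequality, dist(query, p) >= |dq - r| for a child
        # at distance r from the center. Walk outward from the rank position
        # nearest to dq so that this lower bound never decreases.
        hi = bisect.bisect_left(self.radii, dq)
        lo = hi - 1
        heap = []      # max-heap of the best k so far, stored as (-distance, point)
        computed = 0
        n = len(self.points)
        while lo >= 0 or hi < n:
            lo_bound = dq - self.radii[lo] if lo >= 0 else math.inf
            hi_bound = self.radii[hi] - dq if hi < n else math.inf
            if lo_bound <= hi_bound:
                bound, idx = lo_bound, lo
                lo -= 1
            else:
                bound, idx = hi_bound, hi
                hi += 1
            if len(heap) == k and bound >= -heap[0][0]:
                break  # no remaining child can beat the current k-th neighbor
            d = dist(query, self.points[idx])
            computed += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, self.points[idx]))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, self.points[idx]))
        return sorted((-nd, p) for nd, p in heap), computed
```

Calling `RankedNode(points).knn(query, k)` returns the neighbors together with a count of how many distances were actually evaluated, which can be compared against `len(points)` for a brute-force scan. A full tree would apply the same ranking and pruning at every non-leaf node.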
Keywords :
pattern classification; text analysis; TFKNN; similarity computing; text categorization; tree fast K-nearest-neighbor; Computer science; Cybernetics; Machine learning; Machine learning algorithms; Mathematics; Nearest neighbor searches; Support vector machines; Systems engineering and theory; Web sites; KNN; SSR-tree; Similarity
Conference_Title :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
DOI :
10.1109/ICMLC.2007.4370742