Title :
Incremental document clustering using Multi-representation Indexing Tree
Author :
Wang, Lifeng ; Song, Hui ; Liu, Xiaoqiang
Author_Institution :
Department of Computer Science and Technology, Donghua University, Shanghai, China
Abstract :
Incremental document clustering is a powerful technique for large-scale topic discovery from incrementally growing document sets. The indexing tree algorithm is efficient, but it tends to handle only spherical clusters well. To address this problem, we present a novel Multi-Representation Indexing Tree (MRIT) algorithm that constructs a hierarchy accommodating arbitrarily shaped clusters while maintaining good performance. In contrast to the indexing tree algorithm, a cluster is decomposed into several sub-clusters and is represented as the union of those sub-clusters rather than by its center. The similarity of a document to a cluster is determined by the distance to the nearest neighbor among the cluster's representative points. Experimental results on a variety of domains demonstrate that our algorithm produces high-quality clusters, is insensitive to document input order, and is efficient in terms of computation time.
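The nearest-representative similarity described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the choice of cosine distance, and the toy vectors are all hypothetical, since the abstract does not specify the distance measure or how sub-clusters are formed.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def distance_to_cluster(doc, representatives):
    """Distance from doc to a cluster: distance to its nearest representative point."""
    return min(cosine_distance(doc, r) for r in representatives)

def assign(doc, clusters):
    """Return the name of the cluster whose nearest representative is closest to doc."""
    return min(clusters, key=lambda name: distance_to_cluster(doc, clusters[name]))

# Hypothetical toy data: each cluster is a union of sub-cluster representatives.
clusters = {
    "A": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "B": [[0.0, 0.0, 1.0]],
}
doc = [0.1, 0.9, 0.0]
print(assign(doc, clusters))
```

In this toy setup, cluster A keeps two representatives instead of a single center, so a document close to either one is still matched to A. That is the intuition behind representing a non-spherical cluster as a union of sub-clusters.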
Keywords :
Accuracy; Algorithm design and analysis; Clustering algorithms; Feature extraction; Heuristic algorithms; Indexing; Nearest neighbor searches; Incremental Clustering; Indexing Tree; MRIT; Multi-representation
Conference_Title :
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4244-7616-9
DOI :
10.1109/ICISE.2010.5690332