Regional Information Center for Science and Technology - Incremental document clustering using cluster similarity histograms

DocumentCode :

2227580

Title :

Incremental document clustering using cluster similarity histograms

Author :

Hammouda, Khaled M. ; Kamel, Mohamed S.

Author_Institution :

Department of Systems Design Engineering, University of Waterloo, Ontario, Canada

Year :

2003

Date :

13-17 Oct. 2003

Firstpage :

597

Lastpage :

601

Abstract :

Clustering of large collections of text documents is a key process in providing a higher level of knowledge about the underlying inherent classification of the documents. Web documents, in particular, are of great interest since managing, accessing, searching, and browsing large repositories of Web content requires efficient organization. Incremental clustering algorithms are always preferred to traditional clustering techniques, since they can be applied in a dynamic environment such as the Web. An incremental document clustering algorithm is introduced, which relies only on pair-wise document similarity information. Clusters are represented using a cluster similarity histogram, a concise statistical representation of the distribution of similarities within each cluster, which provides a measure of cohesiveness. The measure guides the incremental clustering process. Complexity analysis and experimental results are discussed and show that the algorithm requires less computational time than standard methods while achieving a comparable or better clustering quality.
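The idea in the abstract can be illustrated with a minimal sketch. The abstract states only that clusters are summarized by a histogram of pair-wise similarities and that a cohesiveness measure guides incremental assignment. Everything else below is an assumption made for illustration: the cosine similarity over term-weight dictionaries, the bin count, the similarity threshold, the definition of cohesion as the fraction of pair-wise similarities at or above that threshold, the acceptance rule, and all names (`cosine`, `histogram`, `cohesion`, `Cluster`, `add_document`). It should not be read as the authors' exact algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def histogram(sims, bins=10):
    """Count similarities in [0, 1] into equal-width bins (bin count is an assumption)."""
    counts = [0] * bins
    for s in sims:
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


def cohesion(sims, threshold=0.5):
    """Assumed cohesiveness measure: share of pair-wise similarities >= threshold."""
    if not sims:
        return 1.0  # a cluster with fewer than two documents is trivially cohesive
    return sum(1 for s in sims if s >= threshold) / len(sims)


class Cluster:
    """Holds documents plus the pair-wise similarities among them."""

    def __init__(self):
        self.docs = []
        self.sims = []

    def sims_to(self, doc):
        return [cosine(doc, d) for d in self.docs]


def add_document(clusters, doc, threshold=0.5, min_cohesion=0.5, max_drop=0.1):
    """Assign doc to the accepting cluster with the best resulting cohesion.

    Assumed acceptance rule: cohesion must not fall, or it may fall by at most
    max_drop while staying at or above min_cohesion. Otherwise a new cluster
    is opened for the document.
    """
    best, best_after, best_new = None, -1.0, None
    for c in clusters:
        new = c.sims_to(doc)
        before = cohesion(c.sims, threshold)
        after = cohesion(c.sims + new, threshold)
        ok = after >= before or (after >= min_cohesion and before - after <= max_drop)
        if ok and after > best_after:
            best, best_after, best_new = c, after, new
    if best is None:
        best, best_new = Cluster(), []
        clusters.append(best)
    best.docs.append(doc)
    best.sims.extend(best_new)
    return clusters


if __name__ == "__main__":
    # Tiny hypothetical corpus: two documents about clustering, two about football.
    stream = [
        {"cluster": 1, "document": 1},
        {"cluster": 1, "document": 1, "web": 1},
        {"football": 1, "score": 1},
        {"football": 1, "score": 1, "match": 1},
    ]
    clusters = []
    for d in stream:
        add_document(clusters, d)
    for i, c in enumerate(clusters):
        print(i, len(c.docs), histogram(c.sims))
```

Because each arriving document is compared only against the documents already stored in each cluster, the sketch needs nothing beyond pair-wise similarity, which matches the constraint stated in the abstract. The thresholds are tuning knobs chosen here for the toy data.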

Keywords :

Internet; Web sites; computational complexity; content-based retrieval; document handling; pattern clustering; Web document; cluster similarity histogram representation; complexity analysis; document clustering algorithm; pair-wise document similarity information; statistical representation; text document; Algorithm design and analysis; Clustering algorithms; Clustering methods; Content management; Design engineering; Electronic mail; Histograms; Knowledge engineering; Systems engineering and theory

Language :

English

Publisher :

IEEE

Conference_Title :

Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on

Print_ISBN :

0-7695-1932-6

Type :

conf

DOI :

10.1109/WI.2003.1241276

Filename :

1241276

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2227580