Title :
A new approach for fuzzy clustering of Web documents
Author :
Friedman, Menahem ; Last, Mark ; Zaafrany, Omer ; Schneider, Moti ; Kandel, Abraham
Author_Institution :
Dept. of Phys., Nucl. Res. Center-Negev, Beer-Sheva, Israel
Abstract :
Most existing methods of document clustering are based on the classical vector-space model, which represents each document by a fixed-size vector of key terms or key phrases. In large and diverse document collections such as the World Wide Web, this approach suffers from a tremendous computational overhead, since the constant size of the term vector equals the total number of key terms in all documents. We propose a new fuzzy-based approach to clustering documents that are represented by vectors of variable size. Each entry in a vector consists of two fields: the first is the name of a key phrase in the document, and the second is an importance weight associated with this key phrase within that particular document. We describe the proposed approach in detail and show how it is implemented in a real-world application from the area of Web monitoring.
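The variable-size representation described in the abstract can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' implementation: the key phrases and weights are invented, and the abstract does not state which similarity measure or membership formula the paper uses, so cosine similarity and a simple normalization stand in for them here.

```python
from math import sqrt

# Variable-size representation: each document stores only its own key phrases
# as (key phrase -> importance weight). Phrases and weights are invented.
doc_a = {"fuzzy clustering": 0.9, "web monitoring": 0.6, "key phrase": 0.3}
doc_b = {"fuzzy clustering": 0.7, "vector-space model": 0.5}
doc_c = {"nuclear physics": 0.8}


def similarity(d1, d2):
    """Cosine similarity over sparse phrase-weight maps.

    Assumed measure for illustration; not taken from the paper.
    """
    shared = d1.keys() & d2.keys()  # only phrases present in both documents
    dot = sum(d1[p] * d2[p] for p in shared)
    norm1 = sqrt(sum(w * w for w in d1.values()))
    norm2 = sqrt(sum(w * w for w in d2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def memberships(doc, centroids, eps=1e-9):
    """Fuzzy membership degrees of doc in each cluster.

    Hypothetical rule: similarities to the cluster centroids, normalized so
    the degrees sum to 1. eps avoids division by zero when nothing matches.
    """
    sims = [similarity(doc, c) + eps for c in centroids]
    total = sum(sims)
    return [s / total for s in sims]
```

With these hypothetical documents, `memberships(doc_b, [doc_a, doc_c])` assigns `doc_b` almost entirely to the cluster centred on `doc_a`, since the two share a key phrase, while the cost of each comparison depends only on the phrases the two documents actually contain, not on the vocabulary of the whole collection.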
Keywords :
Internet; document handling; fuzzy set theory; pattern clustering; statistical analysis; Web documents; World Wide Web; document clustering; fuzzy clustering; Clustering algorithms; Clustering methods; Computer science; Educational institutions; Electronic mail; Fuzzy systems; Information systems; Physics; Systems engineering and theory; Web sites
Conference_Title :
Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on
Print_ISBN :
0-7803-8353-2
DOI :
10.1109/FUZZY.2004.1375752