• DocumentCode
    422683
  • Title
    A new approach for fuzzy clustering of Web documents
  • Author
    Friedman, Menahem ; Last, Mark ; Zaafrany, Omer ; Schneider, Moti ; Kandel, Abraham
  • Author_Institution
    Dept. of Phys., Nucl. Res. Center-Negev, Beer-Sheva, Israel
  • Volume
    1
  • fYear
    2004
  • fDate
    25-29 July 2004
  • Firstpage
    377
  • Abstract
    Most existing methods of document clustering are based on the classical vector-space model, which represents each document by a fixed-size vector of key terms or key phrases. In large and diverse document collections such as the World Wide Web, this approach suffers from a severe computational overhead, since the fixed length of every term vector equals the total number of distinct key terms across all documents. We propose a new fuzzy-based approach to clustering documents that are represented by vectors of variable size. Each entry in a vector consists of two fields: the first is a key phrase occurring in the document, and the second is an importance weight associated with that key phrase within the particular document. We describe the proposed approach in detail and show how it is implemented in a real-world application from the area of Web monitoring.
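    The contrast between the two representations can be sketched in Python, assuming each document is stored as a mapping from key phrase to importance weight in [0, 1], so only phrases that occur in the document take up space. The names `Doc`, `fixed_vector_size` and `fuzzy_jaccard` are hypothetical, and the min/max (fuzzy Jaccard) similarity is an illustrative stand-in, not the clustering measure used by the authors.

```python
from typing import Dict, List, Set

# Variable-size representation: key phrase -> importance weight.
# Only phrases that actually occur in the document are stored.
Doc = Dict[str, float]


def fixed_vector_size(docs: List[Doc]) -> int:
    """Length every document vector would need in the classical
    vector-space model: the number of distinct key phrases in the
    whole collection."""
    vocabulary: Set[str] = set()
    for doc in docs:
        vocabulary.update(doc)
    return len(vocabulary)


def fuzzy_jaccard(a: Doc, b: Doc) -> float:
    """Hypothetical similarity (an assumption, not the paper's measure):
    treat weights as fuzzy membership degrees and return
    sum(min) / sum(max) over the union of both documents' phrases.
    Phrases missing from one document count as weight 0."""
    keys = a.keys() | b.keys()
    union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    if union == 0.0:
        return 0.0
    intersection = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    return intersection / union


if __name__ == "__main__":
    docs = [
        {"fuzzy clustering": 0.9, "web documents": 0.6},
        {"fuzzy clustering": 0.5, "web monitoring": 0.8},
        {"football": 1.0},
    ]
    print(fixed_vector_size(docs))                   # 4
    print(round(fuzzy_jaccard(docs[0], docs[1]), 3))  # 0.217
    print(fuzzy_jaccard(docs[0], docs[2]))            # 0.0
```

    Each document here stores only two or one entries, while a fixed-size vector for the same collection would need four slots per document, and that gap grows with the vocabulary of a Web-scale collection.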
  • Keywords
    Internet; document handling; fuzzy set theory; pattern clustering; statistical analysis; Web documents; World Wide Web; document clustering; fuzzy clustering; Clustering algorithms; Clustering methods; Computer science; Educational institutions; Electronic mail; Fuzzy systems; Information systems; Physics; Systems engineering and theory; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on
  • ISSN
    1098-7584
  • Print_ISBN
    0-7803-8353-2
  • Type
    conf
  • DOI
    10.1109/FUZZY.2004.1375752
  • Filename
    1375752