• DocumentCode
    1206486
  • Title

    Dual Fuzzy-Possibilistic Coclustering for Categorization of Documents

  • Author

    Tjhi, William-Chandra ; Chen, Lihui

  • Author_Institution
    Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore
  • Volume
    17
  • Issue
    3
  • fYear
    2009
  • fDate
    6/1/2009
  • Firstpage
    532
  • Lastpage
    543
  • Abstract
    In this paper, we develop a new soft model, dual fuzzy-possibilistic coclustering (DFPC), for document categorization. The proposed model targets robustness to outliers and richer representations of coclusters. DFPC is inspired by an existing algorithm called possibilistic fuzzy C-means (PFCM), which hybridizes fuzzy and possibilistic clustering. PFCM has been shown to perform effectively for low-dimensional data clustering. To achieve our goal, we extend this existing idea by introducing a novel PFCM-like coclustering model. The new algorithm, DFPC, preserves the desired properties of PFCM. In addition, as a coclustering algorithm, DFPC is more suitable for our intended high-dimensional application: document clustering. Moreover, the coclustering mechanism enables DFPC to generate, together with document clusters, fuzzy-possibilistic word memberships. These word memberships, which are absent in the existing PFCM model, can play an important role in generating useful descriptions of document clusters. We detail the formulation of the proposed model and provide an extensive analytical study of the DFPC algorithm. Experiments on an artificial dataset and various benchmark document datasets demonstrate the effectiveness and potential of DFPC.
  • Keywords
    classification; document handling; fuzzy set theory; pattern clustering; possibility theory; document categorization; document clustering; dual fuzzy-possibilistic coclustering; fuzzy-possibilistic word memberships; low-dimensional data clustering; possibilistic fuzzy c-means; Coclustering; co-clustering; fuzzy clustering; information retrieval; possibilistic clustering; text mining
  • fLanguage
    English
  • Journal_Title
    Fuzzy Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6706
  • Type

    jour

  • DOI
    10.1109/TFUZZ.2008.924332
  • Filename
    4505351