• DocumentCode
    547309
  • Title
    An improved sIB algorithm for document clustering using combination weighting measures
  • Author
    Ji, Bo ; Ye, Yangdong
  • Author_Institution
    Sch. of Inf. Eng., Zhengzhou Univ., Zhengzhou, China
  • Volume
    3
  • fYear
    2011
  • fDate
    10-12 June 2011
  • Firstpage
    110
  • Lastpage
    114
  • Abstract
    This paper presents an improved sIB algorithm (CW-sIB) for high-dimensional document clustering that uses combination weighting. Traditionally, research on feature weighting for clustering has focused on finding a single effective weighting scheme. However, choosing a proper weighting scheme is widely acknowledged to be a difficult problem. To address this issue, we propose a linear combination weighting method derived from the idea of combination evaluation in multiple attribute decision making. Combination weighting can overcome the limitations of using a single weighting scheme and helps to better reflect the essential characteristics of the document data. Experiments on real document data show that the proposed CW-sIB algorithm is superior to the sIB algorithm. We also report which combinations of weighting schemes show merit in the decomposition of the datasets.
  • Keywords
    decision making; pattern clustering; text analysis; CW-sIB algorithm; combination weighting measures; document clustering; feature weighting; improved sIB algorithm; information retrieval; linear combination weighting method; machine learning; multiple attribute decision making problem; text categorization; text mining methods; Accuracy; Clustering algorithms; Complexity theory; Indexes; Partitioning algorithms; Text categorization; Weight measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Automation Engineering (CSAE), 2011 IEEE International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-8727-1
  • Type
    conf
  • DOI
    10.1109/CSAE.2011.5952644
  • Filename
    5952644