• DocumentCode
    2828281
  • Title
    Documents Clustering Based on Optimized Compressibility Vector Space
  • Author
    Zhang, Nuo ; Watanabe, Toshinori
  • Author_Institution
    Grad. Sch. of Inf. Syst., Univ. of Electro-Commun., Chofu, Japan
  • fYear
    2009
  • fDate
    11-13 Dec. 2009
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Advances in computer hardware and broadband networks have made it possible to access and store electrical documents on a large scale. To handle this increasing number of documents properly, an efficient document representation model is as important as the classification algorithm. Several text representation methods, such as the bag-of-words and N-gram models, have been widely used. Another representation approach, the pattern representation scheme using data compression (PRDC), has been proposed recently. It not only processes linguistic text data independently, but also processes multimedia data effectively. In this study, we propose a method to improve the PRDC approach and compare it with the two aforementioned methods. Performance is compared in terms of clustering ability. Experimental results show that the proposed method performs better than the other two methods and than the original PRDC.
  • Keywords
    computational linguistics; data compression; multimedia systems; pattern classification; pattern clustering; text analysis; broadband accessible network; classification algorithms; computer hardware; data compression; documents clustering; electrical documents; linguistic text; multimedia data; optimized compressibility vector space; pattern representation; text representation; Classification algorithms; Computer networks; Data compression; Hardware; High performance computing; Information management; Information retrieval; Information systems; Large-scale systems; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-4507-3
  • Electronic_ISBN
    978-1-4244-4507-3
  • Type
    conf
  • DOI
    10.1109/CISE.2009.5363976
  • Filename
    5363976