• DocumentCode
    2480976
  • Title
    Feature Space Transformations in Document Clustering
  • Author
    Csorba, Kristóf ; Vajk, István
  • Author_Institution
    Dept. of Autom. & Appl. Informatics, Budapest Univ. of Technol. & Econ.
  • fYear
    0
  • fDate
    0-0 0
  • Firstpage
    175
  • Lastpage
    179
  • Abstract
    Document clustering is a part of information retrieval in which documents written in natural language are assigned to groups based on some criteria. In the current case, documents with similar topics are grouped together. As there are many methods and additional noise filtering techniques for this task, this paper focuses on the composition of such transformations and on the comparison of configurations built from subsets of these transformations, which serve as tiles of the whole procedure. Five tile methods are used: term filtering, frequency quantizing, principal component analysis (PCA), term clustering, and document clustering itself. The resulting configurations are compared based on the maximal achieved F-measure and time consumption to find the best composition.
  • Keywords
    document handling; information retrieval; pattern clustering; principal component analysis; PCA; document clustering; document collection; document retrieval; feature space transformation; frequency quantization; information retrieval; natural language; noise filtering technique; principal component analysis; term clustering; term filtering; Automation; Filtering; Frequency; Informatics; Information retrieval; Natural languages; Principal component analysis; Space technology; Text analysis; Tiles;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Engineering Systems, 2006. INES '06. Proceedings. International Conference on
  • Conference_Location
    London
  • Print_ISBN
    0-7803-9708-8
  • Type
    conf
  • DOI
    10.1109/INES.2006.1689364
  • Filename
    1689364