  • DocumentCode
    2350600
  • Title
    Unsupervised document clustering using multi-resolution latent semantic density analysis
  • Author
    Bellegarda, Jerome R.
  • Author_Institution
    Speech & Language Technol., Apple Inc., Cupertino, CA, USA
  • fYear
    2010
  • fDate
    Aug. 29–Sept. 1, 2010
  • Firstpage
    361
  • Lastpage
    366
  • Abstract
    To find meaningful groupings in a given document collection, it is essential to learn the right granularity for the domain, uncover core themes and attendant outliers, and derive suitable labels with which to characterize each of the resulting clusters. The outcome is therefore affected both by the choice of representation and by the behavior of the clustering algorithm. This paper advocates a strategy which combines density-based clustering with latent semantic feature extraction. Documents are first mapped into a latent semantic vector space, and then clustered in that space on the basis of a multi-resolution density measure. Empirical evidence gathered on several document collections suggests that this procedure is effective in identifying semantically sound document clusters.
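    The two-stage pipeline in the abstract (map documents into a latent semantic space, then cluster them with a multi-resolution density measure) can be illustrated with a minimal sketch. It is not the paper's algorithm, and several pieces are assumptions. The latent semantic mapping is a plain truncated SVD of a raw term-by-document count matrix with no term weighting, and similarity is cosine. The "multi-resolution density" is a hypothetical stand-in: the fraction of other documents within each of several cosine-similarity thresholds, averaged over those thresholds. The clustering step swaps in a quick-shift-style mode-seeking rule. The function names `lsa_embed`, `multires_density` and `density_clusters` are hypothetical, and so are the parameters `k`, `thresholds` and `link_sim`.

```python
import numpy as np


def lsa_embed(counts, k):
    """Map each document (a column of a term-by-document count matrix)
    into a k-dimensional latent semantic space via truncated SVD.

    Assumption: raw counts, no term weighting. Documents are represented
    by the scaled right singular vectors v_j * S, then normalized to unit
    length so that dot products are cosine similarities.
    """
    W = np.asarray(counts, dtype=float)
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    docs = Vt[:k].T * s[:k]                       # shape: (n_docs, k)
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    return docs / np.where(norms == 0, 1.0, norms)


def multires_density(X, thresholds):
    """Hypothetical multi-resolution density: for each cosine-similarity
    threshold (one 'resolution'), the fraction of other documents at least
    that similar, averaged over all thresholds."""
    S = X @ X.T
    n = X.shape[0]
    dens = np.zeros(n)
    for t in thresholds:
        hits = S >= t
        np.fill_diagonal(hits, False)             # do not count the document itself
        dens += hits.sum(axis=1) / max(n - 1, 1)
    return dens / len(thresholds)


def density_clusters(X, dens, link_sim):
    """Quick-shift-style mode seeking (a stand-in, not necessarily the
    paper's procedure). Each document links to its most similar document
    of higher density, provided that similarity is at least link_sim.
    Documents with no such link are roots; each root's tree is a cluster."""
    S = X @ X.T
    n = X.shape[0]
    idx = np.arange(n)
    parent = idx.copy()
    for i in range(n):
        # Strict total order: higher density first, ties broken by lower index.
        higher = idx[(dens > dens[i]) | ((dens == dens[i]) & (idx < i))]
        if higher.size:
            j = higher[np.argmax(S[i, higher])]
            if S[i, j] >= link_sim:
                parent[i] = j
    roots = np.empty(n, dtype=int)
    for i in range(n):
        r = i
        while parent[r] != r:                     # each hop moves up the total order
            r = parent[r]
        roots[i] = r
    _, labels = np.unique(roots, return_inverse=True)  # relabel roots as 0..m-1
    return labels
```

    Consider a 6-term by 6-document count matrix in which documents 0–2 use only terms 0–2 and documents 3–5 use only terms 3–5. Calling `lsa_embed(counts, 2)`, then `multires_density(X, [0.9, 0.5, 0.1])` and `density_clusters(X, dens, 0.5)` should place each group in its own cluster. Averaging over several thresholds is one simple way to avoid committing to a single neighborhood size. The density measure actually used in the paper may differ.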
  • Keywords
    document handling; feature extraction; pattern clustering; density based clustering; document collection; latent semantic feature extraction; latent semantic vector space; multiresolution density measure; multiresolution latent semantic density analysis; unsupervised document clustering; Semantics; density measure; latent semantic mapping; structured document collection; unsupervised clustering; variable resolution;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning for Signal Processing (MLSP), 2010 IEEE International Workshop on
  • Conference_Location
    Kittilä, Finland
  • ISSN
    1551-2541
  • Print_ISBN
    978-1-4244-7875-0
  • Electronic_ISBN
    1551-2541
  • Type
    conf
  • DOI
    10.1109/MLSP.2010.5587982
  • Filename
    5587982