  • DocumentCode
    2335332
  • Title
    Document clustering and cluster topic extraction in multilingual corpora
  • Author
    Silva, Joaquim; Mexia, João; Coelho, Agra; Lopes, Gabriel
  • Author_Institution
    Univ. Nova de Lisboa, Lisbon, Portugal
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    513
  • Lastpage
    520
  • Abstract
    A statistics-based approach for clustering documents and for extracting cluster topics is described. Relevant (meaningful) expressions (REs) automatically extracted from corpora are used as the base clustering features. These features are transformed and their number is strongly reduced in order to obtain a small set of document classification features. This is achieved on the basis of principal components analysis. Model-based clustering analysis then finds the best number of clusters. Finally, the most important REs are extracted from each cluster and taken as the document cluster topics.
  • Keywords
    data mining; document handling; pattern clustering; cluster topic extraction; document classification features; document clustering; model-based clustering analysis; multilingual corpora; principal components analysis; relevant expressions; statistics-based approach; Agriculture; Data mining; Dispersion; Feature extraction; Instruction sets; Organizing; Probability; Size measurement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    0-7695-1119-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2001.989559
  • Filename
    989559
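The pipeline summarized in the abstract (REs as features, PCA-based reduction, clustering, per-cluster topic extraction) can be illustrated with a minimal pure-Python sketch. Everything below is an illustrative assumption, not the authors' implementation: the RE names and frequencies are hypothetical, PCA is computed by power iteration with deflation, the model-based clustering step is replaced by hand-assigned labels, and the topic score used by the hypothetical `cluster_topics` helper (mean RE frequency in a cluster minus the corpus mean) is not the paper's RE ranking.

```python
import math
import random


def covariance(rows):
    """Return the column covariance matrix and the mean-centred rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
           for i in range(m)]
    return cov, centred


def top_eigenpairs(matrix, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    m = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        # Start from a normalised random positive vector.
        v = [rng.random() + 0.1 for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for unit vector v.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(m) for j in range(m))
        pairs.append((lam, v))
        # Deflate so the next pass finds the next-largest component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return pairs


def pca_reduce(rows, k):
    """Project documents onto their first k principal components."""
    cov, centred = covariance(rows)
    pairs = top_eigenpairs(cov, k)
    scores = [[sum(cj * vj for cj, vj in zip(c, v)) for _, v in pairs]
              for c in centred]
    return scores, [lam for lam, _ in pairs]


def cluster_topics(rows, labels, names, top_n=2):
    """Hypothetical topic score: rank REs by (cluster mean - corpus mean)."""
    n, m = len(rows), len(rows[0])
    corpus_mean = [sum(r[j] for r in rows) / n for j in range(m)]
    topics = {}
    for label in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == label]
        gain = [sum(r[j] for r in members) / len(members) - corpus_mean[j]
                for j in range(m)]
        ranked = sorted(range(m), key=lambda j: gain[j], reverse=True)
        topics[label] = [names[j] for j in ranked[:top_n]]
    return topics


# Hypothetical toy corpus: 6 documents x 4 REs (raw frequencies).
re_names = ["data mining", "cluster analysis", "crop yield", "soil erosion"]
docs = [
    [5, 4, 0, 1],
    [6, 5, 1, 0],
    [4, 6, 0, 0],
    [0, 1, 5, 6],
    [1, 0, 6, 4],
    [0, 0, 4, 5],
]
scores, variances = pca_reduce(docs, k=2)
# Stand-in for the model-based clustering step (hand-assigned labels).
labels = [0, 0, 0, 1, 1, 1]
topics = cluster_topics(docs, labels, re_names)
print(topics)
# → {0: ['data mining', 'cluster analysis'], 1: ['crop yield', 'soil erosion']}
```

In this toy data the first principal component alone separates the two groups of documents, which is the kind of strong reduction the abstract describes. To mirror the paper's clustering step more closely, one could fit mixture models with different numbers of components to `scores` and keep the best-scoring one; the exact model family and selection criterion used by the authors are not stated here.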