  • DocumentCode
    3368345
  • Title
    Inducing Word Senses for Cross-lingual Document Clustering
  • Author
    Guoyu Tang; Yunqing Xia; Erik Cambria; Peng Jin
  • Author_Institution
    Dept. of Comput. Sci., Tsinghua Univ., Beijing, China
  • fYear
    2013
  • fDate
    14-15 Dec. 2013
  • Firstpage
    409
  • Lastpage
    414
  • Abstract
    Cross-lingual document clustering is the task of automatically organizing a large collection of cross-lingual documents into a few groups according to their content or topic. The language barrier and translation ambiguity are two well-known challenges for cross-lingual document representation. To address them, we propose to represent cross-lingual documents through statistical word senses, which are learned from a parallel corpus by means of a novel cross-lingual word sense induction model. Furthermore, a sense clustering method is adopted to discover semantic relations among word senses, which are then used to represent cross-lingual documents through a sense-based vector space model. Evaluation on a benchmark dataset shows that the proposed model outperforms two state-of-the-art models in cross-lingual document clustering.
  • Keywords
    document handling; pattern clustering; cross-lingual document clustering; cross-lingual word sense induction model; sense clustering method; sense-based vector space model; statistical word senses; Adaptation models; Clustering algorithms; Context; Dictionaries; Educational institutions; Semantics; Vectors; Word sense; cross-lingual document representation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 9th International Conference on Computational Intelligence and Security (CIS)
  • Conference_Location
    Leshan
  • Print_ISBN
    978-1-4799-2548-3
  • Type
    conf
  • DOI
    10.1109/CIS.2013.93
  • Filename
    6746429