Title :
Inducing Word Senses for Cross-lingual Document Clustering
Author :
Guoyu Tang ; Yunqing Xia ; Erik Cambria ; Peng Jin
Author_Institution :
Dept. of Comput. Sci., Tsinghua Univ., Beijing, China
Abstract :
Cross-lingual document clustering is the task of automatically organizing a large collection of cross-lingual documents into a few groups according to their content or topic. The language barrier and translation ambiguity are two well-known challenges for cross-lingual document representation. To address these issues, we propose to represent cross-lingual documents through statistical word senses, which are learned from a parallel corpus by means of a novel cross-lingual word sense induction model. Furthermore, a sense clustering method is adopted to discover semantic relations among word senses, which are then used to represent cross-lingual documents through a sense-based vector space model. Evaluation on a benchmark dataset shows that the proposed model outperforms two state-of-the-art models in cross-lingual document clustering.
Keywords :
document handling; pattern clustering; cross-lingual document clustering; cross-lingual word sense induction model; sense clustering method; sense-based vector space model; statistical word senses; Adaptation models; Clustering algorithms; Context; Dictionaries; Educational institutions; Semantics; Vectors; Word sense; cross-lingual document representation
Conference_Title :
Computational Intelligence and Security (CIS), 2013 9th International Conference on
Conference_Location :
Leshan
Print_ISBN :
978-1-4799-2548-3
DOI :
10.1109/CIS.2013.93