DocumentCode :
3368345
Title :
Inducing Word Senses for Cross-lingual Document Clustering
Author :
Guoyu Tang ; Yunqing Xia ; Erik Cambria ; Peng Jin
Author_Institution :
Dept. of Comput. Sci., Tsinghua Univ., Beijing, China
fYear :
2013
fDate :
14-15 Dec. 2013
Firstpage :
409
Lastpage :
414
Abstract :
Cross-lingual document clustering is the task of automatically organizing a large collection of cross-lingual documents into a few groups according to their content or topic. The language barrier and translation ambiguity are two well-known challenges for cross-lingual document representation. To address these issues, we propose to represent cross-lingual documents through statistical word senses, which are learned from a parallel corpus by means of a novel cross-lingual word sense induction model. Furthermore, a sense clustering method is adopted to discover semantic relations among word senses, which are then used to represent cross-lingual documents in a sense-based vector space model. Evaluation on a benchmark dataset shows that the proposed model outperforms two state-of-the-art models in cross-lingual document clustering.
Keywords :
document handling; pattern clustering; cross-lingual document clustering; cross-lingual word sense induction model; sense clustering method; sense-based vector space model; statistical word senses; Adaptation models; Clustering algorithms; Context; Dictionaries; Educational institutions; Semantics; Vectors; Word sense; cross-lingual document representation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Security (CIS), 2013 9th International Conference on
Conference_Location :
Leshan
Print_ISBN :
978-1-4799-2548-3
Type :
conf
DOI :
10.1109/CIS.2013.93
Filename :
6746429