Title :
Empirical Study of Chinese Text Similarity Computation Based on Machine Translation
Author :
Xu, Yu ; Liu, Jianxun ; Tang, Mingdong ; Wen, YiPing
Author_Institution :
Dept. of Comput. Sci. & Eng., Hunan Univ. of Sci. & Technol., Xiangtan, China
Abstract :
To address the problems of Chinese text similarity calculation based on word frequency statistics, this paper proposes a method that uses machine translation to translate Chinese text into English and then computes the similarity of the given texts indirectly, on the English translations. The method avoids some shortcomings of Chinese word segmentation, exploits the natural word segmentation of English, and, through machine translation, indirectly takes the semantics of some words into account. Experiments compare it with computing similarity directly on the Chinese text, and a detailed analysis is performed. The results show that the method improves similarity computation for most social texts and increases the overall accuracy of the computation.
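The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or translation service the authors used, so the following are assumptions made only for illustration: cosine similarity over raw word-frequency vectors, a simple regular-expression tokenizer for English, and a caller-supplied `translate` function (hypothetical; any Chinese-to-English translator could be plugged in). The names `tokenize`, `cosine_similarity`, and `translated_similarity` are likewise illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase English text and split it into word tokens.

    English words are already delimited by spaces and punctuation,
    so no dedicated word-segmentation step is needed.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-frequency vectors of two texts.

    Assumption: cosine over raw term counts; the paper may use a
    different weighting or measure.
    """
    freq_a = Counter(tokenize(text_a))
    freq_b = Counter(tokenize(text_b))
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text has no tokens
    return dot / (norm_a * norm_b)


def translated_similarity(chinese_a, chinese_b, translate):
    """Translate both Chinese texts to English, then compare them.

    `translate` is a hypothetical callable (str -> str) supplied by
    the caller; this sketch does not call any real translation API.
    """
    return cosine_similarity(translate(chinese_a), translate(chinese_b))
```

For a quick check, `translate` can be a lookup table of pre-translated sentences, for example `{"我喜欢猫": "I like cats", "我喜欢狗": "I like dogs"}.get`. Passing the translator in as a parameter keeps the similarity step independent of any particular translation service.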
Keywords :
language translation; natural language processing; statistical analysis; text analysis; word processing; Chinese text similarity computation; Chinese text translation; Chinese word segmentation; English text; machine translation; natural word segmentation; word frequency statistics; Computational modeling; Computers; Google; Information processing; Information retrieval; Semantics; Vectors; Chinese Word Segmentation; Machine Translation; Text Similarity; Word frequency statistics;
Conference_Title :
Semantics Knowledge and Grid (SKG), 2011 Seventh International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1323-1
DOI :
10.1109/SKG.2011.19