Title :
Challenges in Chinese Text Similarity Research
Author :
Wang, Xiuhong ; Ju, Shiguang ; Wu, Shengli
Author_Institution :
Inst. of Sci. & Technol. Inf., Jiangsu Univ., Zhenjiang
Abstract :
Chinese text similarity research, one of the most important issues in information retrieval, presents many opportunities and challenges, and quite a few models and approaches have been investigated for it. Chinese is one of the most complicated languages in terms of morphology, syntax, semantics and pragmatics. Unlike English, Chinese has no explicit delimiter between words. Difficulties in Chinese natural language processing, such as word segmentation, reduce both the effectiveness and the efficiency of text similarity computation. This paper addresses some of the challenges in Chinese text similarity computation that arise from Chinese linguistics and from the models and approaches used in information retrieval. We consider Chinese text similarity computation to cover the broad topics of word, sentence and document similarity. Our work provides insights into the difficulties and bottlenecks in this research, including tradeoffs between effectiveness and efficiency. Directions for future work are also discussed.
Keywords :
computational linguistics; information retrieval; natural language processing; text analysis; Chinese linguistics; Chinese text similarity computation; document similarity; sentence similarity; text segmentation; word similarity; Character recognition; Computer science; Data mining; Information processing; Mathematics; Morphology; Natural languages; Telecommunication computing; Chinese text similarity; algorithm; challenges
Conference_Title :
Information Processing (ISIP), 2008 International Symposiums on
Conference_Location :
Moscow
Print_ISBN :
978-0-7695-3151-9
DOI :
10.1109/ISIP.2008.76