DocumentCode
3290649
Title
Research on Hybrid Index for Chinese IR
Author
Chen, Chen ; Li, Sheng ; Qi, Haoliang ; Yang, Muyun ; Zhao, Tiejun
Author_Institution
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin
Volume
4
fYear
2008
fDate
18-20 Oct. 2008
Firstpage
606
Lastpage
610
Abstract
Identifying the terms used as index units is essential when processing Chinese documents and queries in information retrieval (IR). This paper proposes new kinds of hybrid index that combine words and bigrams. Such an index reduces the impact of out-of-vocabulary words and segmentation ambiguity in Chinese IR, because the dictionary is applied flexibly to detect segmentation ambiguities rather than relying rigidly on an ambiguity table. Experiments show that the new hybrid index is not only comparable with bigram indexing but also improves retrieval efficiency.
Keywords
dictionaries; document handling; indexing; information retrieval; natural language processing; vocabulary; Chinese IR; Chinese documents; bigrams indexing; dictionary; hybrid index; out-of-vocabulary; retrieval efficiency; segmentation ambiguity; Computer science; Dictionaries; Fuzzy systems; Indexing; Information processing; Infrared detectors; Merging; Natural languages; Chinese information retrieval; Hybrid Index
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location
Jinan Shandong
Print_ISBN
978-0-7695-3305-6
Type
conf
DOI
10.1109/FSKD.2008.146
Filename
4666456