• DocumentCode
    3290649
  • Title

    Research on Hybrid Index for Chinese IR

  • Author

    Chen, Chen ; Li, Sheng ; Qi, Haoliang ; Yang, Muyun ; Zhao, Tiejun

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin
  • Volume
    4
  • fYear
    2008
  • fDate
    18-20 Oct. 2008
  • Firstpage
    606
  • Lastpage
    610
  • Abstract
    Identifying the terms used as index units is essential when processing Chinese documents and queries in IR. This paper proposes new kinds of hybrid index that combine words and bigrams. Such a hybrid index can reduce the impact of out-of-vocabulary words and segmentation ambiguity in Chinese IR, because the dictionary is applied flexibly to detect segmentation ambiguities rather than rigidly through an ambiguity table. Experiments show that the new hybrid index is not only comparable with bigram indexing in retrieval performance but also improves retrieval efficiency.
  • Keywords
    dictionaries; document handling; indexing; information retrieval; natural language processing; vocabulary; Chinese IR; Chinese documents; bigrams indexing; dictionary; hybrid index; out-of-vocabulary; retrieval efficiency; segmentation ambiguity; Computer science; Dictionaries; Fuzzy systems; Indexing; Information processing; Infrared detectors; Merging; Natural languages; Chinese information retrieval; Hybrid Index
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
  • Conference_Location
    Jinan, Shandong
  • Print_ISBN
    978-0-7695-3305-6
  • Type

    conf

  • DOI
    10.1109/FSKD.2008.146
  • Filename
    4666456