DocumentCode :
3583400
Title :
Entropy-based indexing term for N-gram text search system
Author :
Yamamoto, Hiroshi ; Ohmi, Seishiro ; Tsuji, Hiroshi
Author_Institution :
Software Div., Hitachi Ltd., Osaka, Japan
Volume :
5
fYear :
2003
Firstpage :
4852
Abstract :
N-gram indexing is a method for full-text search systems in which each index term consists of N consecutive words or characters. Systems for Japanese text use a 2-gram character index as the base in order to limit the size of the index file, and an additional higher-gram index is expected to improve performance. This paper presents an entropy-based method for selecting the additional higher-gram index terms. The basic idea comes from the observation that Katakana words, which often share the same prefix (much like "in-" and "ex-" in English), are well suited to the incremental index.
Keywords :
entropy; full-text databases; indexing; query processing; search engines; 2-gram characters index; Japanese text; Katakana words; N-gram text search system; entropy-based indexing term; full text search system; incremental index; index file; Databases; Degradation; Displays; Entropy; Indexing; Search engines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man and Cybernetics, 2003. IEEE International Conference on
ISSN :
1062-922X
Print_ISBN :
0-7803-7952-7
Type :
conf
DOI :
10.1109/ICSMC.2003.1245751
Filename :
1245751