Title of article :
Chinese document indexing based on a new partitioned signature file: Model and evaluation
Author/Authors :
Wai Lam (author)
Kam-Fai Wong (author)
Chi-Yin Wong (author)
Issue Information :
Monthly, with serial issue number, year 2001
Abstract :
In this article we investigate the use of signature files in Chinese information retrieval systems and propose a new partitioning method for Chinese signature files based on the characteristics of Chinese words. Our partitioning method, called Partitioned Signature File for Chinese (PSFC), offers faster search than the traditional single signature file approach. We devise a general scheme for controlling the trade-off between false drops and storage overhead while maintaining the search space reduction in PSFC. An analytical study is presented to support the claims of our method. We also propose two new hashing methods for Chinese signature files that make the signature file more suitable for dynamic environments while maintaining retrieval performance. Furthermore, we have implemented PSFC and the new hashing methods and evaluated them using a large-scale real-world Chinese document corpus, namely the TREC-5 (Text REtrieval Conference) Chinese collection. The experimental results confirm the features of PSFC and demonstrate its superiority over the traditional single signature file method.
Journal title :
Journal of the American Society for Information Science and Technology