• DocumentCode
    793295
  • Title

    Efficient signature file methods for text retrieval

  • Author

    Lee, Dik Lun ; Kim, Young Man ; Patel, Gaurav

  • Author_Institution
    Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
  • Volume
    7
  • Issue
    3
  • fYear
    1995
  • fDate
    6/1/1995
  • Firstpage
    423
  • Lastpage
    435
  • Abstract
    Signature files have been studied extensively as an access method for textual databases. Many approaches have been proposed for searching signature files efficiently. However, different methods make different assumptions and use different performance measures, making it difficult to compare their performance. In this paper, we study three basic methods proposed in the literature, namely, the indexed descriptor file, the two-level superimposed coding scheme, and the partitioned signature file approach. The contribution of this paper is twofold. First, we present a uniform analytical performance model so that the methods can be compared fairly and consistently. The analysis shows that the two-level superimposed coding scheme, if stored in a transposed file, has the best performance. Second, we extend the two-level superimposed coding method into a multilevel superimposed coding method. We obtain the optimal number of levels for the multilevel method and show that, for databases of reasonable size, the optimal value is much larger than 2, the value assumed in the two-level method. The accuracy of the analytical formula is demonstrated by simulation.
  • Keywords
    information retrieval; access method; indexed descriptor file; partitioned signature file approach; performance measures; signature file methods; simulation; text retrieval; textual databases; two-level superimposed coding scheme; Analytical models; Chemicals; Cities and towns; Computer Society; DNA; Hardware; Multimedia databases; Performance analysis; Performance evaluation; Search methods
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/69.390248
  • Filename
    390248