• DocumentCode
    2941322
  • Title

    Random access from compressed datasets with perfect value hashing

  • Author

    Miller, John M.

  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • fYear
    1995
  • fDate
    17-22 Sep 1995
  • Firstpage
    454
  • Abstract
    A representation technique is presented allowing for quick access of individual records from a static compressed dataset. Given a collection of key-record pairs, the representation allows the appropriate short record to be returned for any given key. The approach is a generalization of perfect address hashing. The new approach, called perfect value hashing, uses a carefully chosen pseudo-random number generator to directly produce the correct record for any key in the dataset. This contrasts with address hashing where the random number provides an address which is then used to recover the record from a separate table. Value hashing doesn´t have the theoretical limitations of address hashing, and in practice is more space efficient for records of size less than 36 bits. Value hashing has the added benefit (important when the records are encoded for compression) that variable length records can be represented without an increase in the size of the encoded records. This new technique was used to provide random access from a highly compressed spelling dictionary
  • Keywords
    data compression; data structures; encoding; random number generation; random processes; compressed datasets; compressed spelling dictionary; encoded records; individual records access; key-record pairs; perfect address hashing; perfect value hashing; pseudorandom number generator; random access; representation technique; variable length records; Books; Broadcasting; Cost function; Databases; Dictionaries;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Information Theory, 1995. Proceedings., 1995 IEEE International Symposium on
  • Conference_Location
    Whistler, BC
  • Print_ISBN
    0-7803-2453-6
  • Type

    conf

  • DOI
    10.1109/ISIT.1995.550441
  • Filename
    550441