Random access from compressed datasets with perfect value hashing

Author

Miller, John M.

Author_Institution

Microsoft Corp., Redmond, WA, USA

fYear

1995

fDate

17-22 Sep 1995

Firstpage

454

Abstract

A representation technique is presented that allows quick access to individual records in a static compressed dataset. Given a collection of key-record pairs, the representation returns the appropriate short record for any given key. The approach is a generalization of perfect address hashing. The new approach, called perfect value hashing, uses a carefully chosen pseudo-random number generator to directly produce the correct record for any key in the dataset. This contrasts with address hashing, where the random number provides an address that is then used to recover the record from a separate table. Value hashing does not have the theoretical limitations of address hashing, and in practice is more space efficient for records smaller than 36 bits. Value hashing has the added benefit (important when the records are encoded for compression) that variable-length records can be represented without an increase in the size of the encoded records. This new technique was used to provide random access to a highly compressed spelling dictionary.
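
The abstract does not give the construction itself, so the following is only a minimal sketch of the general idea, not the author's method. It assumes fixed-width records of `bits` bits, uses SHA-256 as a stand-in pseudo-random number generator, splits the keys into small groups, and brute-forces one seed per group so that the generator outputs the correct record for every key in that group. All names (`prn`, `group_of`, `build`, `lookup`) are hypothetical.

```python
import hashlib
from itertools import count


def prn(seed, key, bits):
    """Pseudo-random `bits`-bit value derived from (seed, key).

    SHA-256 is only a stand-in generator chosen for this sketch.
    """
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)


def group_of(key, n_groups):
    """Assign a key to a small group so that each seed search stays cheap."""
    digest = hashlib.sha256(f"group:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_groups


def build(records, bits, n_groups):
    """Return one seed per group; the seeds are the only stored data."""
    groups = [[] for _ in range(n_groups)]
    for key, value in records.items():
        if not 0 <= value < (1 << bits):
            raise ValueError(f"record for {key!r} does not fit in {bits} bits")
        groups[group_of(key, n_groups)].append((key, value))

    seeds = []
    for members in groups:
        # Brute-force search (an assumption of this sketch): try seeds until
        # the generator reproduces every record in the group. The expected
        # number of tries is about 2 ** (bits * len(members)).
        seed = next(
            s for s in count()
            if all(prn(s, k, bits) == v for k, v in members)
        )
        seeds.append(seed)
    return seeds


def lookup(seeds, key, bits):
    """Regenerate the record directly from the key; no record table is read."""
    return prn(seeds[group_of(key, len(seeds))], key, bits)


# Usage: eight keys with 2-bit records, split across four groups.
records = {
    "apple": 1, "banana": 3, "cherry": 0, "damson": 2,
    "elder": 1, "fig": 0, "grape": 3, "hazel": 2,
}
seeds = build(records, bits=2, n_groups=4)
recovered = {key: lookup(seeds, key, bits=2) for key in records}
```

In this sketch a lookup for a key that is not in the dataset still returns some value, which matches the abstract's wording that the correct record is produced "for any key in the dataset". The sketch makes no attempt to bound or compress the seeds, so it says nothing about the space figures quoted in the abstract.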

Keywords

data compression; data structures; encoding; random number generation; random processes; compressed datasets; compressed spelling dictionary; encoded records; individual records access; key-record pairs; perfect address hashing; perfect value hashing; pseudorandom number generator; random access; representation technique; variable length records; Books; Broadcasting; Cost function; Databases; Dictionaries

fLanguage

English

Publisher

IEEE

Conference_Titel

Information Theory, 1995. Proceedings., 1995 IEEE International Symposium on

Conference_Location

Whistler, BC

Print_ISBN

0-7803-2453-6

Type

conf

DOI

10.1109/ISIT.1995.550441

Filename

550441