Abstract:
The efficient organization of a very large file to facilitate search and retrieval is an important but complex problem. In this paper we consider a large file in which the frequency of use of each component subfile is known. We organize the file so that the average number of entries needed to locate an item by binary search is minimized. The algorithm iteratively partitions the file into "saturated" subfiles; with each successive iteration the average number of entries to locate an item decreases, until no further improvement is possible. We then extend the method to the realistic problem of designing an optimal memory hierarchy to hold the file in a computer system: the sizes of the memory components and the placement of the items of the frequency-dependent file are determined so that the average time to locate an item (over the totality of items) is minimized for a given total cost of the memory system. A number of examples are given to elucidate the methods. Also, the characteristics and results of a Fortran implementation of the algorithms on the CDC 6600 are described.
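The objective stated above, minimizing the frequency-weighted average number of binary-search probes, can be illustrated with a small sketch. This is not the paper's saturated-subfile algorithm: the helper names (`probe_depths`, `avg_probes`, `avg_probes_partitioned`), the simple two-way split of the most frequently used items into a subfile searched first, and the worst-case miss cost charged for an unsuccessful search are all illustrative assumptions.

```python
def probe_depths(n):
    """Comparisons needed to reach each position of a sorted file of n
    items under standard binary search (root of the implicit tree = 1)."""
    depths = [0] * n

    def fill(lo, hi, d):
        if lo > hi:
            return
        mid = (lo + hi) // 2
        depths[mid] = d
        fill(lo, mid - 1, d + 1)
        fill(mid + 1, hi, d + 1)

    fill(0, n - 1, 1)
    return depths


def avg_probes(freqs):
    """Frequency-weighted mean number of comparisons for one binary search
    over the whole file; freqs[i] is the usage frequency of item i."""
    d = probe_depths(len(freqs))
    return sum(f * di for f, di in zip(freqs, d)) / sum(freqs)


def avg_probes_partitioned(freqs, k):
    """Two-subfile model (an assumption, simpler than the paper's scheme):
    the k most-used items form a subfile searched first; a miss there is
    charged the worst-case depth of that search before the remainder is
    searched.  freqs is assumed sorted in decreasing order of use."""
    hot, cold = freqs[:k], freqs[k:]
    miss = max(probe_depths(k))  # cost of an unsuccessful first search
    hot_cost = sum(f * d for f, d in zip(hot, probe_depths(k)))
    cold_cost = sum(f * (miss + d) for f, d in zip(cold, probe_depths(len(cold))))
    return (hot_cost + cold_cost) / sum(freqs)


if __name__ == "__main__":
    # Heavily skewed usage: partitioning off a few hot items lowers the
    # average probe count, which is the effect the paper exploits.
    f = [2 ** (14 - i) for i in range(15)]
    print(avg_probes(f), avg_probes_partitioned(f, 3))
```

With uniform frequencies the split buys nothing, but the more skewed the usage distribution, the more the hot subfile pays for its extra miss cost on cold lookups.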
Keywords:
Access time; binary search; cost of memory type; costs; dictionaries; file; frequency of usage; frequency measurement; frequency-partitioned file; information retrieval; item; iterative algorithms; key; mean frequency; memory hierarchy; natural languages; partitioning algorithms; saturated file; time measurement.