Title :
Determining the optimal file size on tertiary storage systems based on the distribution of query sizes
Author :
Bernardo, Luis M. ; Nordberg, Henrik ; Rotem, Doron ; Shoshani, Arie
Author_Institution :
NERSC Div., Lawrence Berkeley Lab., CA, USA
Abstract :
In tertiary storage systems, data is stored on multiple tape volumes, and each tape is further divided into files. Since the minimum unit of data transfer in many such systems is a file, it is important to match file sizes to the access patterns of the data. In general, a file size that is large relative to the query size leads to the transfer of large amounts of irrelevant data, whereas small file sizes incur an overhead penalty associated with reading each new file. In this work, we analyze the relationship between file sizes and query response times and provide a methodology to compute the optimal file size given information about the distribution of query sizes. Exact closed-form solutions for the cost function are given for two common distributions.
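The trade-off described in the abstract can be illustrated with a toy cost model. This is a minimal sketch and not the paper's cost function: it assumes each query reads one contiguous range aligned to a file boundary, that every file touched is transferred in full, and that each file read pays a fixed overhead. The names `query_cost`, `expected_cost`, `best_file_size`, `per_file_overhead` and `transfer_rate` are hypothetical, and the units (MB, seconds, MB/s) are chosen only for illustration.

```python
import math


def query_cost(query_size, file_size, per_file_overhead, transfer_rate):
    """Seconds to serve one query when whole files must be transferred.

    Assumption (not from the paper): the query is contiguous and aligned,
    so it touches ceil(query_size / file_size) files.
    """
    files_read = math.ceil(query_size / file_size)
    return files_read * (per_file_overhead + file_size / transfer_rate)


def expected_cost(file_size, query_sizes, per_file_overhead, transfer_rate):
    """Average cost over an empirical sample of query sizes."""
    total = sum(
        query_cost(q, file_size, per_file_overhead, transfer_rate)
        for q in query_sizes
    )
    return total / len(query_sizes)


def best_file_size(candidates, query_sizes, per_file_overhead, transfer_rate):
    """Pick the candidate file size with the lowest average cost."""
    return min(
        candidates,
        key=lambda f: expected_cost(f, query_sizes, per_file_overhead, transfer_rate),
    )


if __name__ == "__main__":
    # Hypothetical workload: query sizes in MB, 2 s overhead per file, 10 MB/s.
    sample_queries = [100, 400]
    for f in (1, 50, 100, 400, 1000):
        print(f, expected_cost(f, sample_queries, 2, 10))
    print("best:", best_file_size([1, 50, 100, 400, 1000], sample_queries, 2, 10))
```

In this sketch, very small files are penalized by the per-file overhead and very large files by the transfer of data the query does not need, so the minimum falls at an intermediate size. A grid search over candidate sizes stands in here for the closed-form solutions the abstract mentions.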
Keywords :
magnetic tape storage; physics computing; query processing; scientific information systems; software performance evaluation; very large databases; cost function; data access patterns; data transfer; multiple tape volumes; optimal file size; physics data; query response time; query size distribution; scientific database; tertiary storage systems; Atmospheric modeling; Contracts; Costs; Delay; Hip; Information retrieval; Laboratories; Robots; Satellites; Software systems;
Conference_Title :
Scientific and Statistical Database Management, 1998. Proceedings. Tenth International Conference on
Conference_Location :
Capri
Print_ISBN :
0-8186-8575-1
DOI :
10.1109/SSDM.1998.688108