DocumentCode :
2963764
Title :
Determining the optimal file size on tertiary storage systems based on the distribution of query sizes
Author :
Bernardo, Luis M. ; Nordberg, Henrik ; Rotem, Doron ; Shoshani, Arie
Author_Institution :
NERSC Div., Lawrence Berkeley Lab., CA, USA
fYear :
1998
fDate :
1-3 Jul 1998
Firstpage :
22
Lastpage :
31
Abstract :
In tertiary storage systems, data is stored on multiple tape volumes, and each tape is further divided into files. Because the minimum unit of data transfer in many such systems is a file, matching file sizes to the data access patterns is an important problem. In general, a file size that is large relative to the query size leads to the transfer of large amounts of irrelevant data, whereas small file sizes incur an overhead penalty associated with reading each new file. In this work, we analyze the relationship between file sizes and query response times and provide a methodology to compute the optimal file size given information about the distribution of query sizes. Exact closed-form solutions for the cost function are given for two common distributions.
Keywords :
magnetic tape storage; physics computing; query processing; scientific information systems; software performance evaluation; very large databases; cost function; data access patterns; data transfer; multiple tape volumes; optimal file size; physics data; query response time; query size distribution; scientific database; tertiary storage systems; Atmospheric modeling; Contracts; Costs; Delay; Hip; Information retrieval; Laboratories; Robots; Satellites; Software systems
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Scientific and Statistical Database Management, 1998. Proceedings. Tenth International Conference on
Conference_Location :
Capri
ISSN :
1099-3371
Print_ISBN :
0-8186-8575-1
Type :
conf
DOI :
10.1109/SSDM.1998.688108
Filename :
688108