Title :
Coordinating simultaneous caching of file bundles from tertiary storage
Author :
Shoshani, A. ; Sim, A. ; Bernardo, L.M. ; Nordberg, H.
Author_Institution :
Nat. Energy Res. Sci. Comput. Div., Lawrence Berkeley Lab., CA, USA
Abstract :
In a previous paper, we described a system called STACS (Storage Access Coordination System) for High Energy Physics (HEP) experiments. These experiments generate very large volumes of “event” data at a very high rate. The volumes of data may reach hundreds of terabytes per year, and therefore they are stored on robotic tape systems that are managed by a mass storage system. The data are stored as files on tapes in a predetermined order, usually the order in which they are generated. A major bottleneck is the retrieval of subsets of these large datasets during the analysis phase. STACS is designed to optimize the use of a disk cache, and thus minimize the number of files read from tape. In this paper, we describe an interesting problem of disk staging coordination that goes beyond the one-file-at-a-time requirement. The problem stems from the need to coordinate the simultaneous caching of groups of files that we refer to as “bundles of files”. All files from a bundle need to be in the disk cache at the same time in order for the analysis application to proceed. This is a radically different problem from the case where the analysis applications need only one file at a time. We describe the method of identifying the file bundles, and the scheduling of bundle caching in such a way that files shared between bundles are not removed from the cache unnecessarily. We also describe the methodology and the policies used to determine the order of caching bundles of files, and the order of removing files from the cache when space is needed.
Keywords :
cache storage; nuclear electronics; physics computing; scientific information systems; High Energy Physics experiments; STACS; bundle caching; disk cache; disk staging coordination; file bundles; large datasets; mass storage system; robotic tape systems; simultaneous caching; storage access coordination system; tertiary storage; Data analysis; Design optimization; Information retrieval; Laboratories; Physics computing; Read only memory; Robot kinematics; Scheduling; Sensor systems; Space technology
Conference_Title :
Scientific and Statistical Database Management, 2000. Proceedings. 12th International Conference on
Conference_Location :
Berlin
Print_ISBN :
0-7695-0686-0
DOI :
10.1109/SSDM.2000.869788