DocumentCode
2414733
Title
Distributed data access in the Sequential Access Model at the D0 experiment at Fermilab
Author
Terekhov, Igor ; White, Victoria
Author_Institution
Fermi Nat. Accel. Lab., Batavia, IL, USA
fYear
2000
fDate
2000
Firstpage
310
Lastpage
311
Abstract
Presents the Sequential Access Model (SAM), which is the data-handling system for D0, one of two primary high-energy physics experiments at Fermilab. During the next several years, the D0 experiment will store a total of about 1 PByte of data, including raw detector data and data processed at various levels. The design of SAM is not specific to the D0 experiment and carries few assumptions about the underlying mass storage level; its ideas are applicable to any sequential data access. By definition, in the sequential access mode, a user application needs to process a stream of data by accessing each data unit exactly once, the order of the data units in the stream being irrelevant. The units of data are laid out sequentially in files. The adopted model allows for a significant optimization of system performance, a reduction in user file latency and an increase in the overall throughput. In particular, caching is done with the knowledge of all the files that are needed “in the near future”, which is defined as all the files being used by already-running or submitted jobs. The bulk of the data is stored in files on tape in the mass storage system Enstore. All of the data managed by SAM is cataloged in great detail in a relational database (Oracle).
Keywords
cache storage; data acquisition; data handling; distributed databases; high energy physics instrumentation computing; magnetic tape storage; relational databases; 1 PByte; Enstore mass storage system; Fermi National Accelerator Laboratory; Fermilab D0 experiment; Oracle relational database; Sequential Access Model; caching; data cataloguing; data files; data handling system; data stream; data units; distributed data access; high-energy physics experiment; magnetic tape storage; mass storage; processed data; raw detector data; running jobs; sequential data access; submitted jobs; system performance optimization; throughput; user file latency; Data handling; Delay; Information retrieval; Laboratories; Libraries; Relational databases; Samarium; Storage automation; System performance; Throughput;
fLanguage
English
Publisher
IEEE
Conference_Titel
High-Performance Distributed Computing, 2000. Proceedings. The Ninth International Symposium on
Conference_Location
Pittsburgh, PA
ISSN
1082-8907
Print_ISBN
0-7695-0783-2
Type
conf
DOI
10.1109/HPDC.2000.868672
Filename
868672