Abstract:
To sustain emerging data-intensive scientific applications, High Performance Computing (HPC) centers invest a notable fraction of their operating budget in a specialized fast storage system, the scratch space, which is designed to store the data of currently running and soon-to-run HPC jobs. In practice, however, it is often used as a standard file system, where users store data arbitrarily, without regard for the center's overall performance. To remedy this, centers periodically scan the scratch space in an attempt to purge transient and stale data. This practice of supporting a cache workload with a general-purpose file system and disjoint staging and purging tools results in suboptimal use of the scratch space. This work addresses these issues by proposing a new perspective in which the HPC scratch space is treated as a cache, and tools for data population, retention, and eviction are integrated with scratch management. Under this approach, data is moved to the scratch space only when it is needed, and unneeded data is removed as soon as possible.
Keywords:
cache storage; data handling; scientific information systems; HPC center; HPC job; cache workload; data population; data retention; data-intensive scientific applications; file system; high performance computing; integrated scratch management; scratch space; storage system