Description
Large-scale infrastructures are increasingly required to store and retrieve massive amounts of data in order to execute scientific applications at scale. The pressing need for I/O performance is now often addressed by new intermediate tiers of storage resources, deployed throughout HPC systems (node-local storage, burst buffers, …) and backed by increasingly specialized hardware (NVRAM, NVMe, …). Unfortunately, these costly resources are highly heterogeneous and require advanced techniques to be correctly allocated and sized; otherwise, they risk being underutilized. In an effort to help mitigate such issues, we recently presented StorAlloc, a simulator used as a testbed for assessing storage-aware job scheduling algorithms and evaluating various storage infrastructures. Achieving the main goal behind StorAlloc, namely allocating HPC storage in a similar way to compute resources, now requires extending this initial work. To do so, we turn to state-of-the-art simulation frameworks such as WRENCH and SimGrid to further develop the ideas presented in StorAlloc.