Description
Chairperson: Martin Schreiber (INRIA)
This short talk introduces ongoing research at UGA/TUM (EuroHPC Time-X) on an application-driven dynamic resource approach for HPC. Time-X targets parallel-in-time (PinT) integration, where dynamic resource strategies have been shown to improve the performance and efficiency of PinT algorithms.
However, current approaches to enable dynamic resources for...
The development and evaluation of grid or cluster middleware, such as batch schedulers, require deploying numerous machines to reach an environment close to the full scale of the production system.
To avoid these huge deployments, one can fold the system onto itself by deploying several "virtual" resources onto one physical resource.
In this study, we investigate the...
Several studies have demonstrated that process malleability and dynamic resources increase the productivity of HPC facilities, measured in completed jobs per unit of time. Changing the number of resources assigned to an application during its execution accelerates global job processing. Furthermore, the users of malleable applications can also benefit from malleability when...
Large-scale infrastructures are increasingly required to store and retrieve massive amounts of data in order to execute scientific applications at scale. The pressing need for I/O performance is now often met by new intermediate tiers of storage resources deployed throughout HPC systems (node-local storage, burst buffers, …) and backed by increasingly specialized hardware (NVRAM, NVMe, …)....
The management and allocation of resources to users in HPC infrastructures often relies on the Resource and Job Management System (RJMS).
One key component for an optimized resource allocation, with respect to some objectives, is the scheduler.
Scheduling theory is interesting as it provides algorithms with performance guarantees.
These guarantees come at the cost of tedious and complex modeling efforts.
The growing...
New memory technologies are emerging to provide larger RAM sizes at reasonable cost and energy consumption. In addition to conventional DRAM, recent memory infrastructures include byte-addressable persistent memory (PMEM), which offers higher capacities than DRAM and better access times than NAND-based technologies such as SSDs.
In such hybrid infrastructures, users have...
We have extended the Ray framework to enable automatic scaling of workloads on high-performance computing (HPC) clusters managed by SLURM© and bursting to a Cloud managed by Kubernetes®. Our implementation allows a single Python-based parallel workload to be run concurrently across an HPC cluster and a Cloud. The Python-level abstraction provided by our solution offers a transparent user...