Description
Several studies have shown that process malleability and dynamic resources increase the productivity of HPC facilities, measured as completed jobs per unit of time. Changing the number of resources assigned to an application during its execution accelerates the processing of the overall workload. Furthermore, users of malleable applications also benefit directly when they run large workloads, since they obtain their results sooner.
Nevertheless, malleable applications remain rare and seldom take part in production workloads. This is mainly due to the difficulty of adding malleability to existing scientific applications, since state-of-the-art solutions expose complex APIs or even require a change of programming model.
In this work, we present the dynamic management of resources library (DMRlib), a malleability solution that offers users a simple MPI-like syntax and provides support for job reconfiguration, data redistribution, process management, execution resumption, and dynamic resources.
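To make the data-redistribution step more concrete, here is a minimal, generic sketch of what any malleable application must compute when it is reconfigured: which slice of a block-distributed array each old process sends to each new process. It does not reflect DMRlib's actual API or internals, which are not described here; the function names `block_range` and `redistribution_plan` are hypothetical, and the sketch assumes a 1-D array of `n` elements split into contiguous blocks whose sizes differ by at most one.

```python
from typing import List, Tuple


def block_range(n: int, p: int, rank: int) -> Tuple[int, int]:
    """Half-open range [start, end) owned by `rank` when n elements are
    block-distributed over p processes (hypothetical helper)."""
    return rank * n // p, (rank + 1) * n // p


def redistribution_plan(n: int, p_old: int, p_new: int) -> List[Tuple[int, int, int, int]]:
    """Transfers (src_rank, dst_rank, start, end) needed to move a
    block-distributed array from p_old to p_new processes.

    Assumption: both layouts use the same block scheme as block_range.
    """
    plan = []
    for src in range(p_old):
        s_lo, s_hi = block_range(n, p_old, src)
        for dst in range(p_new):
            d_lo, d_hi = block_range(n, p_new, dst)
            # Overlap of the old and new ownership intervals.
            lo, hi = max(s_lo, d_lo), min(s_hi, d_hi)
            if lo < hi:  # non-empty: src must send [lo, hi) to dst
                plan.append((src, dst, lo, hi))
    return plan


if __name__ == "__main__":
    # Expand a 10-element array from 2 to 3 processes.
    for src, dst, lo, hi in redistribution_plan(10, 2, 3):
        print(f"rank {src} -> rank {dst}: elements [{lo}, {hi})")
```

Expanding 10 elements from 2 to 3 processes yields four transfers: rank 0 sends [0, 3) to rank 0 and [3, 5) to rank 1, and rank 1 sends [5, 6) to rank 1 and [6, 10) to rank 2. In an MPI program, such a plan could be turned into the per-rank send and receive counts and displacements passed to `MPI_Alltoallv`; a library like DMRlib aims to hide this bookkeeping from the user.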