Description
Session Chairperson: Lea Saßmannshausen, JSC/UzK
The in situ paradigm is a relevant alternative to classical post hoc workflows because it bypasses disk accesses, thus reducing the I/O bottleneck. However, most in situ data analytics tools are built on MPI and are complicated to set up and use, especially when parallelizing irregular algorithms. In a previous work, we provided a tool that couples MPI simulations with in situ...
The use of accelerators such as GPUs has become mainstream to achieve high performance on modern computing systems. GPUs come with their own (limited) memory and are connected to the main memory of the machine through a bus (with limited bandwidth). When a computation is started on a GPU, the corresponding data needs to be transferred to the GPU before the computation starts....
In the process of modularizing FMSolvr, the molecular dynamics simulation library of JSC, we want not only to improve our numerical methods but also to examine communication and whether it can be improved. We conducted a systematic literature review to see what has already been done in this field and picked out two promising communication schemes for further...
Applications traditionally leverage MPI to run efficiently on HPC systems and scale up to thousands of processors. Over the past decade, developers have also been adapting their applications to heterogeneous systems by offloading the most time-consuming computation kernels to the available GPUs. To achieve optimal performance in such applications, developers must use the non-blocking and...
Task-based systems have become popular due to their ability to utilize the computational power of complex heterogeneous systems. A commonly used programming model is the Sequential Task Flow (STF) model, which unfortunately supports only static task graphs. This can result in submission overhead and a task graph that is not well-suited for execution on heterogeneous systems. A common approach...
This talk will focus on the design of device support in the Template Task Graph (TTG). Specifically, TTG employs C++ coroutines to suspend tasks during times of data motion and kernel execution. This design allows TTG to support transparent device memory oversubscription by delegating memory management to the underlying PaRSEC runtime system. TTG will also offer coroutines as a means for describing...
Task-based programming models have proven to be a robust and versatile way to approach the development of applications for distributed environments. The programming model itself feels natural and close to classical algorithms, and the task-based distribution of work can achieve a high degree of performance. All this is achieved with a minimal impact on programmability. However, execution on this...