June 2, 2025
Karlsruhe Institute of Technology (KIT)
Europe/Berlin timezone

Tutorial: Accelerating massive data processing in Python: Scaling, Accelerating and Profiling - a Heat and perun tutorial

Many data processing workflows in science and technology build on Python libraries like NumPy, SciPy, and scikit-learn that are easy to learn and easy to use. These libraries rest on highly optimized computational backends and thus achieve quite competitive performance, at least as long as GPU acceleration is left out of the picture and the memory of a single workstation or cluster node suffices for all required tasks.

However, with steadily growing data sets, being limited to the RAM of a single machine can become a severe obstacle. At the same time, the step from a workstation to a (GPU) cluster can be challenging for domain experts without prior HPC experience.

Our Python library Heat ("Helmholtz Analytics Toolkit") targets exactly this group of users, and this tutorial gives a brief hands-on introduction to it. Heat builds on PyTorch and mpi4py and simplifies porting NumPy/SciPy-based code to GPUs (CUDA, ROCm), up to multi-GPU, multi-node clusters. On the surface, Heat implements a NumPy-like API, is largely interoperable with the Python array ecosystem, and can be employed seamlessly as a backend to accelerate existing single-CPU pipelines, as well as to develop new HPC applications from scratch. Under the hood, Heat distributes memory-intensive operations and algorithms via MPI communication and thus avoids some of the overhead often introduced by task-parallelism-based libraries for scaling NumPy/SciPy/scikit-learn applications.
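
To give a flavor of the API ahead of the tutorial, here is a minimal sketch of distributed array computing with Heat (the split value and device selection shown are illustrative choices, not requirements; launch with, e.g., mpirun -n 4 python script.py):

    import heat as ht

    # Create an array distributed across all MPI processes along axis 0.
    x = ht.arange(1_000_000, split=0, dtype=ht.float32)

    # NumPy-style operations work transparently on the distributed array;
    # any required communication happens via MPI under the hood.
    y = ht.sin(x) ** 2 + ht.cos(x) ** 2
    print(y.mean())  # ~1.0, reduced across all processes

    # The same code can target GPUs by selecting a device, e.g.:
    # x = ht.arange(1_000_000, split=0, device="gpu")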

Additionally, we will demonstrate the usage of perun, a powerful profiling and benchmarking tool designed specifically for Python users working on HPC systems. Perun is particularly valuable for analyzing the power consumption of distributed applications, helping users optimize their code for both performance and energy efficiency.
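
As a small taste, the following sketch marks a function for monitoring with perun (the monitor decorator and the "perun monitor" command are part of perun's public interface; the workload itself is a placeholder):

    from perun import monitor
    import numpy as np

    @monitor()
    def training_step():
        # Placeholder workload; perun records runtime and, where hardware
        # sensors are available, the power/energy drawn while it runs.
        a = np.random.rand(2000, 2000)
        return a @ a

    if __name__ == "__main__":
        training_step()

Alternatively, an unmodified script can be profiled from the command line with "perun monitor script.py".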

In this tutorial, you will get an overview of:

  • Heat's basics: getting started with distributed I/O, the data decomposition scheme, and array operations
  • Existing functionality: multi-node linear algebra, statistics, signal processing, machine learning, … (see the sketch after this list)
  • DIY how-to: using the existing Heat infrastructure to build your own multi-node, multi-GPU research software
  • Application profiling: using perun to analyze performance and power consumption of distributed applications
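
As an example of the existing machine-learning functionality, here is a hedged sketch of multi-node k-means clustering with Heat (ht.cluster.KMeans is part of Heat's public API; the data and parameters here are illustrative):

    import heat as ht

    # Generate sample data distributed along the sample axis (split=0);
    # in practice this could come from Heat's distributed I/O instead.
    data = ht.random.randn(100_000, 8, split=0)

    # Fit k-means across all processes in the MPI communicator.
    kmeans = ht.cluster.KMeans(n_clusters=4, init="kmeans++")
    kmeans.fit(data)
    labels = kmeans.predict(data)

    # Each process holds the labels for its local chunk of the data.
    print(labels.shape)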

We will also touch upon Heat's and perun's implementation roadmaps, and possible paths to collaboration.

A detailed description is available on GitHub.