1st MALA hackathon - Improving MALA production performance

Name: 1st MALA hackathon - Improving MALA production performance
Start: 2023-01-09T09:00:00+01:00
End: 2023-01-10T17:30:00+01:00
Location: CASUS, Untermarkt 19

9 Jan 2023, 09:00 → 10 Jan 2023, 17:30 Europe/Berlin

Attila Cangi (Center for Advanced Systems Understanding, HZDR), Lenz Fiedler (HZDR), Mani Lokamani (HZDR)

Description

The main goal of this hackathon is to improve the performance of MALA, especially for inference, but also for model training. Other topics are, of course, welcome as well.

Possible projects

1. Total energy module GPU version

- GOAL: Enable GPU-based calculations using the total energy module (based on Quantum ESPRESSO).
- LINKED ISSUE: https://github.com/mala-project/mala/issues/362
- STEPS:
1. Build a GPU version of QE.
2. Port external_modules/total_energy_module/total_energy.f90 to a GPU-ready version of QE.
3. Test that this works as expected (e.g., with examples/ex03_postprocess_data.py).
4. Check whether the functions in which we lose time are already GPU-ready; they may not be, since we run at different scales than QE usually does.

2. Descriptor calculation GPU version

- GOAL: Enable GPU-based descriptor calculations using the LAMMPS code.
- LINKED ISSUE: https://github.com/mala-project/mala/issues/399
- STEPS:
1. Build a GPU version of LAMMPS.
2. If necessary, port our version of LAMMPS to this one.
3. Benchmark the calculations and check whether our code already makes good use of the GPU.

3. Full MALA benchmark

- GOAL: Find out where MALA loses time during training and inference.
- LINKED ISSUE: None.
- STEPS:
1. Build a standard MALA version on Hemera, if none is available.
2. Run a simple training (e.g., Al256 at room temperature), followed by inference, first on GPU, then on CPU.
3. Time all important steps.
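A minimal sketch of how step 3 could be approached with the Python standard library alone: a context manager accumulates wall-clock time per named stage, and a report lists the stages by cost. The stage names and the placeholder workloads are hypothetical stand-ins; the actual MALA training and inference calls would go inside the `with` blocks.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock time per named stage, in seconds.
timings = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the block to timings[stage],
    even if the block raises an exception."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start


def report():
    """Print all stages sorted by total time, largest first."""
    total = sum(timings.values())
    for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * seconds / total if total > 0 else 0.0
        print(f"{stage:<20s} {seconds:10.3f} s  {share:5.1f} %")


if __name__ == "__main__":
    # Hypothetical placeholder workloads; replace them with the real data
    # loading, training, and inference calls of the benchmark run.
    with timed("data_loading"):
        sum(i * i for i in range(200_000))
    with timed("training"):
        sum(i * i for i in range(1_000_000))
    report()
```

For the GPU runs, CUDA kernels launch asynchronously, so with PyTorch one would call `torch.cuda.synchronize()` before leaving each timed block; otherwise GPU work may be attributed to the wrong stage. Once the expensive stages are known, Python's `cProfile` can give a per-function breakdown.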

4. Improve lazy loading time

- GOAL: This topic is directed towards Jon/Sandia, since I know they have been working on this in cooperation with NVIDIA. The idea is to reduce data loading times when lazy loading is used.
- LINKED ISSUE: None.
- STEPS:
1. Get in touch with Jon (or be Jon).
2. Merge the required changes into the main code.
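One generic way to reduce the time spent waiting on lazily loaded data is to overlap I/O with computation by prefetching the next snapshot in a background thread. The sketch below illustrates that pattern only; it is not necessarily what Jon/Sandia and NVIDIA have implemented, and `load` is a hypothetical placeholder for whatever function reads one snapshot.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetching(load, keys):
    """Yield load(key) for each key in order, while the next key is already
    being loaded in a background thread. `load` is a hypothetical I/O-bound
    function, e.g. one that reads a single snapshot from disk."""
    keys = list(keys)
    if not keys:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load, keys[0])
        for next_key in keys[1:]:
            data = future.result()                 # wait for the current snapshot
            future = pool.submit(load, next_key)   # start loading the next one
            yield data                             # caller works while next loads
        yield future.result()
```

Threads are sufficient here because blocking file reads release Python's global interpreter lock; if loading were dominated by pure-Python computation, a process pool would be the better choice.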

5. Allow for different grid sizes in snapshots

- GOAL: MALA currently requires all snapshots to have the same grid size. This restriction is unnecessary.
- LINKED ISSUE: None, but there is an existing branch: https://github.com/RandomDefaultUser/mala/commits/flexible_datahandling. However, I would NOT go beyond this commit: https://github.com/RandomDefaultUser/mala/commit/60e14a82bb6601b7ac2078f2886b439abaf01d95; everything after it is experimental.
- STEPS:
1. Rebase/update the branch, starting from the last reliable commit.
2. Test that everything works.
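With snapshots of different grid sizes, a flat sample index can no longer be split into (snapshot, grid point) by dividing by one common grid size. A minimal sketch of one common approach, cumulative offsets plus binary search, is shown below; the class name `SnapshotIndex` and the example grid sizes are hypothetical and not taken from MALA or the linked branch.

```python
import bisect
from itertools import accumulate


class SnapshotIndex:
    """Map a global sample index to (snapshot, local grid point) when every
    snapshot may have a different number of grid points."""

    def __init__(self, grid_sizes):
        # grid_sizes[i] = number of grid points in snapshot i, e.g. nx*ny*nz.
        if any(n <= 0 for n in grid_sizes):
            raise ValueError("grid sizes must be positive")
        # offsets[i] = global index of the first grid point of snapshot i;
        # offsets[-1] = total number of grid points over all snapshots.
        self.offsets = [0] + list(accumulate(grid_sizes))

    def __len__(self):
        return self.offsets[-1]

    def locate(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # bisect_right returns the position of the first offset strictly
        # greater than index; the owning snapshot is the one just before it.
        snapshot = bisect.bisect_right(self.offsets, index) - 1
        return snapshot, index - self.offsets[snapshot]


if __name__ == "__main__":
    # Three hypothetical snapshots on 2x2x2, 3x3x3 and 4x4x4 grids.
    index = SnapshotIndex([8, 27, 64])
    print(len(index))         # 99
    print(index.locate(8))    # (1, 0)
    print(index.locate(98))   # (2, 63)
```

Each lookup costs O(log n) in the number of snapshots and only one offset per snapshot needs to be stored, so the per-sample overhead stays small even for many snapshots.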

6. OpenPMD parallelization

- GOAL: Make use of the parallelization capabilities of OpenPMD.
- LINKED ISSUE: None.
- STEPS:
1. Investigate OpenPMD parallelization for data writing.
2. Investigate OpenPMD parallelization for data reading.

7. Convenience functions

- GOAL: MALA has been in use for a while now, and several use cases that would benefit from convenience functions have emerged.
- LINKED ISSUE: https://github.com/mala-project/mala/issues/378, https://github.com/mala-project/mala/issues/365, https://github.com/mala-project/mala/issues/295
- STEPS:
1. Look into the issues.

8. Scheduling

- GOAL: MALA currently uses x GPUs for training and y CPUs for preprocessing, where x = y. Investigate whether using y > x (more CPUs than GPUs) is beneficial.
- LINKED ISSUE: None.
- STEPS:
1. Implement an MPI-based interface to exchange data between ranks.
2. Benchmark.
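With y > x, each training rank would receive data from several preprocessing ranks. A minimal sketch of one possible assignment, a balanced block partition, follows; it only decides which preprocessing rank feeds which training rank and leaves the data exchange itself to MPI (for instance point-to-point `send`/`recv` with mpi4py). The function names and the separate numbering of preprocessing and training ranks are hypothetical.

```python
def assign_preprocessing_ranks(num_cpu_ranks, num_gpu_ranks):
    """Split preprocessing ranks 0..y-1 into x contiguous groups whose sizes
    differ by at most one; group g feeds training (GPU) rank g."""
    if num_gpu_ranks < 1 or num_cpu_ranks < num_gpu_ranks:
        raise ValueError("expected y >= x >= 1")
    base, extra = divmod(num_cpu_ranks, num_gpu_ranks)
    groups, start = [], 0
    for gpu_rank in range(num_gpu_ranks):
        # The first `extra` groups take one additional preprocessing rank.
        size = base + (1 if gpu_rank < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def target_gpu_rank(cpu_rank, num_cpu_ranks, num_gpu_ranks):
    """Training rank that a given preprocessing rank sends its data to."""
    groups = assign_preprocessing_ranks(num_cpu_ranks, num_gpu_ranks)
    for gpu_rank, group in enumerate(groups):
        if cpu_rank in group:
            return gpu_rank
    raise ValueError(f"cpu_rank {cpu_rank} out of range")


if __name__ == "__main__":
    print(assign_preprocessing_ranks(4, 4))   # current case x = y: [[0], [1], [2], [3]]
    print(assign_preprocessing_ranks(10, 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
    print(target_gpu_rank(6, 10, 4))          # 2
```

Contiguous blocks keep each training rank's sources easy to compute; a round-robin assignment (`cpu_rank % x`) would work equally well and may interleave I/O differently. Which layout performs better is what the benchmark step would have to show.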

9. ML topics

- While the focus of the sessions should be on performance, ML topics such as Gaussian processes, equivariant neural networks, etc. can of course also be investigated.

Contact

m.lokamani@hzdr.de

a.cangi@hzdr.de

l.fiedler@hzdr.de

+49 351 260 3661

+49 3581 37523 55

Monday 9 January
- 09:00 → 10:00
  
  Welcome address
  
  Get-in, introduction to projects, assignment of projects
  
  Conveners: Dr Attila Cangi (Center for Advanced Systems Understanding, HZDR), Lenz Fiedler (HZDR), Mani Lokamani (HZDR)
- 10:00 → 12:00
  
  Hacking Session: #1
  
  Convener: Lenz Fiedler (HZDR)
- 12:00 → 13:00
  
  Lunch 1h
- 13:00 → 15:00
  
  Hacking Session: #2
  
  Convener: Lenz Fiedler (HZDR)
- 15:00 → 15:30
  
  Coffee break 30m
- 15:30 → 17:00
  
  Hacking Session: #3
  
  Convener: Lenz Fiedler (HZDR)
- 17:00 → 17:30
  
  Q/A: Discussion of progress
  
  Convener: Lenz Fiedler (HZDR)

Tuesday 10 January
- 09:00 → 12:00
  
  Hacking Session: #4
  
  Convener: Lenz Fiedler (HZDR)
- 12:00 → 13:00
  
  Lunch 1h
- 13:00 → 16:00
  
  Hacking Session: #5
  
  Convener: Lenz Fiedler (HZDR)
- 16:00 → 16:30
  
  Coffee break 30m
- 16:30 → 17:30
  
  Q/A: Wrap-up, assignment of PRs, future projects, feedback
  
  Convener: Lenz Fiedler (HZDR)
