15th JLESC Workshop

Name: 15th JLESC Workshop
Start: 2023-03-21T12:00:00+01:00
End: 2023-03-23T22:00:00+01:00
Location: LaBRI

Europe/Paris timezone

Running Native HPC Applications on the Cloud

22 Mar 2023, 11:20

10m

LaBRI Amphi (LaBRI)

Short talk | Programming languages and runtimes | Short Talks on Applications

Speaker: Dr Volodymyr Kindratenko (University of Illinois at Urbana-Champaign)

In this project, we aim to enable Charm++-based HPC applications to run natively on a Kubernetes cloud platform. The Charm++ programming model provides a shrink/expand capability that matches the elastic cloud philosophy well, so we investigate how to run Charm++ applications with dynamic scaling of resources on Kubernetes.

To run Charm++ applications on Kubernetes, we have implemented a Charm operator, very similar to Kubeflow’s mpi-operator. Unlike the mpi-operator, the Charm operator can scale the number of pods in a job; the mpi-operator does not support this because MPI applications typically do not support rescaling of resources at runtime. The Charm operator also generates the nodelist in the format that Charm++ programs require for rescaling. The Charm++ application is launched in server mode, which allows messages to be injected into the scheduler externally; this mechanism is used to signal rescaling.

The Charm operator handles allocation of resources and cleanup for all Charm jobs on the Kubernetes cluster. At startup, it creates the launcher and worker pods for all jobs and monitors for any change to a deployment configuration. We are implementing changes in the controller code that allow scaling of pods, i.e., shrinking or expanding the number of pods allocated to a Charm++ job. Currently, we have added support for making shrink/expand updates through the YAML file for the deployment, and we use these YAML updates to test our implementation.

We are working on two modes for scaling. In the first mode, pods are deleted on shrink and new pods are created on expand. In the second mode, we maintain a pool of worker pods: shrink releases worker pods to the pool, and these can be reused for an expand request by another job managed by the Charm operator.
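The second scaling mode described above, a shared pool of worker pods, can be illustrated with a small bookkeeping model. This is only a sketch under stated assumptions: it does not call Kubernetes or Charm++ at all, and the `WorkerPool` class, its methods, and the pod and job names are hypothetical, not taken from the Charm operator's code.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerPool:
    """Toy model of a shared pool of idle worker pods (all names hypothetical)."""
    idle: list = field(default_factory=list)   # pods released by shrink
    jobs: dict = field(default_factory=dict)   # job name -> list of pod names
    created: int = 0                           # counter used to name new pods

    def _new_pod(self):
        # Stand-in for creating a real pod; here we only mint a name.
        self.created += 1
        return f"worker-{self.created}"

    def expand(self, job, count):
        """Give `job` `count` more pods, reusing idle pods before creating new ones."""
        pods = self.jobs.setdefault(job, [])
        for _ in range(count):
            pods.append(self.idle.pop() if self.idle else self._new_pod())
        return list(pods)

    def shrink(self, job, count):
        """Release up to `count` pods from `job` back to the pool instead of deleting them."""
        pods = self.jobs.get(job, [])
        for _ in range(min(count, len(pods))):
            self.idle.append(pods.pop())
        return list(pods)


pool = WorkerPool()
pool.expand("job-a", 4)            # no idle pods yet, so four pods are created
pool.shrink("job-a", 2)            # two pods go back to the pool
b_pods = pool.expand("job-b", 3)   # reuses the two idle pods, creates one new pod
print(b_pods, pool.created)
```

In this model, the expand request from `job-b` is served first from the pods that `job-a` released, and a new pod is created only once the pool is empty. The first mode (delete on shrink, create on expand) would correspond to dropping the `idle` list and always calling `_new_pod`.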

JLESC topic: HPC+Cloud

Primary authors:
Mr Aditya Bhosale (University of Illinois at Urbana-Champaign)
Ms Kavitha Chandrasekar (University of Illinois at Urbana-Champaign)
Dr Pedro Bello-Maldonado (IBM)
Dr Carlos Costa (IBM)
Dr Claudia Misale (IBM)
Sara Kokkila-Schumacher (IBM)
Prof. Laxmikant Kale (University of Illinois at Urbana-Champaign)
Dr Volodymyr Kindratenko (University of Illinois at Urbana-Champaign)
