This tutorial teaches participants how to identify and optimize the performance-critical sections of their Python code. Participants will learn to recognize bottlenecks and improve execution speed using just-in-time (JIT) compilation with Numba. The tutorial also explores other optimization tools such as Cython and PyPy, highlighting their strengths and appropriate use cases. Participants will gain hands-on experience transforming slow code into efficient implementations with minimal source changes. Beyond CPU optimization, the session introduces writing and launching custom CUDA kernels in Python using Numba’s GPU support. By the end, participants will be equipped to accelerate their code on both CPUs and GPUs.
Read a detailed description here on GitHub.
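To give a flavor of the "minimal changes" idea on the CPU side, here is a small sketch that is not taken from the tutorial materials. It wraps a plain-Python loop with Numba's `njit` decorator and times both versions. The function names (`sum_of_squares_py`, `sum_of_squares_jit`, `time_call`) are hypothetical, and the `ImportError` fallback is an assumption added only so the sketch still runs where Numba is not installed (in that case no speedup is expected).

```python
import time

try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    # Assumption for illustration: without Numba, fall back to a no-op
    # decorator so the script still runs (just without any speedup).
    def njit(func):
        return func


def sum_of_squares_py(n):
    """Plain-Python loop: the kind of hot spot a profiler tends to flag."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


# Same source code, compiled to machine code on first call if Numba is present.
sum_of_squares_jit = njit(sum_of_squares_py)


def time_call(func, n):
    """Return (result, elapsed seconds) for a single call to func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    sum_of_squares_jit(10)  # warm-up call so compilation time is not measured
    res_py, t_py = time_call(sum_of_squares_py, n)
    res_jit, t_jit = time_call(sum_of_squares_jit, n)
    print(f"plain Python: {t_py:.4f} s")
    print(f"njit version: {t_jit:.4f} s")
    print("results match:", res_py == res_jit)
```

The warm-up call matters when measuring: the first invocation of a JIT-compiled function includes compilation time, so timing it directly would understate the benefit.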