June 2, 2025
Karlsruhe Institute of Technology (KIT)
Europe/Berlin timezone
Registrations closed

Tutorial: Optimal Scaling

This tutorial will focus on optimal scaling in machine learning—a principled approach to training models efficiently and effectively by leveraging scaling laws, optimal hyperparameter transfer, and compute-efficient training design. Participants will be guided through both theoretical foundations and hands-on sessions to understand how to scale model training while minimising the required compute.

The theoretical part of the tutorial will give an overview of recent papers that introduce the fundamental methodology and corresponding terminology, such as zero-shot hyperparameter transfer, model parametrisation, critical batch size, and steepest descent under the modular norm.
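To make the idea of zero-shot hyperparameter transfer concrete, here is a minimal sketch (not from the tutorial materials) of the muP-style transfer rule, under which the Adam learning rate for hidden-layer weights scales inversely with model width, so a rate tuned on a small proxy model carries over to a larger one without re-tuning:

```python
# Illustrative sketch (assumed rule, not the tutorial's code): under the
# Maximal Update Parametrisation, the hidden-layer Adam learning rate
# scales as 1/width, enabling zero-shot hyperparameter transfer.

def transfer_lr(base_lr: float, base_width: int, target_width: int) -> float:
    """Rescale a hidden-layer learning rate tuned at base_width to target_width."""
    return base_lr * base_width / target_width

# A rate tuned on a width-256 proxy model, transferred to width 4096:
small_model_lr = 1e-3
large_model_lr = transfer_lr(small_model_lr, base_width=256, target_width=4096)
print(large_model_lr)  # 16x smaller than the tuned rate
```

Other hyperparameters (e.g. initialisation variance and output multipliers) have their own scaling rules in this parametrisation; the learning rate is just the simplest case.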

The hands-on part will walk participants through the typical workflow of scaling up model and dataset size: using a simple transformer model on text data, they will tune hyperparameters at small scale and observe how the model dynamics and optimal hyperparameters evolve as model and data size grow.

Read a detailed description here on GitHub.