Description
Mode connectivity refers to the phenomenon in which stochastic gradient descent (SGD) solutions in a neural network's parameter space are connected by a path of low loss [1]. This means that every solution (network) along this path exhibits performance and generalization ability similar to the "end" solutions between which the path is constructed. Linear mode connectivity (LMC) is a special case in which the solutions are connected by linear paths of non-increasing loss [2]. LMC has been reported to benefit ensemble methods (in particular in federated learning settings), the robustness of fine-tuned models, distributed optimization, and model pruning. This study focuses on an experimental investigation of the conditions under which LMC can be destroyed. We show that the following factors affect the stability of SGD training dynamics: data shifts, network parameterization, and the training procedure (regularization, training mini-batch size). Finally, we examine the benefits of LMC for ensemble methods by asking whether models sampled from the same basin (low-loss region) make the same mistakes more often than models sampled from different minima of the loss landscape.
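As an illustration of how LMC is typically probed, the sketch below evaluates the loss along the linear path theta(alpha) = (1 - alpha) * theta_A + alpha * theta_B between two trained networks: a flat (non-increasing) loss profile indicates linear connectivity, while a pronounced bump indicates a loss barrier. This is a minimal sketch assuming PyTorch models with identical architectures; the helper names `interpolate_state_dicts` and `loss_along_linear_path` are illustrative and not taken from [1] or [2].

```python
import copy
import torch


def interpolate_state_dicts(sd_a, sd_b, alpha):
    # Interpolate floating-point tensors; copy non-float entries
    # (e.g. integer counters in BatchNorm) from the first model unchanged.
    out = {}
    for k in sd_a:
        if torch.is_floating_point(sd_a[k]):
            out[k] = (1.0 - alpha) * sd_a[k] + alpha * sd_b[k]
        else:
            out[k] = sd_a[k].clone()
    return out


@torch.no_grad()
def loss_along_linear_path(model_a, model_b, loss_fn, data_loader,
                           num_points=11, device="cpu"):
    # Evaluate the loss of interpolated networks at evenly spaced points on the
    # segment between the parameters of model_a and model_b.
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a).to(device).eval()
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        probe.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        total, count = 0.0, 0
        for x, y in data_loader:
            x, y = x.to(device), y.to(device)
            total += loss_fn(probe(x), y).item() * x.size(0)
            count += x.size(0)
        losses.append(total / count)
    return losses  # one loss value per interpolation coefficient alpha
```

Comparing the maximum of this profile to the losses at the endpoints gives a simple estimate of the loss barrier, which is the quantity tracked when checking whether the factors above (data shifts, parameterization, training procedure) destroy LMC.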
[1] Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs, T. Garipov et al., 2018
[2] Linear Mode Connectivity and the Lottery Ticket Hypothesis, J. Frankle et al., 2020