Title
Next-Generation Adaptive Optimization Algorithms for Large-Scale Machine Learning Models
Abstract
Stochastic gradient descent (SGD) is the workhorse for training modern large-scale supervised machine learning models. In this talk, we will discuss recent developments in the convergence analysis of SGD and propose efficient and practical adaptive variants for faster convergence.
We will start by presenting a novel adaptive (no tuning needed) learning rate for SGD. We will introduce a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method and explain why the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for SGD. We will provide theoretical convergence guarantees for the new method in different settings, including strongly convex, convex, and non-convex functions, and demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models. Having presented SGD with SPS and its practical benefits, we will then focus on two of its main drawbacks: (i) it requires a priori knowledge of the optimal mini-batch losses, which might not always be available (e.g., regularized objectives), and (ii) it guarantees convergence only to a neighborhood of the solution. To resolve these issues, we propose DecSPS, a modification of SPS that guarantees convergence to the exact minimizer without prior knowledge of the problem parameters. For strongly convex optimization problems, DecSPS is the first stochastic adaptive optimization method that converges to the exact solution without restrictive assumptions like bounded iterates/gradients.
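To make the idea concrete, below is a minimal sketch of SGD with a capped stochastic Polyak step-size, in the spirit of the SPS variant described above. All names here (sgd_sps, loss_fn, grad_fn, c, gamma_b, f_i_star) are illustrative choices for this sketch, not notation from the talk; f_i_star stands for the per-sample optimal loss, with 0 being a common choice for over-parameterized (interpolating) models.

```python
import numpy as np

def sgd_sps(x0, loss_fn, grad_fn, n_samples, n_epochs=10,
            c=0.5, gamma_b=1.0, f_i_star=0.0, seed=0):
    """SGD with a capped stochastic Polyak step-size (sketch only).

    loss_fn(x, i) and grad_fn(x, i) return the loss and gradient of the
    i-th sample at x; f_i_star is the assumed per-sample optimal loss.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            g = np.asarray(grad_fn(x, i), dtype=float)
            g_sq = float(np.dot(g, g))
            if g_sq == 0.0:          # this sample is already fit exactly
                continue
            # Polyak ratio, capped at gamma_b to keep the step bounded
            step = min((loss_fn(x, i) - f_i_star) / (c * g_sq), gamma_b)
            x = x - step * g
    return x

# Toy interpolating least-squares usage (hypothetical data)
A = np.random.randn(100, 5)
b = A @ np.random.randn(5)
loss = lambda x, i: 0.5 * (A[i] @ x - b[i]) ** 2
grad = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = sgd_sps(np.zeros(5), loss, grad, n_samples=100, n_epochs=30)
```

In the interpolating toy problem above, setting f_i_star = 0 is exact, which is the setting in which the step-size needs no tuning at all.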
Finally, if time permits, we will explore the relationships between SPS variants and momentum. We will investigate the effectiveness of using momentum in a straightforward manner, the potential for an adaptive momentum parameter, and whether there are any theoretical or practical benefits to applying momentum alongside SPS.
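As one purely hypothetical illustration of pairing momentum with SPS "in a straightforward manner", the sketch below adds a standard heavy-ball term with a fixed parameter beta on top of the capped Polyak step from the previous sketch. This is an assumption-laden pairing for intuition only; the variants discussed in the talk may differ.

```python
import numpy as np

def sps_heavy_ball(x0, loss_fn, grad_fn, n_samples, n_epochs=10,
                   c=0.5, gamma_b=1.0, f_i_star=0.0, beta=0.9, seed=0):
    """Heavy-ball momentum combined with a capped Polyak step (illustrative).

    Update: x_{k+1} = x_k - step_k * g_i(x_k) + beta * (x_k - x_{k-1}),
    where step_k is the capped stochastic Polyak step-size.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            g = np.asarray(grad_fn(x, i), dtype=float)
            g_sq = float(np.dot(g, g))
            step = 0.0 if g_sq == 0.0 else min(
                (loss_fn(x, i) - f_i_star) / (c * g_sq), gamma_b)
            x_new = x - step * g + beta * (x - x_prev)   # heavy-ball term
            x_prev, x = x, x_new
    return x
```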
Bio
Nicolas Loizou is an Assistant Professor in the Department of Applied Mathematics and Statistics and the Mathematical Institute for Data Science (MINDS) at Johns Hopkins University, where he leads the Optimization and Machine Learning Lab.
Prior to this, he was a Postdoctoral Research Fellow at Mila – Quebec Artificial Intelligence Institute and the Université de Montréal. He completed his Ph.D. in Optimization and Operational Research at the School of Mathematics, University of Edinburgh, in 2019. Before that, he received his undergraduate degree in Mathematics from the National and Kapodistrian University of Athens in 2014 and his M.Sc. degree in Computing from Imperial College London in 2015.
His research interests include large-scale optimization, machine learning, randomized numerical linear algebra, distributed and decentralized algorithms, game theory, and federated learning. His current research focuses on the theory and applications of convex and non-convex optimization in large-scale machine learning and data science problems. He has received several awards and fellowships, including the OR Society’s 2019 Doctoral Award (runner-up) for the “Most Distinguished Body of Research leading to the Award of a Doctorate in the field of Operational Research,” the IVADO Postdoctoral Fellowship, the COAP 2020 Best Paper Award, and a CISCO 2023 Research Award.