Title

Optimization for Deep Learning

Abstract

The field of optimization for machine learning has undergone significant changes in recent years, as deep learning models have grown in scale and fine-tuning has taken on a more prominent role. In this presentation, I will share a perspective on the direction of these changes and highlight interesting research directions. I will provide real-world examples of what practitioners want from optimization methods for training deep networks at scale. I will then present my recent work on adaptive methods such as Adam and Adagrad, explaining how we can estimate the learning rate for these methods using theoretical tools from convex deterministic optimization and provide convergence guarantees for them. Finally, I will share some thoughts on the limitations of existing theoretical frameworks and discuss potential ways to bridge the gap between theory and practice.

Bio

Konstantin Mishchenko is a Research Scientist at Samsung in Cambridge, UK. Before joining Samsung, he was a postdoc in the group of Francis Bach at Inria Paris, and he completed his PhD at KAUST under the supervision of Peter Richtárik. He has worked on distributed and stochastic optimization theory, second-order methods, federated learning, and on-device deep learning inference. His work on adaptive methods received the Outstanding Paper Award at ICML 2023.