New paper out
Our new paper Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization is available on the arXiv. This project originated in May 2019 while I was visiting prof. Michael I. Jordan at Berkeley and is the joint work with Lihua Lei, that time PhD student at Berkeley, and my supervisor Peter Richtarik.
Abstract:
Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and the current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step-size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners to select algorithms that work broadly without tweaking the hyperparameters. In this work, blending the “geometrization” technique introduced by Lei & Jordan, 2016, and the SARAH algorithm of Nguyen et al., we propose the Geometrized SARAH algorithm for non-convex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak-Lojasiewicz (PL) constant, if present. In addition, it achieves the best-available convergence rate for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.