Hyperparameter Transfer Learning with Adaptive Complexity

Bayesian optimization (BO) is a sample-efficient approach to automatically tuning the hyperparameters of machine learning models. In practice, one frequently has to solve similar hyperparameter tuning problems sequentially. For example, one might have …
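
The abstract is cut off here, but the following is a minimal, self-contained sketch of the single-task BO step it alludes to: fit a Gaussian-process surrogate to past evaluations and pick the next hyperparameter by expected improvement. The scikit-learn surrogate, the 1-D search space, and all names are illustrative assumptions, not the paper's transfer-learning method.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def propose_next(X_obs, y_obs, candidates):
    """Return the candidate hyperparameter with the highest expected improvement."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)                       # surrogate model of the validation loss
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y_obs.min()                         # we assume the objective is minimized
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return candidates[np.argmax(ei)]

# Usage: X_obs, y_obs are previously evaluated configs and their losses (toy data).
X_obs = np.array([[0.01], [0.1], [1.0]])
y_obs = np.array([0.8, 0.5, 0.9])
candidates = np.linspace(1e-3, 2.0, 200).reshape(-1, 1)
print(propose_next(X_obs, y_obs, candidates))
```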

Optimal Client Sampling for Federated Learning

It is well understood that client-master communication can be a primary bottleneck in Federated Learning. In this work, we address this issue with a novel client subsampling scheme, where we restrict the number of clients allowed to communicate …
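
As an illustration of what such a subsampling rule can look like, here is a hedged sketch in which only the m clients whose local updates have the largest norms are allowed to communicate in a given round. The rule, the plain averaging step, and all names are assumptions for illustration, not necessarily the paper's optimal scheme (an unbiased variant would additionally rescale the surviving updates).

```python
import numpy as np

def subsample_clients(updates, m):
    """updates: list of per-client update vectors; returns indices allowed to send."""
    norms = np.array([np.linalg.norm(u) for u in updates])
    return np.argsort(norms)[-m:]              # keep the m largest-norm updates

def aggregate(updates, selected, n):
    """Average over the full cohort, treating silent clients as contributing zero."""
    total = sum(updates[i] for i in selected)
    return total / n

rng = np.random.default_rng(0)
updates = [rng.normal(size=10) * s for s in rng.uniform(0.1, 2.0, size=20)]
chosen = subsample_clients(updates, m=5)
print(chosen, np.linalg.norm(aggregate(updates, chosen, n=20)))
```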

Lower Bounds and Optimal Algorithms for Personalized Federated Learning

In this work, we consider the optimization formulation of personalized federated learning recently introduced by Hanzely & Richtarik (2020), which was shown to offer an alternative explanation of the workings of local SGD methods. Our first …
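
For context, the personalized objective of Hanzely & Richtarik (2020) is commonly written as below. This display is reproduced from memory, so the exact scaling of the penalty term should be checked against the paper.

```latex
% Personalized FL objective (from memory; verify constants against the paper):
\min_{x_1,\dots,x_n \in \mathbb{R}^d}
  \frac{1}{n}\sum_{i=1}^{n} f_i(x_i)
  + \frac{\lambda}{2n}\sum_{i=1}^{n} \bigl\| x_i - \bar{x} \bigr\|^2,
\qquad \bar{x} := \frac{1}{n}\sum_{i=1}^{n} x_i .
```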

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning

Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead for exchanging information across the …
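
For reference, the following is a minimal sketch of classical error feedback, the mechanism the title refers to, paired with a standard top-k compressor: the worker compresses its gradient plus an accumulated residual, transmits the compressed vector, and carries the leftover to the next round. The toy loop and names are illustrative, not the paper's proposed alternative.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude coordinates of v, zero out the rest (biased)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_step(grad, error, k):
    """One error-feedback round: returns (message to send, updated error memory)."""
    corrected = grad + error
    message = top_k(corrected, k)
    return message, corrected - message        # residual is carried to the next round

rng = np.random.default_rng(1)
error = np.zeros(8)
for _ in range(3):
    grad = rng.normal(size=8)
    msg, error = ef_step(grad, error, k=2)
    print(np.count_nonzero(msg), np.linalg.norm(error))
```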

On Biased Compression for Distributed Learning

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior …
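
The contrast at stake can be illustrated with two standard operators: a biased top-k compressor and an unbiased rand-k compressor with inverse-probability scaling. The sketch below only shows the two operator types; it makes no claim about which one the paper favors.

```python
import numpy as np

def top_k(v, k):
    """Biased: keep the k largest-magnitude entries; E[top_k(v)] != v in general."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def rand_k(v, k, rng):
    """Unbiased: keep k random entries, scaled by d/k so that E[rand_k(v)] = v."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out

rng = np.random.default_rng(0)
v = rng.normal(size=10)
print(np.linalg.norm(v - top_k(v, 3)), np.linalg.norm(v - rand_k(v, 3, rng)))
```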

Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization

Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and the current practice is striking in that algorithms with desirable theoretical guarantees typically involve …

Nonconvex Variance Reduced Optimization with Arbitrary Sampling

We provide the first importance sampling variants of variance-reduced algorithms for empirical risk minimization with non-convex loss functions. In particular, we analyze non-convex versions of SVRG, SAGA and SARAH. Our methods have the capacity to …
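
The generic device behind importance ("arbitrary") sampling can be sketched as follows: sample index i with probability p_i and rescale the gradient by 1/(n p_i) so the estimate stays unbiased. The quadratic toy problem and the choice p_i proportional to the smoothness constants L_i are illustrative assumptions, not the paper's exact estimators.

```python
import numpy as np

def importance_sgd_step(x, grads, probs, step_size, rng):
    """grads: list of per-example gradient functions; probs: sampling distribution."""
    n = len(grads)
    i = rng.choice(n, p=probs)
    g = grads[i](x) / (n * probs[i])           # unbiased: E[g] = (1/n) * sum_i grads[i](x)
    return x - step_size * g

# Toy problem: f_i(x) = (L_i / 2) * x^2, with p_i proportional to L_i.
L = np.array([1.0, 5.0, 10.0])
grads = [lambda x, Li=Li: Li * x for Li in L]
probs = L / L.sum()
rng = np.random.default_rng(0)
x = 3.0
for _ in range(200):
    x = importance_sgd_step(x, grads, probs, step_size=0.05, rng=rng)
print(x)
```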

Don’t Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop

The stochastic variance-reduced gradient method (*SVRG*) and its accelerated variant (*Katyusha*) have attracted enormous attention in the machine learning community in the last few years due to their superior theoretical properties and empirical …
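
The "without the outer loop" idea can be sketched as a loopless variant of SVRG in which the reference point is refreshed by a coin flip with small probability p at every step, instead of on a fixed schedule. The toy least-squares problem and the parameter choices below are illustrative assumptions, not tuned settings from the paper.

```python
import numpy as np

def loopless_svrg(grad_i, full_grad, x0, n, step_size, p, iters, rng):
    x = x0.copy()
    w = x0.copy()                  # reference point
    mu = full_grad(w)              # full gradient at the reference point
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(i, x) - grad_i(i, w) + mu     # variance-reduced gradient estimate
        x = x - step_size * g
        if rng.random() < p:       # coin flip replaces the outer loop
            w = x.copy()
            mu = full_grad(w)
    return x

# Toy problem: f_i(x) = 0.5 * (A[i] @ x - b[i])^2  (least squares).
rng = np.random.default_rng(0)
n, d = 50, 5
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
full_grad = lambda x: A.T @ (A @ x - b) / n
x = loopless_svrg(grad_i, full_grad, np.zeros(d), n, 0.02, 1.0 / n, 2000, rng)
print(np.linalg.norm(full_grad(x)))
```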