Stochastic Distributed Learning with Gradient Quantization and Variance Reduction

Abstract

We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes to compress (e.g. quantize or sparsify) the gradients, thereby introducing additional variance that might slow down convergence. For strongly convex functions distributed among n machines, we give a scheme that converges linearly to a neighborhood of the optimal solution. For objective functions with a finite-sum structure, where each worker holds fewer than m components, we present novel variance reduced schemes that converge linearly to arbitrary accuracy. These are the first methods that achieve linear convergence for arbitrary quantized updates. We also provide an analysis for the weakly convex and non-convex cases, and we verify in experiments that our novel variance reduced schemes are more efficient than the baselines.
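To illustrate the setting, the sketch below shows distributed gradient descent with an unbiased compression operator (random-k sparsification, one standard choice of quantizer). This is a minimal toy example under assumed parameters, not the paper's actual algorithm; without a variance-reduction correction, such a scheme only converges to a neighborhood of the optimum whose size depends on the compression variance, which is exactly the gap the proposed variance-reduced schemes close.

```python
# Minimal sketch: compressed distributed gradient descent on a toy quadratic.
# The compressor, objective, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)


def rand_k(g, k, rng):
    """Unbiased random sparsification: keep k coordinates, rescale by d/k
    so that E[rand_k(g)] = g, at the price of additional variance."""
    d = g.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(g)
    out[idx] = g[idx] * (d / k)
    return out


# Toy problem: n workers, each holding a strongly convex quadratic
#   f_i(x) = 0.5 * ||A_i x - b_i||^2,   f(x) = (1/n) * sum_i f_i(x)
n, d, k = 10, 50, 5
A = [rng.standard_normal((d, d)) / np.sqrt(d) + np.eye(d) for _ in range(n)]
b = [rng.standard_normal(d) for _ in range(n)]

x = np.zeros(d)
lr = 0.05
for t in range(500):
    # Each worker sends a compressed gradient; the server averages and steps.
    compressed = [rand_k(A_i.T @ (A_i @ x - b_i), k, rng)
                  for A_i, b_i in zip(A, b)]
    x -= lr * np.mean(compressed, axis=0)

full_grad = np.mean([A_i.T @ (A_i @ x - b_i) for A_i, b_i in zip(A, b)], axis=0)
print("norm of full gradient after training:", np.linalg.norm(full_grad))
```

Running this, the gradient norm decreases quickly at first but stagnates at a level set by the compression noise; decreasing k (more aggressive compression) raises that level, in line with the convergence-to-a-neighborhood behavior described above.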

Publication
arXiv