September 04, 2018
EULER, room a.207
Joint CORE-INMA Seminar
Communication Efficient Variants of SGD for Distributed Computing
Sebastian Stich, EPFL
Modern machine learning applications require stochastic optimization algorithms that can be implemented on distributed systems. The communication overhead of these algorithms is a key bottleneck that hinders perfect scalability. In this talk we will discuss two techniques that aim at reducing the communication costs.
First, we discuss quantization and sparsification techniques that reduce the amount of data that needs to be communicated. We present a variant of SGD with k-sparsification (for instance top-k or random-k) and show that this scheme converges at the same rate as vanilla SGD. That is, the communication can be reduced by a factor of the dimension whilst still converging at the same rate.
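As a rough illustration of the two sparsification operators mentioned above, the following sketch (not taken from the talk; a minimal NumPy rendering under the assumption that k entries of a gradient vector are kept and the rest are zeroed) shows top-k and random-k compression:

```python
import numpy as np

def top_k(g, k):
    """Keep the k largest-magnitude entries of gradient g, zero the rest."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]  # indices of the k largest |g_i|
    out[idx] = g[idx]
    return out

def random_k(g, k, rng=None):
    """Keep k uniformly random entries of g, zero the rest."""
    rng = rng or np.random.default_rng(0)
    out = np.zeros_like(g)
    idx = rng.choice(g.size, size=k, replace=False)
    out[idx] = g[idx]
    return out
```

Only the k retained values (and their indices) need to be communicated, which is where the factor-of-the-dimension saving comes from when k is much smaller than the dimension.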
In the second (and shorter) half of the talk we discuss strategies that tackle the communication frequency instead of the communicated data. In particular, we compare local SGD (independent runs of SGD in parallel) with mini-batch SGD.
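To make the contrast concrete, here is a toy sketch (my own illustration, not code from the talk) of the two schedules on a one-dimensional quadratic with a hypothetical noisy gradient oracle: local SGD communicates only once at the end, while mini-batch SGD communicates every step.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(x, noise):
    # Stochastic gradient of f(x) = x^2 / 2 with additive noise.
    return x + noise

def local_sgd(x0, workers=4, local_steps=10, lr=0.1):
    # Each worker runs SGD independently; iterates are averaged
    # in a single communication round at the end.
    finals = []
    for _ in range(workers):
        x = x0
        for _ in range(local_steps):
            x -= lr * grad(x, rng.normal(scale=0.1))
        finals.append(x)
    return np.mean(finals)

def minibatch_sgd(x0, workers=4, steps=10, lr=0.1):
    # Every step averages gradients across all workers,
    # i.e. one communication round per step.
    x = x0
    for _ in range(steps):
        g = np.mean([grad(x, rng.normal(scale=0.1)) for _ in range(workers)])
        x -= lr * g
    return x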