Fast asynchronous parallel optimization

Context

Machine learning tasks very often involve optimizing finite sums of loss functions. In order to handle the now massive amounts of data current problems generate, we need to define new fast and efficient optimization algorithms, which can leverage the common parallel architecture of modern computer architectures. In order to maximize efficiency, we also need robust theoretical analysis of these algorithms, which give us insights on how to correctly use parallel resources depending on the dataset and associated model.
Our code is available on github here and here.

People

Papers

Breaking the nonsmooth barrier: a scalable parallel method for composite optimization.
Fabian Pedregosa, Rémi Leblond and Simon Lacoste-Julien.
NIPS 2017 (Spotlight)


Asaga: Asynchronous parallel Saga.
Rémi Leblond, Fabian Pedregosa and Simon Lacoste-Julien.
AISTATS 2017
Presented at OPT2016 (9th NIPS Workshop on Optimization for Machine Learning)


BibTeX

@article{pedregosa2017proxasaga,
      title    = "Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization",
      journal  = "arxiv:1707.06468",
      author   = "Pedregosa, Fabian and Leblond, R\'emi and Lacoste-Julien, Simon",
      year     = "2017"
}
@inproceedings{leblond2017Asaga,
      title     = "{ASAGA}: Asynchronous Parallel {SAGA}",
      booktitle = "Proceedings of the 20th International Conference on Artificial Intelligence and Statistics",
      author    = "Leblond, R\'emi and Pedregosa, Fabian and Lacoste-Julien, Simon",
      year      = "2017",
      url       = {http://proceedings.mlr.press/v54/leblond17a.html}
}

Abstracts

Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such as the Lasso, group Lasso or empirical risk minimization with convex constraints. In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse method inspired by SAGA, a variance reduced incremental gradient algorithm. The proposed method is easy to implement and significantly outperforms the state of the art on several nonsmooth, large-scale problems. We prove that our method achieves a theoretical linear speedup with respect to the sequential version under assumptions on the sparsity of gradients and block-separability of the proximal term. Empirical benchmarks on a multi-core architecture illustrate practical speedups of up to 12x on a 20-core machine.

We describe Asaga, an asynchronous parallel version of the incremental gradient algorithm Saga that enjoys fast linear convergence rates. We highlight a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently proposed “perturbed iterate” framework that resolves it. We thereby prove that Asaga can obtain a theoretical linear speed-up on multicore systems even without sparsity assumptions. We present results of an implementation on a 40-core architecture illustrating the practical speed-up as well as the hardware overhead.

Acknowledgements

We thank Xinghao Pan for sharing with us his implementation of Kromagnon, a competing algorithm we compare Asaga to in our paper. Our work was partially supported by a Google Research Award.