Fast Statistical Analysis of Web Data
via Sparse Learning

Associated team (EA)
Created June 1st, 2011
Last modified: October 31, 2013

Research center(s) : INRIA Paris - Rocquencourt
Institution : University of California, Berkeley, United States
INRIA Research team : SIERRA
Laboratory : EECS, Statistics and IEOR Departments
Principal investigator : Francis Bach
Principal investigator : Laurent El Ghaoui
Francis Bach
Rodolphe Jenatton
Ronny Luss
Guillaume Obozinski
Sesh Kumar
Edouard Grave
Laurent El Ghaoui
Michael Jordan
Mert Pilanci
Bin Yu
Stefanie Jegelka

Presentation of the Associated Team

The goal of the proposed research is to provide web-based tools for the analysis and visualization of large corpora of text documents, with a focus on databases of news articles. We intend to use advanced algorithms, drawing on recent progress in machine learning and statistics, to allow a user to quickly produce a short summary and an associated timeline showing how a certain topic is described in the news media. We are also interested in unsupervised learning techniques that allow a user to understand the differences between several news sources, topics or documents.

Exchanges/visits between EA partners

- Prof. Bin Yu (Berkeley, Statistics department): one-month visit (June 2011)
- Dr. Francis Bach (INRIA SIERRA): one-week visit (October 2011)
- Dr. Ronny Luss (INRIA postdoc): 6-month visit (September 2011 - March 2012)
- Mert Pilanci: 1-month visit (July 2012)
- Prof. Michael Jordan: 1-year visit (September 2012-July 2013)
- Dr. Francis Bach (INRIA SIERRA): one-week visit (December 2013)
- Dr. Simon Lacoste-Julien (INRIA SIERRA): one-week visit (December 2013)
- Edouard Grave (INRIA SIERRA): one-week visit (December 2013)

Common publications

- A. d’Aspremont, F. Bach, L. El Ghaoui. Approximation Bounds for Sparse Principal Component Analysis. To appear in Mathematical Programming, 2013.
- S. Jegelka, F. Bach, S. Sra. Reflection methods for user-friendly submodular function minimization. To appear at the NIPS conference, 2013.
- M. Pilanci, F. Bach. Optimal statistical/computational trade-offs for sparse recovery. In preparation.
- R. Luss, F. Bach. Generalized conditional gradients for large-scale structured problems. In preparation.

Scientific events and seminars

- Prof. Bin Yu: SMILE seminar on "Spectral clustering and the high-dimensional Stochastic Block Model" - June 23rd, 2011
- Prof. Michael Jordan: ENS Computer Science Department seminar - October 2nd, 2012
- F. Bach: INRIA@SiliconValley workshop, May 21-22, 2012, Paris

Work program and progress

- The team started on June 1st, 2011, and Dr. Ronny Luss was hired in September 2011 to work on the joint project; this has led to a paper in preparation.
- PhD defense of Rodolphe Jenatton - November 2011 (Prof. El Ghaoui was a jury member)
- The visit of Mert Pilanci allowed new collaborations to start in the area of sparse learning.
- Visit of Prof. Michael Jordan (starting in 2012): co-financed by INRIA and the Fondation Sciences Mathématiques de Paris.
- Edouard Grave will be hired as a joint postdoc in January 2014.

Scientific achievements

Our work has been organized around four projects:

(1) Theoretical analysis of sparse PCA: A. d’Aspremont, F. Bach, L. El Ghaoui, paper accepted in Mathematical Programming.

One key problem in the analysis and design of sparse methods is to characterize whether the signals obtained by sparse methods are relevant, i.e., whether they are actually signal or in fact noise. In an idealized theoretical set-up, where we have n p-dimensional observations that are independent and identically normally distributed, this corresponds to distinguishing between a covariance matrix proportional to the identity and a covariance matrix that is the sum of an identity matrix and a sparse rank-one matrix. The so-called detection threshold is the smallest magnitude of this rank-one matrix at which the two hypotheses can be distinguished (the smaller the better).

Without any assumption, the best achievable threshold is √(p/n). In a high-dimensional set-up where p is much larger than n, this bound is uninformative. Under the "sparse" assumption that only k components of the vector to detect are non-zero, the threshold goes down to √(k log(p)/n). However, this can be achieved only by solving an NP-hard problem. We have shown that methods based on convex relaxations (and hence implementable in polynomial time) can reach the desired detection threshold, improving over the best previously known polynomial-time result (which was a factor of √k larger).
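This detection problem can be simulated directly. The sketch below (an illustration of the set-up, not the convex relaxation studied in the paper) draws samples under both hypotheses and applies a simple polynomial-time proxy statistic, the largest diagonal entry of the sample covariance; the dimensions are arbitrary and the spike magnitude θ is deliberately taken well above the thresholds so that the separation is visible.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cov(n, p, theta=0.0, k=0):
    """Sample covariance of n draws from N(0, I_p + theta * v v^T),
    with v a k-sparse unit vector (theta = 0 gives the identity case)."""
    X = rng.standard_normal((n, p))
    if theta > 0:
        v = np.zeros(p)
        v[:k] = 1.0 / np.sqrt(k)                      # sparse spike direction
        X += np.sqrt(theta) * rng.standard_normal((n, 1)) * v
    return X.T @ X / n

n, p, k = 500, 1000, 5
# sparse detection threshold sqrt(k log p / n) vs unstructured sqrt(p / n)
print(np.sqrt(k * np.log(p) / n), np.sqrt(p / n))

theta = 5 * k * np.sqrt(np.log(p) / n)                # well above the thresholds
stat_h0 = np.diag(sample_cov(n, p)).max()             # identity covariance (noise)
stat_h1 = np.diag(sample_cov(n, p, theta, k)).max()   # spiked covariance (signal)
print(stat_h0, stat_h1)
```

Note that for these dimensions (p > n) the top eigenvalue of the sample covariance is dominated by Wishart noise, which is precisely why sparsity-exploiting statistics are needed in this regime.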

(2) Parallel graph cuts for computer vision: S. Jegelka, F. Bach, S. Sra, paper accepted at the NIPS conference.

Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. In consequence, there is a need for efficient optimization procedures for submodular functions, in particular for minimization problems.

While general submodular minimization is challenging, we propose a new approach that exploits the decomposability of many submodular functions of interest. In contrast to previous approaches, our method is exact, practical, and requires no cumbersome parameter tuning. Moreover, it is easy to implement and parallelize.
A key component of our approach is a formulation of the discrete submodular minimization problem as a continuous best-approximation problem. The latter is solved through a sequence of reflections, and its solution can be automatically thresholded to obtain an optimal discrete solution. Our method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction.

In our experiments, we show the benefits of our new algorithms for two image segmentation tasks, providing the first simple parallel algorithm for such problems.
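The continuous-to-discrete link used above (solve a continuous problem over the base polytope, then threshold at zero) can be illustrated on a tiny example. The sketch below deliberately substitutes a simpler method than the paper's reflection algorithm: it approximates Fujishige's minimum-norm point of the base polytope with plain Frank-Wolfe, using Edmonds' greedy algorithm as the linear oracle, and checks the thresholded set against brute force. The example function and all names are illustrative.

```python
import itertools
import numpy as np

def greedy_base_vertex(f, order):
    """Edmonds' greedy algorithm: the vertex of the base polytope B(f)
    associated with a given ordering of the ground set."""
    x = np.zeros(len(order))
    prefix, prev = [], 0.0
    for i in order:
        prefix.append(i)
        val = f(frozenset(prefix))
        x[i] = val - prev                 # marginal gain of element i
        prev = val
    return x

def min_norm_point_fw(f, n, iters=2000):
    """Approximate argmin ||x||^2 over B(f) by Frank-Wolfe; the linear oracle
    over B(f) is the greedy algorithm with coordinates sorted increasingly."""
    x = greedy_base_vertex(f, list(range(n)))
    for t in range(iters):
        s = greedy_base_vertex(f, list(np.argsort(x)))
        gamma = 2.0 / (t + 2)
        x = (1 - gamma) * x + gamma * s
    return x

def f(S):
    """A small submodular function: concave-of-cardinality plus a modular term."""
    c = np.array([-2.0, 1.0, -1.5, 0.5, 1.0])
    return np.sqrt(len(S)) + sum(c[i] for i in S)

n = 5
x = min_norm_point_fw(f, n)
# thresholding the continuous solution at zero yields a discrete minimizer
A_star = frozenset(i for i in range(n) if x[i] < 0)

# brute-force check over all 2^n subsets
best = min(f(frozenset(S)) for r in range(n + 1)
           for S in itertools.combinations(range(n), r))
print(A_star, f(A_star), best)
```

The reflection method of the paper solves the same best-approximation problem, but splits it over the components of a decomposable function, which is what makes it parallelizable.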

(3) Optimal statistical/computational trade-offs: M. Pilanci, F. Bach, paper in preparation.

In supervised learning and in signal processing, the assumption of sparsity makes it possible to considerably reduce the number of observations required to achieve a given prediction performance. However, the natural formulation is combinatorial and leads to NP-hard optimization problems. A key success of applied mathematics in recent years has been to show that a convex relaxation based on the L1-norm can be used instead and achieves optimality in certain scenarios.

In this project, we aim to bridge the gap between convex and combinatorial techniques by providing a sequence of convex problems of increasing complexity, allowing one to adapt the computational complexity to the statistical difficulty of the problem.
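The L1-norm relaxation mentioned above can be made concrete with a generic Lasso solved by iterative soft-thresholding (ISTA); this is the standard relaxation, not the convex hierarchy developed in this project, and the dimensions, regularization level and threshold below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def ista(A, b, lam, iters=2000):
    """Iterative soft-thresholding for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - b) / L      # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

# underdetermined system (n observations, p >> n variables, k-sparse signal)
n, p, k = 100, 300, 5
A = rng.standard_normal((n, p)) / np.sqrt(n)   # roughly unit-norm columns
x_true = np.zeros(p)
x_true[:k] = 2.0 * rng.choice([-1.0, 1.0], size=k)
b = A @ x_true + 0.01 * rng.standard_normal(n)

x_hat = ista(A, b, lam=0.05)
support = set(np.flatnonzero(np.abs(x_hat) > 0.1))
print(support)
```

Here n = 100 observations suffice to recover a 5-sparse signal in dimension 300, illustrating the reduction in sample size that sparsity affords; the combinatorial formulation would require searching over all supports.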

(4) Generalized conditional gradient for large-scale problems: R. Luss, F. Bach, paper in preparation.

Given a convex optimization problem and its dual, there are many possible first-order algorithms. In this project, we consider the equivalence between mirror descent algorithms and algorithms generalizing the conditional gradient method. This is established through convex duality and notably implies that, for certain problems, such as supervised machine learning problems with non-smooth losses or problems regularized by non-smooth regularizers, the primal subgradient method and the dual conditional gradient method are formally equivalent. The dual interpretation leads to a form of line search for mirror descent, as well as convergence guarantees for primal-dual certificates.
This leads to novel algorithms for most structured sparse problems common in natural language processing.
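The primal-dual equivalence can be checked numerically on a toy instance (a piecewise-linear objective, not one of the structured NLP problems above; all names are illustrative). For min_x max_i (a_i·x - b_i) + (λ/2)||x||², the dual is a quadratic maximized over the simplex, and its linear minimization oracle selects exactly the active piece of the primal max; Frank-Wolfe on the dual and the subgradient method on the primal with matched step sizes then produce identical iterates.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, lam = 6, 4, 0.5
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

T = 200
# --- dual conditional gradient (Frank-Wolfe) over the simplex ---
theta = np.ones(m) / m
xs_fw = []
for t in range(T):
    x = -A.T @ theta / lam              # primal point associated with theta
    xs_fw.append(x)
    i = np.argmax(A @ x - b)            # LMO over the simplex picks vertex e_i
    gamma = 2.0 / (t + 2)
    e = np.zeros(m)
    e[i] = 1.0
    theta = (1 - gamma) * theta + gamma * e

# --- primal subgradient method with the matching step size eta = gamma/lam ---
x = -A.T @ (np.ones(m) / m) / lam       # same initialization
xs_sg = []
for t in range(T):
    xs_sg.append(x.copy())
    i = np.argmax(A @ x - b)            # active piece: a subgradient is a_i + lam*x
    eta = 2.0 / (lam * (t + 2))
    x = x - eta * (A[i] + lam * x)

gap = float(np.max(np.abs(xs_fw[-1] - xs_sg[-1])))
print(gap)                               # the two iterate sequences coincide
```

Note that the matched step eta = 2/(lam*(t+2)) is the classical subgradient step size for a lam-strongly convex objective, so the equivalence also transfers the conditional gradient convergence analysis to the primal method.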