Fast Statistical Analysis of Web Data
via Sparse Learning
Associated team (EA)
Created June 1st, 2011
Last modified: October 31, 2013
Research center(s): INRIA Paris - Rocquencourt
Institution: University of California, Berkeley, United States
INRIA Research team: SIERRA
Laboratory: EECS, Statistics and IEOR
Principal investigator (INRIA): Francis Bach
Principal investigator (UC Berkeley): Laurent El Ghaoui
Presentation of the Associated Team
The goal of the proposed research is to provide web-based tools for the
analysis and visualization of large corpora of text documents, with a focus
on databases of news articles. We intend to use advanced algorithms, drawing
on recent progress in machine learning and statistics, to allow a user to
quickly produce a short summary and an associated timeline showing how a
given topic is described in the news media. We are also interested in
unsupervised learning techniques that allow a user to understand the
differences between several news sources, topics or documents.
Exchanges between EA partners
- Prof. Bin Yu (Berkeley, Statistics department): one-month visit (June 2011)
- Dr. Francis Bach (INRIA SIERRA): one-week visit (October 2011)
- Dr. Ronny Luss (INRIA postdoc): 6-month visit (September 2011 - March 2012)
- Mert Pilanci: 1-month visit (July 2012)
- Prof. Michael Jordan: 1-year visit (September 2012 - July 2013)
- Dr. Francis Bach (INRIA SIERRA): one-week visit (December 2013)
- Dr. Simon Lacoste-Julien (INRIA SIERRA): one-week visit (December 2013)
- Edouard Grave (INRIA SIERRA): one-week visit (December 2013)
Publications
- A. d’Aspremont, F. Bach, L. El Ghaoui. Approximation Bounds for Sparse Principal Component Analysis. To appear in Mathematical Programming, 2013.
- S. Jegelka, F. Bach, S. Sra. Reflection methods for user-friendly submodular function minimization. To appear at the NIPS conference, 2013.
- M. Pilanci, F. Bach. Optimal statistical/computational trade-offs for sparse recovery. In preparation.
- R. Luss, F. Bach. Generalized conditional gradients for large-scale structured problems. In preparation.
Scientific events and seminars
- Prof. Bin Yu: SMILE seminar on "Spectral clustering and the high-dimensional Stochastic Block Model" - June 23rd, 2011
- Prof. Michael Jordan: ENS Computer Science Department seminar - October
- F. Bach: INRIA@SiliconValley workshop, May 21-22, 2012, Paris
Work program and progress
- The team started on June 1st, 2011, and Dr. Ronny Luss was hired in September 2011 to work on the joint project. This has led to a paper in preparation.
- PhD defense of Rodolphe Jenatton - November 2011 (Prof. El Ghaoui was a member of the jury).
- The visit of Mert Pilanci has allowed the start of new collaborations in the area of sparse learning.
- Visit of Prof. Michael Jordan (starting 2012): co-financed by INRIA and the Fondation Sciences Mathématiques de Paris.
- Edouard Grave will be hired as a joint postdoc in January 2014.
Our work has been organized around four projects:
(1) Theoretical analysis of sparse PCA:
A. d’Aspremont, F. Bach, L. El Ghaoui, paper accepted at Mathematical Programming.
One key problem in the analysis and design of sparse methods is to
characterize whether the signals obtained by sparse methods are relevant,
i.e., whether they are actually signals or in fact noise. In an idealized
theoretical set-up, where we have n p-dimensional observations that are
independent and identically normally distributed, this corresponds to
distinguishing between a covariance matrix proportional to the identity and a
covariance matrix that is the sum of an identity matrix and a sparse rank-one
matrix. The so-called detection threshold is the smallest magnitude of this
rank-one matrix for which the two cases can be distinguished (the smaller the
better).
Without any assumption, the best threshold is √(p/n). In a high-dimensional
set-up where p is much larger than n, this is useless. Under the “sparse”
assumption that only k components of the vector to detect are non-zero, the
threshold goes down to √(k log(p)/n). However, this can be achieved only by
solving an NP-hard problem. We have shown that methods based on convex
relaxations (and hence implementable in polynomial time) can reach the
desired detection threshold, improving over the best known polynomial-time
result (which was √k larger).
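
For concreteness, this set-up can be written as a hypothesis test; the display below merely restates the paragraph above in symbols, with θ (our notation, not used in the report) denoting the magnitude of the planted rank-one component:

\[
  H_0:\ x_1, \dots, x_n \sim \mathcal{N}(0,\, I_p)
  \qquad \text{vs.} \qquad
  H_1:\ x_1, \dots, x_n \sim \mathcal{N}(0,\, I_p + \theta\, v v^\top),
\]
where $v \in \mathbb{R}^p$ is a unit vector with at most $k$ non-zero components. The thresholds quoted above then read: detection is possible when $\theta \gtrsim \sqrt{p/n}$ without assumptions, and when $\theta \gtrsim \sqrt{k \log(p)/n}$ under $k$-sparsity.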
(2) Parallel graph cuts for computer vision:
S. Jegelka, F. Bach, S. Sra, paper accepted at the NIPS conference.
Recently, it has become evident that submodularity naturally captures widely
occurring concepts in machine learning, signal processing and computer
vision. Consequently, there is a need for efficient optimization procedures
for submodular functions, in particular for minimization problems.
While general submodular minimization is challenging, we propose a new
approach that exploits the decomposability present in many submodular
functions. In contrast to previous approaches, our method is neither
approximate nor impractical, nor does it need any cumbersome parameter
tuning. Moreover, it is easy to implement and parallelize.
A key component of our approach is a formulation of the discrete submodular
minimization problem as a continuous best approximation problem. It is
solved through a sequence of reflections and its solution can be
automatically thresholded to obtain an optimal discrete solution. Our method
solves both the continuous and discrete formulations of the problem, and
therefore has applications in learning, inference, and reconstruction.
In our experiments, we show the benefits of our new algorithms for two image
segmentation tasks, providing the first simple parallel algorithm for such
problems.
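
To illustrate the discrete-to-continuous route described above (Lovász extension, continuous best-approximation problem, thresholding), here is a minimal self-contained Python sketch on a toy chain-graph cut function. All names and constants are ours; the naive subgradient solver stands in for the paper's much faster reflection scheme, and only the final thresholding step mirrors the method itself.

import itertools
import numpy as np

# Toy submodular function: unary terms plus a cut on a chain graph,
#   F(S) = sum_{i in S} u_i + sum_{(i,j) in E} a * 1[|S ∩ {i,j}| = 1].
# Its Lovász extension is f(w) = <u, w> + a * sum_{(i,j) in E} |w_i - w_j|.

rng = np.random.default_rng(0)
n = 8
u = rng.normal(size=n) - 0.5                 # unary potentials (shifted so the minimizer is non-trivial)
edges = [(i, i + 1) for i in range(n - 1)]   # chain graph
a = 0.5                                      # common edge weight

def F(S):
    """Evaluate the set function on a subset S of {0, ..., n-1}."""
    val = sum(u[i] for i in S)
    val += sum(a for (i, j) in edges if (i in S) != (j in S))
    return val

def lovasz_subgradient(w):
    """One subgradient of the Lovász extension f at w."""
    g = u.copy()
    for (i, j) in edges:
        s = np.sign(w[i] - w[j])
        g[i] += a * s
        g[j] -= a * s
    return g

# Continuous problem: min_w f(w) + 0.5 * ||w||^2, solved here by a plain
# subgradient method (illustration only; the paper uses reflections on the
# decomposed problem instead, which is much faster).
w = np.zeros(n)
for t in range(1, 50001):
    w -= (1.0 / t) * (lovasz_subgradient(w) + w)

# Thresholding the continuous solution at zero yields a discrete minimizer.
S_thresh = frozenset(i for i in range(n) if w[i] > 0)

# Brute-force check over all 2^n subsets (feasible only for tiny n).
S_best = min((frozenset(S) for r in range(n + 1)
              for S in itertools.combinations(range(n), r)), key=F)
print("thresholded solution:", sorted(S_thresh), " F =", round(F(S_thresh), 4))
print("brute-force minimum: ", sorted(S_best), " F =", round(F(S_best), 4))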
(3) Optimal statistical/computational trade-offs:
M. Pilanci, F. Bach, paper in preparation.
In a supervised learning problem or in signal processing, the assumption of
sparsity makes it possible to considerably reduce the number of observations
required to achieve a certain prediction performance. However, the natural
formulation is combinatorial and leads to NP-hard optimization problems. A
key success of applied mathematics in the last few years has been to show
that a convex relaxation based on the L1-norm can be used instead, and that
it achieves optimality in certain scenarios.
In this project, we aim to bridge the gap between convex and combinatorial
techniques by providing a sequence of convex problems of increasing
complexity, making it possible to adapt the computational complexity to the
statistical difficulty of the problem.
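
As a concrete illustration of the L1 convex-relaxation route mentioned above (not the convex/combinatorial interpolation of the paper in preparation), here is a minimal Python sketch of sparse recovery by iterative soft-thresholding (ISTA); the problem sizes, the regularization constant and the variable names are ours.

import numpy as np

# Recover a k-sparse vector from n < p noisy linear measurements by solving
#   min_w 0.5 * ||X w - y||^2 + lam * ||w||_1
# with ISTA (proximal gradient / iterative soft-thresholding).

rng = np.random.default_rng(0)
n, p, k = 100, 500, 5
X = rng.normal(size=(n, p)) / np.sqrt(n)
w_true = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
w_true[support] = rng.choice([-1.0, 1.0], size=k)
y = X @ w_true + 0.01 * rng.normal(size=n)

lam = 0.1 * np.sqrt(np.log(p) / n)      # scaling echoing the sqrt(k log(p)/n) theory
L = np.linalg.norm(X, 2) ** 2           # Lipschitz constant of the smooth part
w = np.zeros(p)
for _ in range(500):
    z = w - (X.T @ (X @ w - y)) / L     # gradient step on the quadratic loss
    w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding

print("true support:     ", sorted(support.tolist()))
print("recovered support:", np.flatnonzero(np.abs(w) > 1e-3).tolist())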
(4) Generalized conditional gradient for large-scale problems:
R. Luss, F. Bach, paper in preparation.
Given a convex optimization problem and its dual, there are many possible
first-order algorithms. In this project, we consider the equivalence between
mirror descent algorithms and algorithms generalizing the conditional
gradient method. This is done through convex duality, and notably implies
that for certain problems, such as supervised machine learning problems with
non-smooth losses or problems regularized by non-smooth regularizers, the
primal subgradient method and the dual conditional gradient method are
formally equivalent. The dual interpretation leads to a form of line search
for mirror descent, as well as guarantees of convergence for primal-dual
certificates.
This leads to novel algorithms for most structured sparse problems common in
natural language processing.
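
As a concrete anchor for the discussion above, here is a minimal Python sketch of the classical conditional gradient (Frank-Wolfe) method, the algorithm whose duality with subgradient/mirror descent is discussed in this project, on an L1-ball-constrained least-squares problem; the generalized variant of the paper in preparation is not reproduced, and all problem sizes and constants are ours.

import numpy as np

# Conditional gradient (Frank-Wolfe) for min_{||w||_1 <= r} 0.5 * ||X w - y||^2.
# Each step solves a linear minimization over the constraint set; for the L1
# ball this returns a single signed vertex, which keeps iterations cheap.

rng = np.random.default_rng(0)
n, p, r = 80, 200, 4.0
X = rng.normal(size=(n, p)) / np.sqrt(n)
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.0, 0.5]           # ||w_true||_1 = 3.5 <= r
y = X @ w_true + 0.01 * rng.normal(size=n)

w = np.zeros(p)
for t in range(200):
    grad = X.T @ (X @ w - y)
    i = int(np.argmax(np.abs(grad)))    # linear minimization oracle:
    s = np.zeros(p)                     # best vertex of the L1 ball
    s[i] = -r * np.sign(grad[i])
    gamma = 2.0 / (t + 2)               # standard Frank-Wolfe step size
    w = (1.0 - gamma) * w + gamma * s

print("objective:", 0.5 * np.linalg.norm(X @ w - y) ** 2)
print("largest coordinates:", np.argsort(-np.abs(w))[:5].tolist())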