Optimal Solutions for Sparse Principal Component Analysis

  • TITLE: Optimal Solutions for Sparse Principal Component Analysis.

  • AUTHORS: Alexandre d'Aspremont, Francis Bach, Laurent El Ghaoui

  • ABSTRACT: Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all target numbers of nonzero coefficients, with total complexity O(n^3), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n^3) per pattern. We discuss applications in subset selection and sparse recovery and show on artificial examples and biological data that our algorithm does provide globally optimal solutions in many cases. (A toy sketch of the greedy selection idea follows this listing.)

  • STATUS: Journal of Machine Learning Research, 9, pp. 1269-1294, July 2008.

  • ARXIV PREPRINT: arXiv:0707.0705

  • PAPER: Optimal Solutions for Sparse Principal Component Analysis (PDF)
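
  • ILLUSTRATIVE SKETCH: The abstract describes sparse PCA as maximizing x^T Sigma x over unit-norm vectors x with at most k nonzero coefficients, and a greedy algorithm that returns one candidate solution for every target cardinality. The Python snippet below is a naive illustration of that greedy forward-selection idea, not the paper's algorithm: it recomputes a full eigendecomposition for each candidate variable, so it does not attain the O(n^3) total complexity reported in the abstract, and the function name greedy_sparse_pca and the random test data are invented for this example.

    import numpy as np

    def greedy_sparse_pca(Sigma, k_max):
        """Naive greedy forward selection for sparse PCA (illustration only).

        At each step, add the variable whose inclusion most increases the
        largest eigenvalue of the covariance submatrix restricted to the
        current support.  Returns one (support, explained variance) pair
        for every target cardinality 1..k_max.
        """
        n = Sigma.shape[0]
        support, results = [], []
        for _ in range(k_max):
            best_var, best_j = -np.inf, None
            for j in range(n):
                if j in support:
                    continue
                idx = support + [j]
                # Largest eigenvalue of the principal submatrix Sigma[idx, idx]
                var = np.linalg.eigvalsh(Sigma[np.ix_(idx, idx)])[-1]
                if var > best_var:
                    best_var, best_j = var, j
            support.append(best_j)
            results.append((sorted(support), best_var))
        return results

    # Toy example on a synthetic sample covariance matrix.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))
    Sigma = X.T @ X / X.shape[0]
    for supp, var in greedy_sparse_pca(Sigma, k_max=5):
        print(f"k={len(supp)}: support={supp}, variance={var:.3f}")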