I am a permanent CNRS researcher at Sorbonne Université, in the Laboratoire d'Informatique de Paris-6 (LIP6).

Research interests: The focus of my research is on the design of algorithms for clustering and network design problems, with an emphasis on problems arising in data analysis and machine learning contexts. My goal is to come up with efficient algorithms and understand the complexity of these problems. I have also a strong interest in online optimization, learning theory, computational geometry and graph theory.

My (almost up-to-date) Curriculum Vitae.

News: My ANR project on the foundations of clustering algorithms has been funded!
I am visiting Brown DSI until March 31.

Previously: I was working at the University of Copenhagen, supported by a Marie Sklodowska-Curie individual fellowship. I was very lucky to be hosted by Prof. Mikkel Thorup.
I did my Ph.D at the Département d'Informatique de l'École normale supérieure. I was very fortunate to have Prof. Claire Mathieu as my advisor. Here is the manuscript of my thesis.


  • Oblivious Dimension Reduction for k-Means -- Beyond Subspaces and the Johnson-Lindenstrauss Lemma

    Joint work with Luca Becchetti, Marc Bury, Fabrizio Grandoni, and Chris Schwiegelshohn

    To appear in the proceedings of the Symposium on Theory of Computing (STOC) 2019.

    Click to show the abstract.

    We show that for n points in d-dimensional Euclidean space, a data oblivious random projection of the columns onto m ∈ O((log k + log log n) ε⁻⁶ log 1/ε) dimensions is sufficient to approximate the cost of any k-means clustering up to a multiplicative (1 ± ε) factor. The previous-best upper bounds on m are O((log n)ε⁻²) given by a direct application of the Johnson-Lindenstrauss Lemma, and O(k ε⁻²) given by [Cohen et al.-STOC’15]. We also prove the existence of a non-oblivious cost preserving sketch with target dimension O(log k ε⁻⁴ log 1/ε), improving on k/ε [Cohen et al.-STOC’15]. Furthermore, we show how to construct strong coresets for the k-means problem of size O(k · poly(log k, ε⁻¹)). Previous constructions of strong coresets have size of order k · min(d, k/ε).

  • Lower Bounds for Text Indexing with Mismatches and Differences

    Joint work with Laurent Feuilloley and Tatiana Starikovskaya

    Proceedings of the Symposium on Discrete Algorithms (SODA) 2019.

    Click to show the abstract.

    In this paper we study lower bounds for the fundamental problem of text indexing with mismatches and differences. In this problem we are given a long string of length n, the “text”, and the task is to preprocess it into a data structure such that given a query string Q, one can quickly identify substrings that are within Hamming or edit distance at most k from Q. This problem is at the core of various problems arising in biology and text processing. While exact text indexing allows linear-size data structures with linear query time, text indexing with k mismatches (or k differences) seems to be much harder: All known data structures have exponential dependency on k either in the space, or in the time bound. We provide conditional and pointer-machine lower bounds that make a step toward explaining this phenomenon. We start by demonstrating conditional lower bounds for k = Θ(log n). We show that as- suming the Strong Exponential Time Hypothesis, any data structure for text indexing that can be constructed in polynomial time cannot have O(n^1−δ) query time, for any δ > 0. This bound also extends to the setting where we only ask for (1 + ε)-approximate solutions for text index- ing. However, in many applications the value of k is rather small, and one might hope that for small k we can develop more efficient solutions. We show that this would require a radically new approach as using the current methods one cannot avoid exponential dependency on k either in the space, or in the time bound for all even √log n ≤ k = o(log n). Our lower bounds also apply to the dictionary look-up problem, where instead of a text one is given a set of strings.

  • Clustering Redemption – Beyond the Impossibility of Kleinberg’s Axioms

    Joint work with Varun Kanade and Frederik Mallmann-Trenn

    Proceedings of the conference on Neural Information Processing Systems (NIPS) 2018.

    Click to show the abstract.

    Kleinberg stated three axioms that any clustering procedure should satisfy and showed there is no clustering procedure that simultaneously satisfies all three. One of these, called the consistency axiom, requires that when the data is modified in a helpful way, i.e. if points in the same cluster are made more similar and those in different ones made less similar, the algorithm should output the same clustering. To circumvent this impossibility result, research has focused on considering clustering procedures that have a clustering quality measure (or a cost) and showing that a modification of Kleinberg’s axioms that takes cost into account lead to feasible clustering procedures. In this work, we take a different approach, based on the observation that the consistency axiom fails to be satisfied when the “correct” number of clusters changes. We modify this axiom by making use of cost functions to determine the correct number of clusters, and require that consistency holds only if the number of clusters remains unchanged. We show that single linkage satisfies the modified axioms, and if the input is well-clusterable, some popular procedures such as k-means also satisfy the axioms, taking a step towards explaining the success of these objective functions for guiding the design of algorithms.

  • Balanced Centroidal Power Diagrams for Redistricting

    Joint work with Philip Klein and Neal Young

    Proceedings of the International Conference on Advances in Geographic Information Systems (SIGSPATIAL) 2018.

    Click to show the abstract.

    We consider the problem of political redistricting: given the locations of people in a geographical area (e.g. a US state), the goal is to decompose the area into subareas, called districts, so that the populations of the districts are as close as possible and the districts are “compact” and “contiguous,” to use the terms referred to in most US state constitutions and/or US Supreme Court rulings. We study a method that outputs a solution in which each district is the intersection of a convex polygon with the geographical area. The average number of sides per polygon is less than six. The polygons tend to be quite compact. Every two districts differ in population by at most one (so we call the solution balanced ). The solution is a centroidal power diagram: each polygon has an associated center in R 3 such that
    • the projection of the center onto the plane z = 0 is the centroid of the locations of people assigned to the polygon, and
    • for each person assigned to that polygon, the polygon’s center is closest among all centers. The polygons are convex because they are the intersections of 3D Voronoi cells with the plane.
    The solution is in a sense a locally optimal solution to the problem of choosing centers in the plane and choosing an assignment of people to those 2-d centers so as to minimize the sum of squared distances subject to the assignment being balanced. A practical problem with this method is that, in real-world redistricting, exact locations of people are unknown. Instead, the input consists of polygons (census blocks) and associated populations. A real redistricting must not split census blocks. We therefore propose a post-processing step in which we perturb the solution so it does not split census blocks. In our experiments, our postprocessing step is able to achieve this while preserving population balance and connectivity of the districts. The district polygons are no longer convex at the fine scale because their boundaries must follow the boundaries of census blocks, but at a coarse scale they preserve the shape of the original polygons.

  • Fast Fencing

    Joint work with Mikkel Abrahamsen, Anna Adamaszek, Karl Bringmann, Mehran Mehr, Eva Rotenberg, Alan Roytman, and Mikkel Thorup

    Proceedings of the Symposium on Theory of Computing (STOC) 2018.

    Click to show the abstract.

    We consider very natural “fence enclosure” problems studied by Arkin, Khuller, and Mitchell in the early 90s. Given a set S of n points in the plane, we aim at finding a set of closed curves such that (1) each point is enclosed by a curve and (2) the total length of the curves is minimized. We consider two main variants. In the first variant, we pay a unit cost per curve in addition to the total length of the curves. An equivalent formulation of this version is that we have to enclose n unit disks, paying only the total length of the enclosing curves. In the other variant, we are allowed to use at most k closed curves and pay no cost per curve. For the variant with at most k closed curves, we present an algorithm that is polynomial in both n and k. For the variant with unit cost per curve, or unit disks, we present a near-linear time algorithm. Arkin, Khuller, and Mitchell solved the problem with at most k curves in n^O(k) time, and used this to solve the unit cost per curve version in exponential time. At the time, they conjectured that the problem with k curves is NP-hard for general k. Our polynomial time algorithm refutes this unless P equals NP.

  • A Fast Approximation Scheme for Low-Dimensional k-Means

    ArXiv - Proceedings of the Symposium on Discrete Algorithms (SODA) 2018.

    Click to show the abstract.

    We consider the popular k-means problem in d-dimensional Euclidean space. Recently Friggstad, Rezapour, Salavatipour [FOCS'16] and Cohen-Addad, Klein, Mathieu [FOCS'16] showed that the standard local search algorithm yields a (1+epsilon)-approximation in time (n * k)^(1/epsilon^O(d)), giving the first polynomial-time approximation scheme for the problem in low-dimensional Euclidean space. While local search achieves optimal approximation guarantees, it is not competitive with the state-of-the-art heuristics such as the famous k-means++ and D^2-sampling algorithms. In this paper, we aim at bridging the gap between theory and practice by giving a $(1+epsilon)$-approximation algorithm for low-dimensional k-means running in time n * k * (log n)^(d/epsilon)^O(d), and so matching the running time of the k-means++ and D^2-sampling heuristics up to polylogarithmic factors. We speed-up the local search approach by making a non-standard use of randomized dissections that allows to find the best local move efficiently using a quite simple dynamic program. We hope that our techniques could help design better local search heuristics for geometric problems.

  • Hierarchical Clustering: Objective Functions and Algorithms

    Joint work with Varun Kanade, Frederik Mallmann-Trenn, and Claire Mathieu

    ArXiv - Proceedings of the Symposium on Discrete Algorithms (SODA) 2018.

    Click to show the abstract.

    Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on providing algorithms, rather than optimizing a specific objective, Dasgupta framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a `good' hierarchical clustering is one that minimizes some cost function. He showed that this cost function has certain desirable properties. We take an axiomatic approach to defining `good' objective functions for both similarity and dissimilarity-based hierarchical clustering. We characterize a set of "admissible" objective functions (that includes Dasgupta's one) that have the property that when the input admits a `natural' hierarchical clustering, it has an optimal value. Equipped with a suitable objective function, we analyze the performance of practical algorithms, as well as develop better algorithms. For similarity-based hierarchical clustering, Dasgupta showed that the divisive sparsest-cut approach achieves an O(log3/2n)-approximation. We give a refined analysis of the algorithm and show that it in fact achieves an O(√log n)-approx. (Charikar and Chatziafratis independently proved that it is a O(√log n)-approx.). This improves upon the LP-based O(log n)-approx. of Roy and Pokutta. For dissimilarity-based hierarchical clustering, we show that the classic average-linkage algorithm gives a factor 2 approx., and provide a simple and better algorithm that gives a factor 3/2 approx.. Finally, we consider `beyond-worst-case' scenario through a generalisation of the stochastic block model for hierarchical clustering. We show that Dasgupta's cost function has desirable properties for these inputs and we provide a simple 1 + o(1)-approximation in this setting.

  • A Near-Linear Approximation Scheme for Multicuts of Embedded Graphs with a Fixed Number of Terminals

    Joint work with Éric Colin de Verdière and Arnaud de Mesmay

    ArXiv - Proceedings of the Symposium on Discrete Algorithms (SODA) 2018.

    Click to show the abstract.

    For an undirected edge-weighted graph G and a set R of pairs of vertices called pairs of terminals, a multicut is a set of edges such that removing these edges from G disconnects each pair in R. We provide an algorithm computing a (1+ε)-approximation of the minimum multicut of a graph G in time (g+t)(O(g+t)3)⋅(1/ε)O(g+t)⋅n log n, where g is the genus of G and t is the number of terminals. This result is tight in several aspects, as the minimum multicut problem is both APX-hard and W[1]-hard (parameterized by the number of terminals), even on planar graphs (equivalently, when g=0). In order to achieve this, our article leverages on a novel characterization of a minimum multicut as a family of Steiner trees in the universal cover of a surface on which G is embedded. The algorithm heavily relies on topological techniques, and in particular on the use of homotopical tools and computations in covering spaces, which we blend with classic ideas stemming from approximation schemes for planar graphs and low-dimensional geometric inputs.

  • The Bane of Low-Dimensionality Clustering

    Joint work with Arnaud de Mesmay, Eva Rotenberg, and Alan Roytman

    ArXiv - Proceedings of the Symposium on Discrete Algorithms (SODA) 2018.

    Click to show the abstract.

    In this paper, we give a conditional lower bound of n^Ω(k) for the classic k-median and k-means clustering objectives (where n is the size of the input), even in low-dimensional Euclidean space of dimension four, assuming the Exponential Time Hypothesis (ETH). We also consider k-median (and k-means) with penalties where each point need not be assigned to a center, in which case it must pay a penalty, and extend our lower bound to at least three-dimensional Euclidean space. This stands in stark contrast to many other geometric problems such as the traveling salesman problem, or computing an independent set of unit spheres. While these problems benefit from the so-called (limited) blessing of dimensionality, as they can be solved in time n^O(k^(1-1/d)) or 2^(n^(1−1/d)) in d dimensions, our work shows that widely-used clustering objectives have a lower bound of n^Ω(k), even in dimension four. We complete the picture by considering the two-dimensional case: we show that there is no algorithm that solves the penalized version in time less than n^o(√k), and provide a matching upper bound of n^O(√k). The main tool we use to establish these lower bounds is the placement of points on the moment curve, which takes its inspiration from constructions of point sets yielding Delaunay complexes of high complexity.

  • Hierarchical Clustering Beyond the Worst-Case

    Joint work with Varun Kanade and Frederik Mallmann-Trenn

    NIPS proceedings - Proceedings of the Conference on Neural Information Processing Systems (NIPS) 2017.

    Click to show the abstract.

    Hierarchical clustering, that is computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity is a fundamental problem in data analysis. Although hierarchical clustering has mostly been studied through procedures such as linkage algorithms, or top-down heuristics, rather than as optimization problems, recently Dasgupta proposed an objective function for hierarchical clustering and initiated a line of work developing algorithms that explicitly optimize an objective. In this paper, we consider a fairly general random graph model for hierarchical clustering, called the hierarchical stochastic block model (HSBM), and show that in certain regimes the SVD approach of McSherry combined with specific linkage methods results in a clustering that give an O(1) approximation to Dasgupta’s cost function. We also show that an approach based on SDP relaxations for balanced cuts based on the work of Makarychev et al., combined with the recursive sparsest cut algorithm of Dasgupta, yields an O(1) approximation in slightly larger regimes and also in the semi-random setting, where an adversary may remove edges from the random graph generated according to an HSBM. Finally, we report empirical evaluation on synthetic and real-world data showing that our proposed SVD-based method does indeed achieve a better cost than other widely-used heurstics and also results in a better classification accuracy when the underlying problem was that of multi-class classification.

  • On the Local Structure of Stable Clustering Instances

    Joint work with Chris Schwiegelshohn

    ArXiv - Proceedings of the Symposium on Foundations of Computer Science (FOCS) 2017.

    Click to show the abstract.

    We study the classic k-median and k-means clustering objectives in the beyond-worst-case scenario. We consider three well-studied notions of structured data that aim at characterizing real-world inputs: Distribution Stability (introduced by Awasthi, Blum, and Sheffet, FOCS 2010), Spectral Separability (introduced by Kumar and Kannan, FOCS 2010), Perturbation Resilience (introduced by Bilu and Linial, ICS 2010). We prove structural results showing that inputs satisfying at least one of the conditions are inherently "local". Namely, for any such input, any local optimum is close both in term of structure and in term of objective value to the global optima. As a corollary we obtain that the widely-used Local Search algorithm has strong performance guarantees for both the tasks of recovering the underlying optimal clustering and obtaining a clustering of small cost. This is a significant step toward understanding the success of local search heuristics in clustering applications.

  • Fast and Compact Exact Distance Oracle for Planar Graphs

    Joint work with Søren Dahlgaard, and Christian Wulff-Nilsen

    ArXiv - Proceedings of the Symposium on Foundations of Computer Science (FOCS) 2017.

    Click to show the abstract.

    For a given a graph, a distance oracle is a data structure that answers distance queries between pairs of vertices. We introduce an O(n^5/3)-space distance oracle which answers exact distance queries in O(log n) time for n-vertex planar edge-weighted digraphs. All previous distance oracles for planar graphs with truly subquadratic space i.e., space O(n^(2-epsilon)) for some constant epsilon>0) either required query time polynomial in n or could only answer approximate distance queries. Furthermore, we show how to trade-off time and space: for any S>=n^3/2, we show how to obtain an S-space distance oracle that answers queries in time O((n^(5/2)/S^(3/2))log n). This is a polynomial improvement over the previous planar distance oracles with o(n^(1/4)) query time.

  • Online Optimization of Smoothed Piecewise Constant Functions

    Joint work with Varun Kanade

    ArXiv - Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS) 2017.

    Click to show the abstract.

    We study online optimization of smoothed piecewise constant functions over the domain [0, 1). This is motivated by the problem of adaptively picking parameters of learning algorithms as in the recently introduced framework by Gupta and Roughgarden. Majority of the machine learning literature has focused on Lipschitz-continuous functions or functions with bounded gradients. This is with good reason -- any learning algorithm suffers linear regret even against piecewise constant functions that are chosen adversarially, arguably the simplest of non-Lipschitz continuous functions. The smoothed setting we consider is inspired by the seminal work of Spielman and Teng and the recent work of Gupta and Roughgarden in this setting, the sequence of functions may be chosen by an adversary, however, with some uncertainty in the location of discontinuities. We give algorithms that achieve sublinear regret in the full information and bandit settings.

  • Steinberg's Conjecture is false

    Joint work with Michael Hebdige, Daniel Kral, Zhentao Li, and Esteban Salgado

    ArXiv - Journal of Combinatorial Theory, Series B (JCTB), 2017.
    See also this article (in french) in the popular science magazine "Pour la Science".

    Click to show the abstract.

    Steinberg conjectured in 1976 that every planar graph with no cycles of length four or five is 3-colorable. We disprove this conjecture.

  • Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics

    Joint work with Philip N. Klein, and Claire Mathieu.

    ArXiv - Proceedings of the Symposium on Foundations of Computer Science (FOCS) 2016.

    Click to show the abstract.

    We give the first polynomial-time approximation schemes (PTASs) for the following problems: (1) uniform facility location in edge-weighted planar graphs; (2) k-median and k-means in edge-weighted planar graphs; (3) k-means in Euclidean space of bounded dimension. Our first and second results extend to minor-closed families of graphs. All our results extend to cost functions that are the p-th power of the shortest-path distance. The algorithm is local search where the local neighborhood of a solution S consists of all solutions obtained from S by removing and adding 1/epsilon^O(1) centers.

  • The Invisible Hand of Dynamic Market Pricing

    Joint work with Alon Eden, Michal Feldman, and Amos Fiat.

    ArXiv - Proceedings of the Conference on Economics and Computation (EC) 2016.

    Click to show the abstract.

    Walrasian prices, if they exist, have the property that one can assign every buyer some bundle in her demand set, such that the resulting assignment will maximize social welfare. Unfortu- nately, this assumes carefully breaking ties amongst different bundles in the buyer demand set. Presumably, the shopkeeper cleverly convinces the buyer to break ties in a manner consistent with maximizing social welfare. Lacking such a shopkeeper, if buyers arrive sequentially and simply choose some arbitrary bundle in their demand set, the social welfare may be arbitrar- ily bad. In the context of matching markets, we show how to compute dynamic prices, based upon the current inventory, that guarantee that social welfare is maximized. Such prices are set without knowing the identity of the next buyer to arrive. We also show that this is impossible in general (e.g., for coverage valuations), but consider other scenarios where this can be done.

  • Diameter and k-Center Clustering in Sliding Window

    Joint work with Chris Schwiegelshohn, and Christian Sohler.

    Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP) 2016.

    Click to show the abstract.

    In this paper, we investigate small space summaries for diameter and k-center clus- tering in sliding windows. In particular, we give an algorithm maintaining a 3 + epsilon approximate diameter, i.e. the maximum distance between two input points, in arbitrary metric spaces with nearly optimal space requirements. Furthermore, we investigate the related problem of k-center clustering, i.e. the covering of the input with k balls of minimum radius. For k = 2 we give a sliding window algorithm with optimal approximation ratio 4 + epsilon and for arbitrary values of k, algorithms with approximation ratio 6 + epsilon in general metric spaces and 4.48 in Euclidean spaces. In the metric distance model it is assumed that any algorithm with space x can store at most x points and that it cannot generate new points that are distinct from the stored points except by reading from the input stream. Under this assumptions, we obtain lower bounds that separate the space complexity for sliding window algorithm from insertion-only algorithms by an exponential factor. To our knowledge, no such separation was previously known.

  • Approximating Connectivity Domination in Weighted Bounded-Genus Graphs

    Joint work with Éric Colin de Verdière, Philip N. Klein, Claire Mathieu, and David Meierfrankenfeld.

    Proceedings of the Symposium on Theory of Computing (STOC) 2016.

    Click to show the abstract.

    We present a framework for addressing several problems on weighted planar graphs and graphs of bounded genus. With that framework, we derive polynomial-time approximation schemes for the following problems in planar graphs or graphs of bounded genus: edge-weighted tree cover and tour cover; vertex-weighted connected dominating set, maximum-weight-leaf spanning tree, and connected vertex cover. In addition, we obtain a polynomial-time approximation scheme for feedback vertex set in planar graphs. These are the first polynomial-time approximation schemes for all those problems in weighted embedded graphs. (For unweighted versions of some of these problems, poynomial-time approximation schemes were previously given using bidimensionality.) Additionally, we design a quasi- polynomial-time approximation scheme for weighted connected face cover in planar graphs, which implies a quasi-polynomial-time approximation scheme for the minimum corridor problem, a problem in the plane.

  • Algorithmic Aspects of Switch Cographs

    Joint work with Michel Habib and Fabien de Montgolfier.

    ArXiv - Discrete Applied Mathematics, vol. 200. 2016.

    Click to show the abstract.

    We introduced the notion of involution module, the first generalization of modular decomposition which has a unique linear-sized decomposition tree in the general context of 2-structures. We derived O(n²) decomposition algorithm and we take advantage of the involution modular decomposition tree to state several algorithmic results. We introduced the class of switch cographs, the class of graphs that are totally decomposable w.r.t involution modular decomposition. This class generalizes the class of cographs and is exactly the class of (Bull, Gem, Co-Gem, C5)-free graphs. We used our new decomposition tool to design three practical algorithms for the maximum cut and the vertex separator problems. The complexity of these problems was still unknown for this class of graphs. We also improved the complexity of the maximum clique, the maximum independant set, the chromatic number and the maximum clique cover problems by giving efficient algorithms, thanks to the decomposition tree. Eventually, we showed that this class of graphs has Clique-Width at most 4 and that a clique-width expression can be computed in linear time.

  • A Fixed Parameter Tractable Approximation Scheme for the Optimal Cut Graph of a Surface

    Joint work with Arnaud de Mesmay.

    ArXiv - Proceedings of the European Symposium on Algorithms (ESA) 2015.

    Click to show the abstract.

    Given a graph G cellularly embedded on a surface S of genus g, a cut graph is a subgraph of G such that cutting S along G yields a topological disk. We provide a fixed parameter tractable approximation scheme for the problem of computing the shortest cut graph, that is, for any epsilon > 0, we show how to compute a (1 + epsilon) approximation of the shortest cut graph in time f (epsilon, g)n^3 . Our techniques first rely on the computation of a spanner for the problem using the technique of brick decompositions, to reduce the problem to the case of bounded tree-width. Then, to solve the bounded tree-width case, we introduce a variant of the surface-cut decomposition of Rué, Sau and Thilikos, which may be of independent interest

  • Effectiveness of Local Search for Geometric Optimization.

    Joint work with Claire Mathieu.

    ArXiv - Proceedings of the Symposium on Computational Geometry (SoCG) 2015.

    Click to show the abstract.

    We prove that local search with local neighborhoods of magnitude of 1/epsilon^C is an approximation scheme for the following problems in the Euclidian plane: TSP with random inputs, Steiner tree with random inputs, facility location (with worst case inputs), and bicriteria k-median (also with worst case inputs). We show that the randomness assumption is necessary for TSP

  • Energy-Efficient Algorithms for Non-Preemptive Speed-Scaling

    Joint work with Zhentao Li, Claire Mathieu, and Ioannis Milis.

    ArXiv - Proceedings of the Workshop on Approximation and Online Algorithms (WAOA) - 2014.

    Click to show the abstract.

    We improve complexity bounds for energy-efficient speed scheduling problems for both the single processor and multi-processor cases. Energy conservation has become a major concern, so revisiting traditional scheduling problems to take into account the energy consumption has been part of the agenda of the scheduling community for the past few years. We consider the energy minimizing speed scaling problem introduced by Yao et al. where we wish to schedule a set of jobs, each with a release date, deadline and work volume, on a set of identical processors. The processors may change speed as a function of time and the energy they consume is the αth power of its speed. The objective is then to find a feasible schedule which minimizes the total energy used. We show that in the setting with an arbitrary number of processors where all work volumes are equal, there is a 2(1+\varepsilon)(5(1+\varepsilon))\alpha−1B_\alpha=O_\alpha(1) approximation algorithm, where B_\alpha is the generalized Bell number. This is the first constant factor algorithm for this problem. This algorithm extends to general unequal processor-dependent work volumes, up to losing a factor of (((1+r)r)^\alpha)/2 in the approximation, where r is the maximum ratio between two work volumes. We then show this latter problem is APX-hard, even in the special case when all release dates and deadlines are equal and r is 4. In the single processor case, we introduce a new linear programming formulation of speed scaling and prove that its integrality gap is at most 12^(\alpha−1). As a corollary, we obtain a (12(1+\varepsilon))^(\alpha−1) approximation algorithm where there is a single processor, improving on the previous best bound of 2^{\alpha−1)(1+\varepsilon)^\alpha B_\alpha when \alpha > 24.