Umut Şimşekli, PhD

Research Grants

DYNASTY: ERC Starting Grant (2022-2027)


I am the PI of the ERC Starting Grant "DYNASTY: Dynamics-Aware Theory of Deep Learning" — 1.5M Euros, 2022-2027. The objective of the project is as follows:
The recent advances in deep learning (DL) have transformed many scientific domains and have had major impacts on industry and society. Despite their success, DL methods do not obey much of the conventional wisdom of statistical learning theory, and the vast majority of current DL techniques remain poorly understood black-box algorithms.
Even though DL theory has been a very active research field in the past few years, there is a significant gap between the current theory and practice: (i) the current theory often becomes vacuous for models with a large number of parameters (which is typical in DL), and (ii) it cannot capture the interaction between the data, the architecture, the training algorithm, and its hyper-parameters, which can have drastic effects on the overall performance. Due to this lack of theoretical understanding, the design of new DL systems has been dominated by ad-hoc, 'trial-and-error' approaches.
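
Purely as an illustration of this vacuousness (a generic textbook-style bound, not a result from the proposal): many classical, parameter-counting bounds control the gap between the test risk and the training risk of a learned model by a term that grows with the number of parameters d, roughly of the form

\[
\mathcal{R}(\hat{w}) \;-\; \widehat{\mathcal{R}}_n(\hat{w}) \;\lesssim\; \sqrt{\frac{d \log n}{n}},
\]

which exceeds one as soon as d is much larger than the sample size n, the typical regime for modern deep networks, and therefore says nothing about their actual test performance.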
The main objective of this proposal is to develop a mathematically sound and practically relevant theory for DL, which will ultimately serve as the basis of a software library that provides practical tools for DL practitioners. In particular, (i) we will develop error bounds that closely reflect the true empirical performance by explicitly incorporating the dynamics of training, and (ii) we will develop new model selection, training, and compression algorithms with reduced time/memory/storage complexity by exploiting the developed theory.
To achieve the expected breakthroughs, we will develop a novel theoretical framework that enables a tight analysis of learning algorithms through the lens of dynamical systems theory. The outcomes will help relieve DL of its black-box nature and move beyond the current heuristic design process. We will produce comprehensive open-source software tools adapted to all popular DL libraries, and test the developed algorithms on a wide range of real applications arising in computer vision and audio, music, and natural language processing.
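
To give a concrete, purely hypothetical flavour of what "incorporating the dynamics of training" can mean in practice (this is an illustration, not the project's methodology), the sketch below runs plain SGD on a toy least-squares problem and records the entire sequence of iterates, so that trajectory-level quantities such as the step-to-step increments can be inspected. The toy model, hyper-parameters, and function names are all assumptions made for illustration.

```python
# Hypothetical sketch: run SGD on a toy least-squares problem and record the
# full parameter trajectory, so that statistics of the training dynamics
# (e.g. the step-to-step increments) can be examined afterwards.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X w_true + noise
n, d = 1000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def sgd_trajectory(X, y, lr=0.01, batch_size=32, n_steps=2000):
    """Run plain SGD and return the sequence of iterates, shape (n_steps+1, d)."""
    n, d = X.shape
    w = np.zeros(d)
    trajectory = [w.copy()]
    for _ in range(n_steps):
        idx = rng.integers(0, n, size=batch_size)            # sample a mini-batch
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size  # mini-batch gradient
        w -= lr * grad                                        # SGD update
        trajectory.append(w.copy())
    return np.array(trajectory)

traj = sgd_trajectory(X, y)
increments = np.diff(traj, axis=0)                            # w_{k+1} - w_k
step_norms = np.linalg.norm(increments, axis=1)
print("final training error:", np.mean((X @ traj[-1] - y) ** 2))
print("increment norms: mean=%.4f, max=%.4f" % (step_norms.mean(), step_norms.max()))
```

With the trajectory in hand, one can study, for instance, how the distribution of the increments changes with the learning rate or the batch size.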


FBIMATRIX: ANR / TUBITAK International Grant (2016-2021)


I was the co-PI of the ANR/TUBITAK International Grant "FBIMATRIX: Parallel and Distributed Markov Chain Monte Carlo for Bayesian Inference in Matrix and Tensor Factorization Models" — 2016-2021. The objective of the project was as follows:

Matrix and tensor factorization methods provide a unifying view of a broad spectrum of techniques in machine learning and signal processing, offering both sensible statistical models for datasets and efficient computational procedures framed as decomposition algorithms. So far, algebraic or optimization-based approaches have prevailed for computing such factorizations. In contrast, the FBIMATRIX project aims to develop state-of-the-art Markov Chain Monte Carlo (MCMC) methods for Full Bayesian Inference in MATRIX and tensor factorization models. The randomization inherent in Monte Carlo is useful in both Bayesian and non-Bayesian analyses, such as model selection, model averaging, privacy preservation, or simply higher accuracy when computing approximate solutions.

MCMC methods are generally perceived as computationally demanding and impractical; by exploiting parallel and distributed computation, we wish to push the state of the art in terms of scalability, statistical efficiency, and computational and communication complexity. In fact, we view MCMC as a natural general-purpose computational tool of the future for inference and model selection on distributed data, eventually complementing optimization for certain big-data problems thanks to its inherently randomized nature.

The project will address Bayesian model selection and model averaging for factorization models, using parallel and distributed computation and recent advances in Hybrid Monte Carlo methods that simulate an augmented stochastic dynamics. In doing so, we aim to develop faster algorithms for hard computational problems such as marginal likelihood estimation, and to improve convergence rates. We will illustrate the practical utility of the developed parallel and distributed MCMC methods on two challenging applications from two domains: audio source separation and missing link prediction.
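
As a minimal, hypothetical sketch of Bayesian inference in a factorization model via MCMC (not the FBIMATRIX algorithms themselves), the code below implements a standard Gibbs sampler for a small Gaussian matrix factorization Y ≈ W H with Gaussian priors. It illustrates how MCMC yields a posterior distribution over the factors rather than a single point estimate; all dimensions and hyper-parameters are made up for illustration.

```python
# Hypothetical sketch: Gibbs sampling for a Gaussian matrix factorization model
#   Y ≈ W H,  W: n×k, H: k×m,  Gaussian noise and Gaussian priors on W and H.
import numpy as np

rng = np.random.default_rng(0)

n, m, k = 50, 40, 3
sigma2, tau2 = 0.1, 1.0                        # noise and prior variances (assumed)

# Synthetic data generated from the model
W_true = rng.standard_normal((n, k))
H_true = rng.standard_normal((k, m))
Y = W_true @ H_true + np.sqrt(sigma2) * rng.standard_normal((n, m))

def sample_rows(Y_rows, F, sigma2, tau2):
    """Sample each row w_i of a factor from its Gaussian conditional,
    under the model y_i ≈ w_i F with factor F of shape (k, cols)."""
    k = F.shape[0]
    precision = F @ F.T / sigma2 + np.eye(k) / tau2   # posterior precision (shared)
    cov = np.linalg.inv(precision)                    # posterior covariance
    means = Y_rows @ F.T / sigma2 @ cov               # posterior means, one per row
    chol = np.linalg.cholesky(cov)
    noise = rng.standard_normal((Y_rows.shape[0], k)) @ chol.T
    return means + noise

W = rng.standard_normal((n, k))
H = rng.standard_normal((k, m))
samples = []
for it in range(500):
    W = sample_rows(Y, H, sigma2, tau2)               # update W given H
    H = sample_rows(Y.T, W.T, sigma2, tau2).T         # update H given W (by symmetry)
    if it >= 100:                                     # discard burn-in
        samples.append(W @ H)

posterior_mean = np.mean(samples, axis=0)
rmse = np.sqrt(np.mean((posterior_mean - W_true @ H_true) ** 2))
print("RMSE of posterior-mean reconstruction vs. noiseless W_true @ H_true:", rmse)
```

The two conditional updates are symmetric: the columns of H given W are sampled by applying the same routine to the transposed data with the factor W transposed, which keeps the sketch short and makes the role of each conditional explicit.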