This is a first-semester course in the IASD master's program at Université PSL.

Reinforcement Learning (RL) refers to scenarios where the learning algorithm operates in closed loop, simultaneously using past data to adjust its decisions and taking actions that will influence future observations. Algorithms based on RL concepts are now commonly used in programmatic marketing on the web, robotics, and computer game playing. All RL models share a common concern: to attain long-term optimality, one must strike a proper balance between exploration (discovery of behaviors whose outcomes are still uncertain) and exploitation (focusing on the actions that have produced the best results so far).

The methods used in RL draw ideas from control, statistics and machine learning. This introductory course will provide the main methodological building blocks of RL, focussing on probabilistic methods in the case where both the set of possible actions and the state space of the system are finite. Some basic notions in probability theory are required to follow the course.

  • Models: Markov decision processes (MDP), multiarmed bandits and other models
  • Planning: finite and infinite horizon problems, the value function, Bellman equations, dynamic programming, value and policy iteration
  • Basic learning tools: Monte Carlo methods, importance sampling, stochastic approximation, temporal-difference learning, policy gradient
  • Probabilistic and statistical tools for RL: Bayesian approach, relative entropy and hypothesis testing, concentration inequalities
  • Optimal exploration in multiarmed bandits: the explore vs exploit tradeoff, lower bounds, the UCB algorithm, Thompson sampling
  • Extensions: Contextual bandits, optimal exploration for MDP
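As an illustration of the explore vs exploit tradeoff listed above, here is a minimal sketch of the UCB1 index policy on a Bernoulli multiarmed bandit. It is not taken from the course material: the function name `ucb1`, the arm means, and the exploration bonus sqrt(2 log t / n) are assumptions chosen for the example, and other variants of UCB use different bonus terms.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit; return the number of pulls per arm.

    `means` (hypothetical arm success probabilities) is unknown to the
    learner and is only used here to simulate rewards.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of times each arm was played
    sums = [0.0] * k    # cumulated reward of each arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: play every arm once.
            arm = t - 1
        else:
            # Index = empirical mean (exploitation) + bonus (exploration).
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.5, 0.8], horizon=5000))
```

The bonus shrinks as an arm is played more often, so rarely tried arms keep being explored while most pulls concentrate on the arm with the best empirical mean.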

This is largely a blackboard course with a homework assignment (in Python) and a final written exam.

Recommended readings

Last edited 20 nov. 2023

This is an optional second-semester course in the IASD master's program at Université PSL, taught with Muni Sreenivas Pydi (LAMSADE, Univ. Paris Dauphine-PSL).

This course covers the basics of Differential Privacy (DP), a framework that has become, in the last ten years, a de facto standard for enforcing user privacy in data processing pipelines. DP methods seek a proper trade-off between protecting the characteristics of individuals and guaranteeing that the outcomes of the data analysis remain meaningful.

The first part of the course is devoted to the basic notion of epsilon-DP and to understanding the trade-off between privacy and accuracy, from both the empirical and statistical points of view. The second half of the course will cover more advanced aspects, including the different variants of DP and their use for privacy-preserving training of large and/or distributed machine learning models.

  • Motivations, traditional approaches, randomized response
  • Definition and properties of differential privacy
  • Mechanisms for discrete/categorical data
  • Mechanisms for continuous data
  • Alternative notions of differential privacy
  • Differential privacy for statistical learning
  • Attacks and connections with robustness
  • Local differential privacy and federated learning
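As an illustration of the privacy vs accuracy trade-off, here is a minimal sketch of binary randomized response, the first mechanism in the list above. It is not taken from the course material: the function names and the simulated population are assumptions made for the example. Each individual reports their true bit with probability q = e^epsilon / (1 + e^epsilon) and the flipped bit otherwise, which satisfies epsilon-DP for each individual report; the analyst then debiases the average of the noisy reports.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under epsilon-DP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng):
    """Return `bit` with probability q, and the flipped bit otherwise."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def estimate_proportion(reports, epsilon):
    """Unbiased estimate of the true proportion of ones from noisy reports.

    Since P(report = 1) = (1 - q) + p * (2q - 1), solving for p gives
    the estimator below.
    """
    q = keep_probability(epsilon)
    r = sum(reports) / len(reports)
    return (r - (1.0 - q)) / (2.0 * q - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population: 30% of individuals hold the sensitive bit 1.
    truth = [1 if i < 6000 else 0 for i in range(20000)]
    for eps in (0.1, 1.0, 5.0):
        reports = [randomized_response(b, eps, rng) for b in truth]
        print(eps, estimate_proportion(reports, eps))
```

Smaller values of epsilon give stronger privacy but divide by a smaller factor 2q - 1, so the variance of the estimate grows.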

This course has no prerequisites other than basic knowledge of probability, statistics and Python programming.

Validation is through an individual homework assignment (using Python) and the defense of a group project based on a research paper.

Recommended Readings

Last edited 20 nov. 2023

Older Courses

Teaching-related information prior to 2017 may be found in the French section of this site.