User profiling from unstructured data (free-form reviews)

Technicolor Paris (Christophe Diot)

People who write reviews on sites such as TripAdvisor and Yelp reveal a lot of personal information about their tastes, mood, and personality. Similarly, we believe that it should be possible to characterize a restaurant or hotel from the reviews entered by its customers. However, the problem is complex, as these free-form reviews often contain typos and slang, and are entered at different times and in different contexts.

The objective of this internship is to apply a range of techniques (statistical and learning techniques in particular) to a large data set of reviews collected on Yelp, in order to analyze how much information these reviews reveal and whether it is possible to characterize a person or a service from them. One of the difficult tasks will be to eliminate outliers, i.e. reviews entered in conditions that do not reflect the normal state of the system.

The work will consist of (1) putting together a methodology to analyze the data set and fine-tuning its parameters in order to obtain relevant profiles of users and services (mostly restaurants in this case); and (2) analyzing how outliers and time impact these profiles. If time permits, two follow-up topics could form the basis of a PhD program: (1) the analysis of the impact of time and outliers on the characterization, and (2) how to preserve the privacy of reviewers while they enter reviews.

The expected outcomes of this internship are a new methodology, a prototype system, and at least one publication in a top-tier international conference.

The internship will mostly take place at Technicolor in Paris (Porte de Versailles).