We propose an approach for analyzing unpaired visual data annotated with timestamps by generating how images would have looked had they been taken at different times. To isolate and transfer time-dependent appearance variations, we introduce a new trainable bilinear factor separation module, and we analyze its relation to classical factored representations and to concatenation-based auto-encoders. We demonstrate that this new module has clear advantages over standard concatenation when trained in a bottleneck encoder-decoder convolutional neural network architecture. We also show that it can be inserted into a recent adversarial image translation architecture, enabling translation to multiple different target time periods with a single network. We apply our model to a challenging collection of more than 13,000 cars manufactured between 1920 and 2000 and to a dataset of high school yearbook portraits from 1930 to 2009. For a given new input image, this allows us to generate a "history-lapse video" revealing changes over time simply by varying the latent variable corresponding to time. We show that, by analyzing the generated history-lapse videos, we can identify object deformations across time and extract interesting changes in visual style over the decades.
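To make the idea concrete, the following is a minimal sketch, in PyTorch, of what a bilinear factor separation bottleneck of this kind could look like. This is our own illustration rather than the authors' released code: the module name, the dimensions, the einsum-based implementation, and the history-lapse loop are all assumptions for exposition.

import torch
import torch.nn as nn

class BilinearFactorModule(nn.Module):
    # Hypothetical bottleneck that combines a content code h with a time
    # code t through a trainable 3-way tensor W, so the decoder input z is
    # bilinear in (h, t): z[o] = sum_{k,i} t[k] * W[k, i, o] * h[i].
    # Unlike simple concatenation, every output unit depends
    # multiplicatively on the time code.
    def __init__(self, content_dim, num_periods, out_dim):
        super().__init__()
        # One linear map per time period, stacked into a single tensor.
        self.W = nn.Parameter(0.01 * torch.randn(num_periods, content_dim, out_dim))

    def forward(self, h, t):
        # h: (batch, content_dim) content code from the encoder
        # t: (batch, num_periods) one-hot (or soft) time code
        W_t = torch.einsum('bk,kio->bio', t, self.W)  # time-mixed map per sample
        return torch.einsum('bi,bio->bo', h, W_t)     # apply it to the content code

# Example traversal: encode once, then sweep the time code.
module = BilinearFactorModule(content_dim=256, num_periods=8, out_dim=256)
h = torch.randn(1, 256)               # stand-in for an encoder output
for k in range(8):
    t = torch.eye(8)[k].unsqueeze(0)  # one-hot code for period k
    z = module(h, t)                  # decoder input for that period

Keeping h fixed while sweeping t, as in the loop above, is exactly the kind of latent traversal that would produce the history-lapse video described in the abstract.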
Acknowledgements. This work has been supported by the European Research Council (ERC grant LEAP no. 336845), Agence Nationale de la Recherche (Semapolis project, ANR-13-CORD-0003; EnHerit project, ANR-17-CE23-0008), the CIFAR Learning in Machines & Brains program, and the European Regional Development Fund under the project IMPACT (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000468).