Space-Time Interest Points (STIP)

Computes locations and descriptors for space-time interest points in video. The detector is based on the extension of the Harris operator to space-time, as described in "On Space-Time Interest Points", I. Laptev, IJCV 2005. The code does not implement scale selection; instead, interest points are detected at multiple spatial and temporal scales. The implemented descriptors, HOG (Histograms of Oriented Gradients) and HOF (Histograms of Optical Flow), are computed for 3D video patches in the neighbourhood of detected STIPs. The detector and descriptors have been successfully used for action recognition in the paper "Learning Realistic Human Actions from Movies", Ivan Laptev, Marcin Marszalek, Cordelia Schmid and Benjamin Rozenfeld; In Proc. CVPR'08.
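To make the idea of a space-time Harris operator concrete, here is a minimal sketch in plain Python. It is not the distributed STIP code. It assumes a grayscale video stored as nested lists indexed video[t][y][x], uses central differences and a box window in place of Gaussian smoothing, works at a single scale, and takes k = 0.005 as an assumed constant. All function names are hypothetical.

```python
def gradients(video, t, y, x):
    """Central-difference gradients (Lx, Ly, Lt) at an interior voxel."""
    lx = (video[t][y][x + 1] - video[t][y][x - 1]) / 2.0
    ly = (video[t][y + 1][x] - video[t][y - 1][x]) / 2.0
    lt = (video[t + 1][y][x] - video[t - 1][y][x]) / 2.0
    return lx, ly, lt


def second_moment(video, t, y, x, radius=1):
    """3x3 second-moment matrix averaged over a cubic window.

    Assumption: a box window replaces Gaussian weighting for brevity.
    """
    mu = [[0.0] * 3 for _ in range(3)]
    count = 0
    for dt in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                g = gradients(video, t + dt, y + dy, x + dx)
                for i in range(3):
                    for j in range(3):
                        mu[i][j] += g[i] * g[j]
                count += 1
    return [[v / count for v in row] for row in mu]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def harris3d(video, t, y, x, k=0.005, radius=1):
    """Space-time corner response det(mu) - k * trace(mu)^3 (k is assumed)."""
    mu = second_moment(video, t, y, x, radius)
    trace = mu[0][0] + mu[1][1] + mu[2][2]
    return det3(mu) - k * trace ** 3


# Two 5x5x5 toy volumes: a constant video and a static vertical edge.
flat = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]
edge = [[[1.0 if x >= 3 else 0.0 for x in range(5)]
         for _ in range(5)] for _ in range(5)]
flat_response = harris3d(flat, 2, 2, 2)
edge_response = harris3d(edge, 2, 2, 2)
```

Neither toy volume yields a positive response: the constant video has no gradients, and the static edge varies along x only, so its second-moment matrix has rank one. A positive response requires significant variation along x, y and t at once, which is what distinguishes space-time interest points from static edges and corners.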

Sparse Modeling Software (SPAMS)

SPAMS (SPArse Modeling Software) is an optimization toolbox composed of a set of binaries implementing algorithms to address various machine learning and signal processing problems involving a large number of small/medium size sparse decompositions.
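As an illustration of what a single sparse decomposition looks like, the sketch below solves a small L1-regularized least-squares problem with iterative soft-thresholding (ISTA) in plain Python. It does not use the SPAMS interface and is not meant to represent the algorithms the toolbox implements. The function names, the dictionary D and the parameters lam and step are hypothetical.

```python
def soft_threshold(v, thresh):
    """Shrink v toward zero by thresh (proximal operator of the L1 norm)."""
    if v > thresh:
        return v - thresh
    if v < -thresh:
        return v + thresh
    return 0.0


def ista(D, x, lam, step, n_iter=100):
    """Approximately minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over a.

    D is a list of rows (m x p) and x has length m. The step size should
    not exceed 1 / L, where L is the largest eigenvalue of D^T D.
    """
    m, p = len(D), len(D[0])
    a = [0.0] * p
    for _ in range(n_iter):
        # Residual D a - x, then gradient D^T (D a - x) of the quadratic term.
        residual = [sum(D[i][j] * a[j] for j in range(p)) - x[i]
                    for i in range(m)]
        grad = [sum(D[i][j] * residual[i] for i in range(m))
                for j in range(p)]
        # Gradient step followed by soft-thresholding.
        a = [soft_threshold(a[j] - step * grad[j], step * lam)
             for j in range(p)]
    return a


# With an identity dictionary the solution is the soft-thresholded signal.
D = [[1.0, 0.0], [0.0, 1.0]]
code = ista(D, [3.0, 0.5], lam=1.0, step=1.0)
```

In this toy case the large coefficient is shrunk by lam and the small one is set exactly to zero, which is the sparsity-inducing behaviour that such decompositions rely on.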


Our multi-view stereopsis software, PMVS, was developed in collaboration with Y. Furukawa at the University of Illinois at Urbana-Champaign (Furukawa and Ponce, 2007) and is publicly available for academic use. Licensing negotiations with several companies are under way.

Resampling Penalization for histogram selection in regression software

Resampling Penalization is a family of model selection procedures by penalization that can use any exchangeable weighted bootstrap resampling scheme to compute a penalty. It is properly defined in a general framework and extensively studied for histogram selection in regression (see journal paper). This software is a Matlab package for performing Resampling Penalization with several examples of weights in the histogram selection case. The Resampling Penalization package is provided free for non-commercial use under the terms of the GNU General Public License.

Non-uniform deblurring for shaken images

A Matlab package to remove non-uniform blur due to camera shake from a single image, as described in (Whyte et al., 2010).

Automatic Alignment of Paintings to a 3D Model

This code aligns historical paintings of Pompeii to a 3D model constructed from photographs, as described in: B. C. Russell, J. Sivic, J. Ponce, and H. Dessales. Automatic Alignment of Paintings and Photographs Depicting a 3D Scene. 3rd International IEEE Workshop on 3D Representation for Recognition (3dRR-11), associated with ICCV 2011.


Time-lapse videos for long-term observation of people

These time-lapse videos from YouTube provide a rich source of common human-object interactions. The dataset includes more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes. This dataset was used in the paper "Scene semantics from long-term observation of people", Vincent Delaitre, David F. Fouhey, Ivan Laptev, Josef Sivic, Abhinav Gupta, Alexei Efros; In Proc. ECCV 2012.

Time-lapse sequences of indoor scenes

Includes a collection of monocular time-lapse sequences collected from YouTube and a dataset of still images of indoor scenes. This dataset was used for single-view 3D scene understanding, as described in the paper "People Watching: Human Actions as a Cue for Single-View Geometry", David F. Fouhey, Vincent Delaitre, Abhinav Gupta, Alexei Efros, Ivan Laptev, Josef Sivic; In Proc. ECCV 2012.

Annotated video clips for spatio-temporal video segmentation

This dataset was used in the paper "Track to the Future: Spatio-temporal Video Segmentation with Long-range Motion Cues", Jose Lezama, Karteek Alahari, Josef Sivic, Ivan Laptev; In Proc. CVPR 2011.

Annotated movie data of face tracks

This annotated dataset contains ground-truth labels for face tracks from six different movies. The tracks are labeled with gender (female/male) and age (youth/not youth). This dataset was used in the paper "Semi-supervised learning of facial attributes in video", Neva Cherniavsky, Ivan Laptev, Josef Sivic and Andrew Zisserman; In Parts and Attributes Workshop, ECCV 2010.

Willow Actions

A dataset for human action classification in still images. Action classes are: Interacting with computer, Photographing, Playing Instrument, Riding Bike, Riding Horse, Running, and Walking. This dataset was used in the paper "Recognizing human actions in still images: a study of bag-of-features and part-based representations", V. Delaitre, I. Laptev and J. Sivic; In Proc. BMVC 2010.

Other datasets for Computer Vision

Includes 15 scene categories, a 3D object recognition stereo dataset, a 3D photography dataset, visual hull datasets, birds, butterflies, an object recognition dataset, a texture dataset, and video sequences.