Context-aware CNNs for person head detection


Person detection is a key problem for many computer vision tasks. While face detection has reached maturity, detecting people under a full variation of camera view-points, human poses, lighting conditions and occlusions is still a difficult challenge. In this work we focus on detecting human heads in natural scenes. Starting from the recent local R-CNN object detector, we extend it with two types of contextual cues. First, we leverage person-scene relations and propose a Global CNN model trained to predict positions and scales of heads directly from the full image. Second, we explicitly model pairwise relations among objects and train a Pairwise CNN model using a structured-output surrogate loss. The Local, Global and Pairwise models are combined into a joint CNN framework. To train and test our full model, we introduce a large dataset composed of 369,846 human heads annotated in 224,740 movie frames. We evaluate our method and demonstrate improvements in person head detection over several recent baselines on three datasets. We also show that our model improves detection speed.
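As a rough illustration of how local, global and pairwise cues might be combined, the sketch below scores a joint labeling of candidate boxes as a sum of per-box terms and pairwise terms, then picks the best labeling by exhaustive search. The function names, the additive form of the score and the toy numbers are assumptions made for illustration only; they are not taken from the paper or from the released MATLAB code.

```python
from itertools import product


def joint_score(labels, local, global_, pairwise):
    """Score a binary labeling of candidate boxes (hypothetical additive form).

    labels   -- tuple of 0/1, one entry per candidate box
    local    -- per-box scores from a local (appearance-based) model
    global_  -- per-box scores from a global (scene-level) model
    pairwise -- dict mapping (i, j) to a compatibility score that is
                added only when both boxes i and j are selected
    """
    score = 0.0
    for i, selected in enumerate(labels):
        if selected:
            score += local[i] + global_[i]
    for (i, j), weight in pairwise.items():
        if labels[i] and labels[j]:
            score += weight
    return score


def best_labeling(local, global_, pairwise):
    """Exhaustive search over all 2^n labelings; only viable for tiny n."""
    n = len(local)
    return max(
        product((0, 1), repeat=n),
        key=lambda labels: joint_score(labels, local, global_, pairwise),
    )


# Toy example: three candidate boxes with made-up scores.
local = [1.2, -0.3, -0.4]
global_ = [0.3, 0.1, -0.2]
pairwise = {(0, 1): 0.5, (0, 2): -1.0, (1, 2): -0.5}

labels = best_labeling(local, global_, pairwise)
```

In this toy setup, box 1 has a negative score on its own but is still selected, because its pairwise term with the confident box 0 outweighs the deficit. Exhaustive search is used here only to keep the sketch short; it does not scale beyond a handful of candidates.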


	author = {Vu, Tuan{-}Hung and Osokin, Anton and Laptev, Ivan},
	title = {Context-aware {CNNs} for person head detection},
	booktitle =  {International Conference on Computer Vision (ICCV)},
	year = {2015}}

[5.0 Mb] Paper

[1.8 Mb] Poster



Left video: detection results of the local model. Right video: detection results of the pairwise model.


[5.4 Gb] HollywoodHeads dataset


[0.2 Mb] MATLAB Code



[1.3 Gb] Trained models

[69 Mb] Detection results

[388 Mb] Initialization network Oquab et al., CVPR 2014

[1.7 Gb] For training Pairwise model: Local scores on train and validation sets

[2.2 Gb] Selective search region proposals

Casablanca dataset

[105 Mb] Images, annotations and splits

[27 Mb] Selective search region proposals

[94 Mb] Detection results
