Research
vidchapters

VidChapters-7M: Video Chapters at Scale

NeurIPS 2023 D&B

icra23elliot

Learning Video-Conditioned Policies for Unseen Manipulation Tasks

ICRA 2023

vid2seq

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

CVPR 2023

collision

Assembly Planning from Observations under Physical Constraints

IROS 2022

frozenbilm

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

NeurIPS 2022

collision

Collision Detection Accelerated: An Optimization Perspective

RSS 2022

tubedetr

TubeDETR: Spatio-Temporal Video Grounding with Transformers

CVPR 2022

vln_duet

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation

CVPR 2022

lookforthechange

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos

CVPR 2022

lod

Large-Scale Unsupervised Object Discovery

NeurIPS 2021

vln_hamt

History Aware Multimodal Transformer for Vision-and-Language Navigation

NeurIPS 2021

ccvs

CCVS: Context-aware Controllable Video Synthesis

NeurIPS 2021

justask

Just Ask: Learning to Answer Questions from Millions of Narrated Videos

ICCV 2021

ris

Goal-Conditioned Reinforcement Learning with Imagined Subgoals

ICML 2021

rlbc

Learning to combine primitive skills: A step towards versatile robotic manipulation

ICRA 2020

cosypose

CosyPose: Consistent multi-view multi-object 6D pose estimation

ECCV 2020

crosstask

Learning Actionness via Long-range Temporal Order Verification

ECCV 2020

learntoaugment

Learning to Augment Synthetic Images for Sim2Real Policy Transfer

IROS 2019

motionforces

Estimating 3D Motion and Forces of Person-Object Interactions from Monocular Video

CVPR 2019

crosstask

Cross-task weakly supervised learning from instructional videos

CVPR 2019

d2net

D2-Net: A Trainable CNN for Joint Detection and Description of Local Features

CVPR 2019

sfnet

SFNet: Learning Object-aware Semantic Flow

CVPR 2019

obman

Learning joint reconstruction of hands and manipulated objects

CVPR 2019

ncnet

Neighbourhood Consensus Networks

NeurIPS 2018

weakactionloc

A flexible model for training action localization with varying levels of supervision

NeurIPS 2018

bodynet

BodyNet: Volumetric Inference of 3D Human Body Shapes

ECCV 2018

weakalign

End-to-end weakly-supervised semantic alignment

CVPR 2018

LTC

Long-term Temporal Convolutions for Action Recognition

PAMI 2018

objectstates

Joint Discovery of Object States and Manipulation Actions

ICCV 2017

biogans

GANs for Biological Image Synthesis

ICCV 2017

Learningvideotext

Learning from Video and Text via Large-Scale Discriminative Clustering

ICCV 2017

UNREL

Weakly-supervised learning of visual relations

ICCV 2017

SCNet

SCNet: Learning Semantic Correspondence

ICCV 2017

CNN Geometric

Convolutional Neural Network Architecture for Geometric Matching

CVPR 2017

SURREAL

Learning from Synthetic Humans

CVPR 2017

ContextLocNet

ContextLocNet: Context-aware Deep Network Models for Weakly Supervised Localization

ECCV 2016

Charades

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

ECCV 2016

Thin-slice

Thin-Slicing for Pose: Learning to Understand Pose without Explicit Pose Estimation

CVPR 2016

NetVLAD

NetVLAD: CNN Architecture for Weakly Supervised Place Recognition

CVPR 2016

Instance-level video segmentation

Instance-level Video Segmentation from Object Tracks

CVPR 2016

Instruction Videos

Unsupervised Learning from Narrated Instruction Videos

CVPR 2016

ProposalFlow

Proposal Flow

CVPR 2016

Discriminative Part Detector

Learning Dictionary of Discriminative Part Detectors for Image Categorization and Cosegmentation

IJCV 2016

Unsupervised Object Discovery

Unsupervised Object Discovery and Tracking in Video Collections

ICCV 2015

Head Detection

Context-aware CNNs for Person Head Detection

ICCV 2015

P-CNN

P-CNN: Pose-based CNN Features for Action Recognition

ICCV 2015

Is object localization for free?

Is object localization for free? Weakly Supervised Object Recognition with Convolutional Neural Networks

CVPR 2015

Unsupervised Object Discovery

Unsupervised Object Discovery and Localization in the Wild

CVPR 2015

24/7 place recognition

24/7 Place Recognition by View Synthesis

CVPR 2015

SD Filter

The SD Filter: Robust Image Filtering Using Joint Static and Dynamic Guidance

PAMI 2017, CVPR 2015

Painting-to-3D

Painting-to-3D Model Alignment Via Discriminative Visual Elements

TOG 2014

CNN

Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks

CVPR 2014

Fast Video Feature

Efficient Feature Extraction, Encoding and Classification for Action Recognition

CVPR 2014

Seeing 3D Chairs

Seeing 3D chairs: Exemplar Part-based 2D-3D Alignment Using a Large Dataset of CAD Models

CVPR 2014

Graph Learning

Learning Graphs to Match

ICCV 2013

Actors and Actions

Finding Actors and Actions in Movies

ICCV 2013

Stereo Segmentation

Pose Estimation and Segmentation of People in 3D Movies

ICCV 2013

Scene Semantics

Scene Semantics from Long-term Observation of People

ECCV 2012

People Watching

People Watching: Human Actions as a Cue for Single-View Geometry

ECCV 2012

Track to the Future

Track to the Future: Spatio-temporal Video Segmentation with Long-range Motion Cues

CVPR 2011

painting alignment

Automatic Alignment of Paintings and Photographs Depicting a 3D Scene

3DRRW-ICCV 2011

data driven crowds

Data-driven Crowd Analysis in Videos

ICCV 2011

crowddensity

Density-aware person detection and tracking in crowds

ICCV 2011

deblurshaken

Non-uniform Deblurring for Shaken Images

IJCV 2012, CVPR 2010

saturation

Deblurring Shaken and Partially Saturated Images

CPCVW-ICCV 2011

confusion

Avoiding confusing features in place recognition

ECCV 2010

facialattribute

Facial Attribute Classification in Video

ECCV 2010

stillactions

Human Action Classification in Still Images

BMVC 2010

mvs

Patch-based Multi-view Stereo Software

PAMI 2010, CVPR 2007

camera

What is a Camera?

CVPR 2009

fmocap

Dense 3D Motion Capture for Human Faces

CVPR 2009

internetinpainting

Internet-based Inpainting

BMVC 2009