What is HowTo100M?

HowTo100M is a large-scale dataset of narrated videos with an emphasis on instructional videos, in which content creators teach complex tasks with the explicit intention of explaining the visual content on screen. HowTo100M features a total of:

Each video comes with a narration, available as subtitles automatically downloaded from YouTube.
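Because the narration arrives as timed subtitles, each subtitle cue can be paired with the video segment between its start and end timestamps to form a clip-caption pair. The following is a minimal sketch of that pairing, not the authors' pipeline. It assumes SRT/WebVTT-style cue timing lines with an hour field (e.g. "00:01:02.500 --> 00:01:05.000"). It ignores the inline styling tags and overlapping cues that YouTube auto-generated captions often contain. The names Clip and subtitles_to_clips are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Clip:
    """One narrated clip: a subtitle cue and the time span it covers (seconds)."""
    start: float
    end: float
    text: str


# Matches timing lines such as "00:01:02.500 --> 00:01:05.000".
# WebVTT uses '.' before the milliseconds, SRT uses ','; both are accepted.
# Assumption: timestamps always include the hour field.
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[.,](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[.,](\d{3})"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def subtitles_to_clips(subtitle_text: str) -> list[Clip]:
    """Pair each subtitle cue with the clip between its start and end times."""
    clips: list[Clip] = []
    # Cues are separated by blank lines; header blocks without timings are skipped.
    for block in re.split(r"\n\s*\n", subtitle_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                groups = match.groups()
                # Every line after the timing line belongs to the cue text.
                text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
                if text:
                    clips.append(
                        Clip(_to_seconds(*groups[:4]), _to_seconds(*groups[4:]), text)
                    )
                break
    return clips


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "00:00:01.000 --> 00:00:04.500\ntoday we are going to cut some salmon\n\n"
        "00:00:04.500 --> 00:00:08.000\nfirst take a sharp knife\n"
    )
    for clip in subtitles_to_clips(sample):
        print(f"{clip.start:.1f}-{clip.end:.1f}s: {clip.text}")
```

Narration is only loosely aligned with what is on screen, so clips built this way are noisy supervision rather than precise annotations.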

Real-Time Natural Language search on HowTo100M

We have implemented an online Text-to-Video retrieval demo that performs search and localization in videos using a simple Text-Video model trained on HowTo100M. The demo runs on a single CPU machine and uses FAISS for approximate nearest-neighbour search.
Please note that to make the search through hundreds of millions of video clips run in real time, this demo uses a lighter (and less accurate) version of the model than the one described in the paper.
Example queries: Check voltage, Cut paper, Cut salmon, Measure window length, Animal dance, ...
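In a joint text-video embedding, search reduces to nearest-neighbour lookup. The query text and every indexed clip are mapped to vectors, and the clips whose vectors score highest against the query are returned. Each clip carries its video and time span, so a hit both finds and localizes the content. The sketch below shows the exact, brute-force version of the lookup that FAISS approximates; it is not the demo's code. The text and video encoders are not shown, the 3-d embeddings in the usage are made up, and BruteForceClipIndex and IndexedClip are hypothetical names.

```python
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass


@dataclass
class IndexedClip:
    video_id: str            # hypothetical identifier of the source video
    start: float             # clip start time in seconds
    end: float               # clip end time in seconds
    embedding: list[float]   # unit-normalized clip embedding


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class BruteForceClipIndex:
    """Exact top-k cosine search over clip embeddings (linear scan per query)."""

    def __init__(self) -> None:
        self._clips: list[IndexedClip] = []

    def add(self, video_id: str, start: float, end: float, embedding: list[float]) -> None:
        # Normalize once at insertion time so queries only need dot products.
        self._clips.append(IndexedClip(video_id, start, end, normalize(embedding)))

    def search(self, query_embedding: list[float], k: int = 5) -> list[tuple[float, IndexedClip]]:
        q = normalize(query_embedding)
        scored = (
            (sum(a * b for a, b in zip(q, clip.embedding)), clip) for clip in self._clips
        )
        # nlargest keeps only the k best (score, clip) pairs.
        return heapq.nlargest(k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    index = BruteForceClipIndex()
    index.add("vid_a", 0.0, 4.0, [1.0, 0.0, 0.0])
    index.add("vid_b", 10.0, 12.5, [0.0, 1.0, 0.0])
    index.add("vid_c", 3.0, 6.0, [0.7, 0.7, 0.0])
    for score, clip in index.search([0.9, 0.1, 0.0], k=2):
        print(f"{clip.video_id} [{clip.start}-{clip.end}s] score={score:.3f}")
```

A linear scan costs time proportional to the number of clips for every query, which is impractical at hundreds of millions of clips on one CPU. FAISS's IndexFlatIP performs this same exact inner-product search. Its approximate index types (for example inverted-file and product-quantization indexes) trade some accuracy for much faster lookups. Which index type the demo actually uses is not stated here.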

Dataset statistics



If you wish to download the video files from our private server, please fill out this form: here. We host the videos rescaled so that min(height, width) = 256, with the audio removed.
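Rescaling so that min(height, width) = 256 fixes the shorter side at 256 pixels and scales the longer side by the same factor, preserving the aspect ratio. A 1920x1080 video, for example, becomes roughly 455x256. The sketch below computes that size and builds, without running, an ffmpeg command that applies it and drops the audio with -an. We do not know which tool or rounding rule was used to prepare the hosted files. Rounding the long side to the nearest integer is an assumption, and the function names and file names are hypothetical.

```python
from __future__ import annotations


def rescaled_size(width: int, height: int, short_side: int = 256) -> tuple[int, int]:
    """Return (new_width, new_height) with min(new_width, new_height) == short_side.

    The aspect ratio is preserved; the long side is rounded to the nearest
    integer (an assumption, not necessarily what the hosted files used).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = short_side / min(width, height)
    if width <= height:
        return short_side, round(height * scale)
    return round(width * scale), short_side


def ffmpeg_command(src: str, dst: str, width: int, height: int) -> list[str]:
    """Build (but do not run) an ffmpeg command that rescales and drops audio."""
    new_w, new_h = rescaled_size(width, height)
    # -vf scale=W:H resizes the video stream; -an disables audio in the output.
    return ["ffmpeg", "-i", src, "-vf", f"scale={new_w}:{new_h}", "-an", dst]


if __name__ == "__main__":
    print(rescaled_size(1920, 1080))
    print(" ".join(ffmpeg_command("input.mp4", "output.mp4", 640, 480)))
```

Some encoders, such as libx264, require even dimensions. ffmpeg's scale filter can handle this itself: a value of -2 for one dimension keeps the aspect ratio and rounds that side to a multiple of 2.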


   title={How{T}o100{M}: {L}earning a {T}ext-{V}ideo {E}mbedding by {W}atching {H}undred {M}illion {N}arrated {V}ideo {C}lips},
   author={Miech, Antoine and Zhukov, Dimitri and Alayrac, Jean-Baptiste and Tapaswi, Makarand and Laptev, Ivan and Sivic, Josef},


We note that the distribution of identities and activities in the HowTo100M dataset may not be representative of the global human population and the diversity in society. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data.