What is HowTo100M?

HowTo100M is a large-scale dataset of narrated videos with an emphasis on instructional videos where content creators teach complex tasks with an explicit intention of explaining the visual content on screen. HowTo100M features a total of:

Each video is associated with a narration available as subtitles automatically downloaded from YouTube.

Real-Time Natural Language search on HowTo100M

We have implemented an online Text-to-Video retrieval demo that performs search and localization in videos using a simple Text-Video model trained on HowTo100M. The demo runs on a single CPU machine and uses the FAISS implementation of approximate nearest neighbour search.
Please note that to make the search through hundreds of millions of video clips run in real time, this demo uses a lighter (and less accurate) version of the model than the one described in the paper.
Query examples: Check voltage, Cut paper, Cut salmon, Measure window length, Animal dance, ...
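To illustrate the retrieval step behind such a demo, here is a minimal sketch of nearest neighbour search in a joint text-video embedding space. It is not the demo's code: it uses exact brute-force cosine similarity in pure Python as a stand-in for FAISS approximate search, and the clip names, the three-dimensional vectors, and the `search` helper are all hypothetical, chosen only for illustration. In a real system the vectors would come from the trained Text-Video model.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search(query_vec, clip_vecs, k=2):
    """Return the ids of the k clips whose embeddings are closest to the query.

    Exact brute-force search: every clip is scored. An approximate index
    (such as those provided by FAISS) would avoid scoring every clip.
    """
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), clip_id)
        for clip_id, vec in clip_vecs.items()
    )
    return [clip_id for _, clip_id in heapq.nlargest(k, scored)]


# Hypothetical toy embeddings; real ones would be produced by the video encoder.
clip_vecs = {
    "cut_paper": [0.9, 0.1, 0.0],
    "cut_salmon": [0.7, 0.6, 0.1],
    "check_voltage": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of the text query "Cut salmon" from the text encoder.
query = [0.8, 0.5, 0.0]
results = search(query, clip_vecs, k=2)
print(results)
```

Brute-force scoring costs time linear in the number of clips, which is why a search over hundreds of millions of clips would rely on an approximate index rather than a loop like this one.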

Dataset statistics




   title={How{T}o100{M}: {L}earning a {T}ext-{V}ideo {E}mbedding by {W}atching {H}undred {M}illion {N}arrated {V}ideo {C}lips},
   author={Miech, Antoine and Zhukov, Dimitri and Alayrac, Jean-Baptiste and Tapaswi, Makarand and Laptev, Ivan and Sivic, Josef},