Fine-grained Incident Video Retrieval (FIVR) – 200K Dataset

The dataset was collected to simulate the problem of Fine-grained Incident Video Retrieval (FIVR). It comprises 225,960 videos associated with 4,687 Wikipedia events and 100 selected video queries. A detailed description of the dataset is provided in the paper ‘FIVR: Fine-grained Incident Video Retrieval’ and on the FIVR-200K dataset website.

The dataset was built in three steps:

  • Crawling of Wikipedia’s Current Events page to build a collection of major news events. The crawl covered events from January 1st, 2013 to December 31st, 2017. Only news events categorized as “Armed conflicts and attacks” or “Disasters and accidents” were retained. The public YouTube API was then used to collect videos, with the event headlines as queries.
  • Automatic selection of the query videos. A retrieval pipeline estimated the suitability of candidate videos as benchmark queries. A video graph was built from pairwise video similarity, which combined the visual similarity of the video content with the textual similarity of the video titles. The connected components of the graph were then extracted and filtered based on empirical rules.
  • Manual annotation of the selected queries. Since annotating all queries would be overly time-consuming, the top 100 were selected as the final query set. For each query video, the annotators followed a three-step process, and the database videos were annotated according to five labels derived from the definitions of the associated videos.
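
The graph-based query selection in the second step can be illustrated with a minimal Python sketch. It is not the pipeline used to build FIVR-200K: the function names, the averaging of visual and textual similarity, the 0.5 threshold, and the minimum component size are all hypothetical choices made only to show how connected components can be extracted from a similarity graph and then filtered.

```python
from collections import defaultdict


def connected_components(num_videos, similarities, threshold=0.5):
    """Group videos that are linked by sufficiently similar pairs.

    `similarities` maps a video pair (i, j) to a (visual, textual) tuple of
    scores in [0, 1]. Averaging the two scores and the 0.5 threshold are
    illustrative assumptions, not values taken from the dataset description.
    """
    parent = list(range(num_videos))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), (visual, textual) in similarities.items():
        # Add an edge only when the combined similarity is high enough.
        if (visual + textual) / 2 >= threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for video in range(num_videos):
        groups[find(video)].append(video)
    return sorted(groups.values())


def filter_components(components, min_size=2):
    """Hypothetical stand-in for the empirical filtering rules."""
    return [c for c in components if len(c) >= min_size]


# Toy input: five videos, three scored pairs.
similarities = {
    (0, 1): (0.9, 0.7),
    (1, 2): (0.6, 0.8),
    (3, 4): (0.2, 0.3),
}
components = connected_components(5, similarities)
candidates = filter_components(components)
print(components)
print(candidates)
```

In this toy input, videos 0, 1 and 2 end up in one component because each consecutive pair passes the threshold, while videos 3 and 4 stay isolated and are dropped by the size filter.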

In the dataset, three association types of related videos are considered:

  • Duplicate Scene Videos (DSV): Videos that share at least one scene (captured by the same camera) regardless of any applied transformation.
  • Complementary Scene Videos (CSV): Videos that contain part of the same spatio-temporal segment, but captured from different viewpoints.
  • Incident Scene Videos (ISV): Videos that capture the same incident, i.e. they are spatially and temporally close, but have no overlap.

The dataset is publicly available and licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license.

Code to download the dataset can be found in the dataset’s GitHub repo.