Overview
VTC contains 7920 samples. Each sample consists of a video paired with a text instruction and a compliance/non-compliance label. The dataset has over 1.2 million frames. We took a distinctive approach to data collection so that the dataset can be automatically augmented from a set of core videos. To address growing concerns about data privacy, we followed privacy-preserving safeguards when generating the VTC dataset.
Dataset Metadata
Format | License | Domain | Number of Records | Size |
---|---|---|---|---|
MP4, CSV | CDLA-Sharing | Video Classification | 7920 video samples (1.2 million frames) | 2GB |
Example Records
carry_bag_P1000344_iter006.mp4 0 open_predetermined_suitcase_calmly
carry_bag_P1000344_iter007.mp4 0 precisely_place_the_appropriate_box
carry_bag_P1000344_iter005.mp4 0 push_accessible_cart
carry_bag_P1000344_iter004.mp4 0 open_the_applicable_bag_at_once
carry_bag_P1000344_iter000.mp4 0 carry_the_specified_box
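Each example record appears to have three columns: a video filename, a label, and an instruction whose words are joined by underscores. Below is a minimal Python sketch for loading such records. It assumes the columns appear in that order and are separated by whitespace or commas. It also assumes that filenames follow the pattern `<core video>_iter<NNN>.mp4` and that `iterNNN` indexes augmented variants of one core video. The automatic-augmentation description suggests this reading, but it is not a documented schema. The names `VTCRecord`, `parse_records`, and `group_by_core_video` are hypothetical. The label is kept as stored, since this card does not say which value means compliance.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed (hypothetical) filename pattern: <core video>_iter<NNN>.mp4
FILENAME_RE = re.compile(r"^(?P<core>.+)_iter(?P<iter>\d+)\.mp4$")


@dataclass
class VTCRecord:
    video: str          # video filename as stored in the record
    label: int          # compliance/non-compliance label, kept as stored
    instruction: str    # instruction text with underscores turned into spaces
    core_video: str     # assumed: filename without the _iterNNN suffix
    iteration: int      # assumed: augmentation index parsed from iterNNN


def parse_records(text: str) -> list[VTCRecord]:
    """Parse 'filename label instruction' lines (whitespace- or comma-separated)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        video, label, instruction = re.split(r"[,\s]+", line)
        match = FILENAME_RE.match(video)
        if match is None:
            raise ValueError(f"unexpected filename: {video}")
        records.append(VTCRecord(
            video=video,
            label=int(label),
            instruction=instruction.replace("_", " "),
            core_video=match.group("core"),
            iteration=int(match.group("iter")),
        ))
    return records


def group_by_core_video(records: list[VTCRecord]) -> dict[str, list[VTCRecord]]:
    """Group records by their (assumed) core video, ordered by iteration index."""
    groups: dict[str, list[VTCRecord]] = defaultdict(list)
    for record in records:
        groups[record.core_video].append(record)
    for group in groups.values():
        group.sort(key=lambda r: r.iteration)
    return dict(groups)


if __name__ == "__main__":
    sample = """\
carry_bag_P1000344_iter006.mp4 0 open_predetermined_suitcase_calmly
carry_bag_P1000344_iter007.mp4 0 precisely_place_the_appropriate_box
carry_bag_P1000344_iter005.mp4 0 push_accessible_cart
carry_bag_P1000344_iter004.mp4 0 open_the_applicable_bag_at_once
carry_bag_P1000344_iter000.mp4 0 carry_the_specified_box
"""
    for core, group in group_by_core_video(parse_records(sample)).items():
        print(core, [r.iteration for r in group])
        # → carry_bag_P1000344 [0, 4, 5, 6, 7]
```

Grouping by core video is useful if you want train/test splits that keep all augmented variants of one recording on the same side. Under the assumed naming scheme, that avoids near-duplicate leakage between splits.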
Citation
@InProceedings{Jaiswal_2019_ICCV_Workshops,
author = {Jaiswal, Mayoore and Liu, Frank and Jagannathan, Anupama and Gattiker, Anne and Hwang, Inseok and Lee, Jinho and Tong, Matthew and Dureja, Sahil and Shah, Soham and Hofstee, Peter and Chen, Valerie and Paul, Suvadip and Feris, Rogerio},
title = {Video-Text Compliance: Activity Verification Based on Natural Language Instructions},
booktitle = {The IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}
}
Related Links
- Video-Text Compliance: Activity Verification Based on Natural Language Instructions (Paper): The Video-Text Compliance (VTC) dataset contains videos of atomic activities, along with text instructions and compliance labels. The VTC dataset is constructed with an auto-augmentation technique, preserves privacy, and contains over 1.2 million frames.