Video-Text Compliance


The Video-Text Compliance (VTC) dataset contains 7920 samples, each consisting of a video paired with a text instruction and a compliance/non-compliance label. In total, the dataset has over 1.2 million frames. Data collection is designed so that the dataset can be automatically augmented from a set of core videos. In response to growing concerns about data privacy, we carefully followed privacy-preserving safeguards when generating the VTC dataset.

Dataset Metadata

Format: (not listed)
License: CDLA-Sharing
Domain: Video Classification
Number of Records: 7920 video samples (1.2 million frames)
Size: (not listed)

Example Records

carry_bag_P1000344_iter006.mp4 0 open_predetermined_suitcase_calmly    
carry_bag_P1000344_iter007.mp4 0 precisely_place_the_appropriate_box    
carry_bag_P1000344_iter005.mp4 0 push_accessible_cart    
carry_bag_P1000344_iter004.mp4 0 open_the_applicable_bag_at_once    
carry_bag_P1000344_iter000.mp4 0 carry_the_specified_box
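
The records above can be read with a short script. The following is a minimal sketch, assuming each record is a single line with three whitespace-separated fields (video file name, integer label, underscore-joined instruction). The names VTCRecord, parse_record, and parse_records are hypothetical and not part of any official VTC tooling. This card does not state which label value means compliance, so the label is kept as a plain integer.

```python
from typing import List, NamedTuple


class VTCRecord(NamedTuple):
    """One record line: video file, label, instruction (hypothetical layout)."""
    video: str
    label: int          # meaning of 0/1 is not stated in this card
    instruction: str    # underscore-joined instruction text


def parse_record(line: str) -> VTCRecord:
    """Parse one whitespace-separated record line."""
    fields = line.split()  # also discards trailing whitespace
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    video, label, instruction = fields
    return VTCRecord(video, int(label), instruction)


def parse_records(text: str) -> List[VTCRecord]:
    """Parse every non-blank line of a multi-line string."""
    return [parse_record(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "carry_bag_P1000344_iter000.mp4 0 carry_the_specified_box"
    record = parse_record(sample)
    # Turn the underscore-joined instruction back into readable text.
    print(record.video, record.label, record.instruction.replace("_", " "))
    # prints: carry_bag_P1000344_iter000.mp4 0 carry the specified box
```

Keeping the instruction in its underscore-joined form until display time means the parsed value still matches the stored record exactly.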


