Code and Datasets
Featured
Datasets
VDC
The first benchmark for detailed video captioning, featuring over one thousand videos with significantly longer and more detailed captions.
MovieChat
A manually labeled long-video QA and captioning dataset containing 1,000 videos (>10K frames).
VFD-2000
A video fight detection dataset collected from YouTube, containing 2,000 video clips in diverse scenarios.