Diffusion-based video editing.
LLM-based long video understanding.
A video fight detection dataset collected from YouTube, containing 2,000 video clips in diverse scenarios.
A manually labeled long-video QA and captioning dataset, containing 1,000 videos (>10K frames).
Applied Science
IEEE Transactions on Artificial Intelligence
arXiv Preprint
GitHub Repo