Code and Datasets

Featured Codebases

Aurora

Efficient multimodal large language model.

StableVideo

Diffusion-based video editing.

MovieChat

LLM-based long video understanding.


Featured Datasets

VDC

The first benchmark for detailed video captioning, featuring over one thousand videos with significantly longer and more detailed captions.

MovieChat

A manually labeled long-video QA and caption dataset containing 1,000 videos (>10K frames).

VFD-2000

A video fight detection dataset collected from YouTube, containing 2,000 video clips in diverse scenarios.


Featured Surveys

Deep vision multimodal learning: Methodology, benchmark, and trend

Applied Sciences

Deep Learning Methods for Small Molecule Drug Discovery: A Survey

IEEE Transactions on Artificial Intelligence

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

arXiv Preprint

Awesome-list: Vector Quantized Variational Autoencoder (VQ-VAE)

GitHub Repo
