Code and Datasets

Featured

Codebases

SAMURAI

Visual object tracking.

Aurora

Efficient multimodal large language model.

StableVideo

Diffusion-based video editing.

MovieChat

Large multimodal model for long-form video understanding.


Aurora Series

Aurora is an open-source project for efficient large multimodal models. We provide training, evaluation, and deployment codebases, together with all models and data. We are still working on releasing next-generation models with better performance and easier-to-use code.

Featured

Datasets

VDC

The first benchmark for detailed video captioning, featuring over one thousand videos with significantly longer and more detailed captions.

MovieChat

A manually labeled long-video question-answering and captioning dataset containing 1,000 videos, each longer than ten thousand frames.

VFD-2000

A video fight detection dataset collected from YouTube, containing 2,000 video clips across diverse scenarios.


Video Benchmark and Challenge

We never stop building comprehensive benchmarks for video understanding. We host the Video Understanding Challenge every year at CVPR, covering video detailed captioning, long-form video understanding, and other video tasks.

Featured

Surveys

Deep vision multimodal learning: Methodology, benchmark, and trend

Applied Sciences

Deep Learning Methods for Small Molecule Drug Discovery: A Survey

IEEE Transactions on Artificial Intelligence

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

arXiv preprint

Awesome-list: Vector Quantized Variational Autoencoder (VQ-VAE)

GitHub Repo
