Wenhao Chai

Graduate Student @ University of Washington

About


Wenhao Chai is currently a graduate student at the University of Washington, in the Information Processing Lab advised by Prof. Jenq-Neng Hwang. Previously, he was an undergraduate student at Zhejiang University, in the CVNext Lab advised by Prof. Gaoang Wang. He was fortunate to intern with the Multi-modal Computing Group at Microsoft Research Asia.

His research is primarily in video understanding, generative models, embodied agents, and human pose and motion. Have a look at the overview of our research. All publications are listed here and on Google Scholar.

We are always looking for collaborators who share our interests. Follow me on Twitter.

StableVideo

07/2023: We release StableVideo, a diffusion-based framework for text-driven video editing, which was accepted to ICCV 2023. The project repo has gained over 1.3k stars on GitHub.


MovieChat

07/2023: We release MovieChat, the first framework that can chat with over ten thousand frames of video, accepted to CVPR 2024. We also host the LOVEU: LOng-form VidEo Understanding challenge at CVPR 2024!


STEVE

12/2023: We release the STEVE series, named after the protagonist of the game Minecraft, which aims to build an embodied agent based on vision models and LLMs within an open world.


CityGen

12/2023: We release CityGen, a novel framework for infinite, controllable and diverse 3D city layout generation.


Check Out

News and Highlights

  • 04/2024: We are hosting the CVPR 2024 Long-form Video Understanding Challenge @ LOVEU.
  • 04/2024: Invited talk at the AgentX seminar on our STEVE series.
  • 03/2024: One paper accepted to ICLR 2024 workshop @ LLM Agents.
  • 02/2024: Two papers accepted to CVPR 2024 (1 highlight).
  • 02/2024: Invited talk at AAAI 2024 workshop @ IMAGEOMICS.
  • 01/2024: We are working with Pika Lab to develop next-generation video understanding and generation models.
  • 12/2023: One paper accepted to ICASSP 2024.
  • 12/2023: One paper accepted to AAAI 2024.
  • 11/2023: Two papers accepted to WACV 2024 workshop @ CV4Smalls.
  • 09/2023: One paper accepted to ICCV 2023 workshop @ TNGCV-DataComp.
  • 09/2023: One paper accepted to IEEE T-MM.
  • 08/2023: One paper accepted to BMVC 2023.
  • 07/2023: Two papers accepted to ACM MM 2023.
  • 07/2023: Finished my research internship at Microsoft Research Asia (MSRA), Beijing.
  • 07/2023: Two papers accepted to ICCV 2023.


Join Us

Welcome Collaboration

We are looking for potential collaborators at Pika Lab, ZJU, and UW in video understanding and generative models (video, image, or 3D). If you are interested in our research, please feel free to contact us.



  • Email address: wchai@uw.edu

Recent

Projects

* Equal contribution. Project lead. Corresponding author.


StableVideo: Text-driven Consistency-aware Diffusion Video Editing
Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
International Conference on Computer Vision (ICCV), 2023
[Website] [Paper] [Video] [Demo] [Code]

We introduce temporal dependency into existing text-driven diffusion models, which allows them to generate a consistent appearance for new objects.

MovieChat: From Dense Token to Sparse Memory in Long Video Understanding
Enxin Song*, Wenhao Chai*, Guanhong Wang*, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
Computer Vision and Pattern Recognition (CVPR), 2024
[Website] [Paper] [Blog] [Dataset] [Code] NPM

MovieChat achieves state-of-the-art performance in extra-long video (more than 10K frames) understanding by introducing a memory mechanism.

Explore

More