* Equal contribution.   Project lead.   Corresponding author.

2025

Conference

  1. PromptHaze: Prompting Real-world Dehazing via Depth Anything Model
    Tian Ye, Sixiang Chen, Haoyu Chen, Wenhao Chai, Jingjing Ren, Zhaohu Xing, Wenxue Li, Lei Zhu
    Association for the Advancement of Artificial Intelligence (AAAI), 2025

  2. AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement
    Yunlong Lin, Tian Ye, Sixiang Chen, Zhenqi Fu, Yingying Wang, Wenhao Chai, Zhaohu Xing, Lei Zhu, Xinghao Ding
    Association for the Advancement of Artificial Intelligence (AAAI), 2025
    [Paper] [Project Page]

  3. Exploring Learning-based Motion Models in Multi-Object Tracking
    Hsiang-Wei Huang, Cheng-Yen Yang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
    International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
    [Paper]

2024

Conference

  1. MovieChat: From Dense Token to Sparse Memory in Long Video Understanding
    Enxin Song*, Wenhao Chai*, Guanhong Wang*, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
    Computer Vision and Pattern Recognition (CVPR), 2024
    [Project Page] [Paper] [Blog] [Video] [Dataset] [Leaderboard] [Code]

  2. Learning Diffusion Texture Priors for Image Restoration
    Tian Ye, Sixiang Chen, Wenhao Chai, Zhaohu Xing, Jing Qin, Ge Lin, Lei Zhu
    Computer Vision and Pattern Recognition (CVPR) Highlight, 2024
    [Paper]

  3. See and Think: Embodied Agent in Virtual Environment
    Zhonghan Zhao*, Wenhao Chai*, Xuan Wang*, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang
    European Conference on Computer Vision (ECCV), 2024
    [Project Page] [Paper] [Dataset] [Code]

  4. RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark
    Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan, Zhongyu Jiang, Wenhao Chai, Hsiang-Wei Huang, Jenq-Neng Hwang, Chih-Lung Lin
    European Conference on Computer Vision (ECCV), 2024
    [Paper] [Dataset] [Code]

  5. UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning
    Meiqi Sun*, Zhonghan Zhao*, Wenhao Chai*, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang
    Association for the Advancement of Artificial Intelligence (AAAI), 2024
    [Project Page] [Paper] [Code]

  6. Ego3DT: Tracking Every 3D Object in Ego-centric Videos
    Shengyu Hao*, Wenhao Chai*, Zhonghan Zhao*, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang
    ACM International Conference on Multimedia (ACM MM), 2024
    [Paper]

  7. LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound
    Xuechen Guo, Wenhao Chai, Shi-Yan Li, Gaoang Wang
    ACM International Conference on Multimedia (ACM MM), 2024
    [Paper]

  8. Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal
    Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang
    International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
    [Paper]

  9. Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
    Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang
    Winter Conference on Applications of Computer Vision (WACV), 2024
    [Project Page] [Paper] [Code]

  10. MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling
    Zhenyu Zhang, Wenhao Chai, Zhongyu Jiang, Tian Ye, Mingli Song, Jenq-Neng Hwang, Gaoang Wang
    Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2024
    [Paper] [Code]

  11. Chasing Consistency in Text-to-3D Generation from a Single Image
    Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang
    ACM International Conference on Multimedia in Asia (ACM MM Asia), 2024
    [Paper]

  12. Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check
    Sheng-Yao Kuan, Jen-Hao Cheng, Hsiang-Wei Huang, Wenhao Chai, Cheng-Yen Yang, Hugo Latapie, Gaowen Liu, Bing-Fei Wu, Jenq-Neng Hwang
    Intelligent Vehicles Symposium (IV), 2024
    [Paper]

Journal

  1. Random bridge generator as a platform for developing computer vision-based structural inspection algorithms
    Haojia Cheng*, Wenhao Chai*, Jiabao Hu*, Wenhao Ruan*, Mingyu Shi, Hyunjun Kim, Yifan Cao, Yasutaka Narazaki
    Journal of Infrastructure Intelligence and Resilience
    [Paper]

  2. Unsupervised Domain Adaptation Approach for Vision-based Semantic Understanding of Bridge Inspection Scenes without Manual Annotations
    Yasutaka Narazaki, Wendong Pang, Gaoang Wang, Wenhao Chai
    ASCE Journal of Bridge Engineering
    [Paper]

Workshop and Technical Report

  1. STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft
    Zhonghan Zhao*, Wenhao Chai*, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang
    Computer Vision and Pattern Recognition (CVPR) Workshop @ Embodied AI, 2024
    [Project Page] [Paper] [Dataset] [Code]

  2. NTIRE 2024 Image Shadow Removal Challenge Report
    Computer Vision and Pattern Recognition (CVPR) Workshop @ NTIRE, 2024
    [Report] [Challenge Page]

  3. Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation
    Zhonghan Zhao*, Kewei Chen*, Dongxu Guo*, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang
    International Conference on Learning Representations (ICLR) Workshop @ LLM Agents, 2024
    [Paper]

  4. Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation
    Zhuoran Zhou, Zhongyu Jiang, Wenhao Chai, Cheng-Yen Yang, Lei Li, Jenq-Neng Hwang
    Winter Conference on Applications of Computer Vision (WACV) Workshop @ Computer Vision with Small Data, 2024
    [Project Page] [Paper] [Code]

Preprint

  1. AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
    Wenhao Chai*, Enxin Song*, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher D. Manning
    arXiv Preprint.
    [Project Page] [Paper] [Video] [Model] [Benchmark] [Leaderboard] [Dataset] [Code]

  2. MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
    Enxin Song*, Wenhao Chai*, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang
    arXiv Preprint.
    [Project Page] [Paper] [Blog] [Video] [Dataset] [Leaderboard] [Code]

  3. SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
    Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
    arXiv Preprint.
    [Project Page] [Paper] [Video] [Raw Result] [Code]

  4. PAD: Personalized Alignment at Decoding-Time
    Ruizhe Chen*, Xiaotian Zhang*, Meng Luo*, Wenhao Chai*, Zuozhu Liu
    arXiv Preprint.
    [Paper]

  5. CityCraft: A Real Crafter for 3D City Generation
    Jie Deng*, Wenhao Chai*, Junsheng Huang*, Zhonghan Zhao*, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang
    arXiv Preprint.
    [Paper] [Code]

  6. VersaT2I: Improving Text-to-Image Models with Versatile Reward
    Jianshu Guo*, Wenhao Chai*, Jie Deng*, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang
    arXiv Preprint.
    [Paper]

  7. Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model
    Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang
    arXiv Preprint.
    [Paper]

  8. MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection
    Hou-I Liu, Christine Wu, Jen-Hao Cheng, Wenhao Chai, Shian-Yun Wang, Gaowen Liu, Jenq-Neng Hwang, Hong-Han Shuai, Wen-Huang Cheng
    arXiv Preprint.
    [Paper]

2023

Conference

  1. StableVideo: Text-driven Consistency-aware Diffusion Video Editing
    Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
    International Conference on Computer Vision (ICCV), 2023
    [Project Page] [Paper] [Video] [Demo] [Code]

  2. Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
    Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang
    International Conference on Computer Vision (ICCV), 2023
    [Paper] [Code]

  3. Sequential Affinity Learning for Video Restoration
    Tian Ye, Sixiang Chen, Yun Liu, Wenhao Chai, Jinbin Bai, Wenbin Zou, Yunchen Zhang, Jiang Mingchao, Erkang Chen, Chenghao Xue
    ACM International Conference on Multimedia (ACM MM), 2023
    [Paper] [Code]

  4. PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Enhanced 3D Human Pose Estimation
    Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie
    ACM International Conference on Multimedia (ACM MM), 2023
    [Paper] [Code]

  5. Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement
    Jingxia Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Jun Shi, Yun Liu, Erkang Chen
    British Machine Vision Conference (BMVC), 2023
    [Paper] [Code]

  6. User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
    Xuan Wang, Guanhong Wang, Wenhao Chai, Jiayu Zhou, Gaoang Wang
    Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023

Journal

  1. DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models
    Shidong Cao*, Wenhao Chai*, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang
    IEEE Transactions on Multimedia (TMM)
    [Paper] [Code]

  2. Deep Learning Methods for Small Molecule Drug Discovery: A Survey
    Wenhao Hu*, Yingying Liu*, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang
    IEEE Transactions on Artificial Intelligence (TAI)
    [Paper]

Workshop and Technical Report

  1. Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models
    Shidong Cao*, Wenhao Chai*, Shengyu Hao, Gaoang Wang
    Computer Vision and Pattern Recognition (CVPR) Workshop @ Computer Vision for Fashion, Art, and Design, 2023
    [Paper] [Code]

  2. Devil in the Number: Towards Robust Multi-modality Data Filter
    Yichen Xu, Zihan Xu, Wenhao Chai, Zhonghan Zhao, Enxin Song, Gaoang Wang
    International Conference on Computer Vision (ICCV) Workshop @ DataComp, 2023
    [Paper]

Preprint

  1. CityGen: Infinite and Controllable 3D City Layout Generation
    Jie Deng*, Wenhao Chai*, Jianshu Guo*, Qixuan Huang, Wenhao Hu, Jenq-Neng Hwang, Gaoang Wang
    arXiv Preprint.
    [Project Page] [Paper] [Code]

  2. UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning
    Zhongyu Jiang, Wenhao Chai, Lei Li, Zhuoran Zhou, Cheng-Yen Yang, Jenq-Neng Hwang
    arXiv Preprint.
    [Paper]

  3. A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision
    Zhonghan Zhao*, Wenhao Chai*, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Gaoang Wang, Mingli Song, Jenq-Neng Hwang
    arXiv Preprint.
    [Paper]

2022

Conference

  1. Automatic Spinal Ultrasound Image Segmentation and Deployment for Real-time Spine Volumetric Reconstruction
    Yifan Cao*, Chenghao Tan*, Wenzhuo Qian, Wenhao Chai, Luhang Cui, Wenxuan Yang, Xinben Hu, Yongjian Zhu, Wenhui Zhou, Xingfa Shen
    International Conference on Unmanned Systems, Best Paper Award, 2022
    [Paper] [Code]

  2. Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model
    Zhenting Qi*, Ruike Zhu*, Zheyu Fu*, Wenhao Chai*, Volodymyr Kindratenko
    International Conference on Tools with Artificial Intelligence, 2022
    [Paper] [Dataset]

Journal

  1. Deep Vision Multimodal Learning: Methodology, Benchmark, and Trend
    Wenhao Chai, Gaoang Wang
    Applied Sciences
    [Paper]