* Equal contribution. † Project lead. ‡ Corresponding author.
2025
Conference
-
PromptHaze: Prompting Real-world Dehazing via Depth Anything Model
Tian Ye, Sixiang Chen, Haoyu Chen, Wenhao Chai, Jingjing Ren, Zhaohu Xing, Wenxue Li, Lei Zhu‡
Association for the Advancement of Artificial Intelligence (AAAI), 2025
-
AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement
Yunlong Lin, Tian Ye, Sixiang Chen, Zhenqi Fu, Yingying Wang, Wenhao Chai, Zhaohu Xing, Lei Zhu, Xinghao Ding‡
Association for the Advancement of Artificial Intelligence (AAAI), 2025
[Paper]
[Project Page]
-
Exploring Learning-based Motion Models in Multi-Object Tracking
Hsiang-Wei Huang, Cheng-Yen Yang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang‡
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
[Paper]
2024
Conference
-
MovieChat: From Dense Token to Sparse Memory in Long Video Understanding
Enxin Song*, Wenhao Chai*†, Guanhong Wang*, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang‡
Computer Vision and Pattern Recognition (CVPR), 2024
[Project Page]
[Paper]
[Blog]
[Video]
[Dataset]
[Leaderboard]
[Code]
-
Learning Diffusion Texture Priors for Image Restoration
Tian Ye, Sixiang Chen, Wenhao Chai, Zhaohu Xing, Jing Qin, Ge Lin, Lei Zhu‡
Computer Vision and Pattern Recognition (CVPR) Highlight, 2024
[Paper]
-
See and Think: Embodied Agent in Virtual Environment
Zhonghan Zhao*, Wenhao Chai*†, Xuan Wang*, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang‡
European Conference on Computer Vision (ECCV), 2024
[Project Page]
[Paper]
[Dataset]
[Code]
-
RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark
Yuan-Hao Ho, Jen-Hao Cheng, Sheng-Yao Kuan, Zhongyu Jiang, Wenhao Chai, Hsiang-Wei Huang, Jenq-Neng Hwang, Chih-Lung Lin‡
European Conference on Computer Vision (ECCV), 2024
[Paper]
[Dataset]
[Code]
-
UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning
Meiqi Sun*, Zhonghan Zhao*, Wenhao Chai*, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang‡
Association for the Advancement of Artificial Intelligence (AAAI), 2024
[Project Page]
[Paper]
[Code]
-
Ego3DT: Tracking Every 3D Object in Ego-centric Videos
Shengyu Hao*, Wenhao Chai*, Zhonghan Zhao*, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang‡
ACM International Conference on Multimedia (ACM MM), 2024
[Paper]
-
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound
Xuechen Guo, Wenhao Chai, Shi-Yan Li, Gaoang Wang‡
ACM International Conference on Multimedia (ACM MM), 2024
[Paper]
-
Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal
Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang‡
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
[Paper]
-
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang‡
Winter Conference on Applications of Computer Vision (WACV), 2024
[Project Page]
[Paper]
[Code]
-
MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling
Zhenyu Zhang, Wenhao Chai, Zhongyu Jiang, Tian Ye, Mingli Song, Jenq-Neng Hwang, Gaoang Wang‡
Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2024
[Paper]
[Code]
-
Chasing Consistency in Text-to-3D Generation from a Single Image
Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang‡
ACM International Conference on Multimedia in Asia (ACM MM Asia), 2024
[Paper]
-
Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check
Sheng-Yao Kuan, Jen-Hao Cheng, Hsiang-Wei Huang, Wenhao Chai, Cheng-Yen Yang, Hugo Latapie, Gaowen Liu, Bing-Fei Wu, Jenq-Neng Hwang‡
Intelligent Vehicles Symposium (IV), 2024
[Paper]
Journal
-
Random bridge generator as a platform for developing computer vision-based structural inspection algorithms
Haojia Cheng*, Wenhao Chai*, Jiabao Hu*, Wenhao Ruan*, Mingyu Shi, Hyunjun Kim, Yifan Cao, Yasutaka Narazaki‡
Journal of Infrastructure Intelligence and Resilience
[Paper]
-
Unsupervised Domain Adaptation Approach for Vision-based Semantic Understanding of Bridge Inspection Scenes without Manual Annotations
Yasutaka Narazaki‡, Wendong Pang, Gaoang Wang, Wenhao Chai
ASCE Journal of Bridge Engineering
[Paper]
Workshop and Technical Report
-
STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft
Zhonghan Zhao*, Wenhao Chai*†, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang‡
Computer Vision and Pattern Recognition (CVPR) Workshop @ Embodied AI, 2024
[Project Page]
[Paper]
[Dataset]
[Code]
-
NTIRE 2024 Image Shadow Removal Challenge Report
Computer Vision and Pattern Recognition (CVPR) Workshop @ NTIRE, 2024
[Report]
[Challenge Page]
-
Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation
Zhonghan Zhao*, Kewei Chen*, Dongxu Guo*, Wenhao Chai†, Tian Ye, Yanting Zhang, Gaoang Wang‡
International Conference on Learning Representations (ICLR) Workshop @ LLM Agents, 2024
[Paper]
-
Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation
Zhuoran Zhou, Zhongyu Jiang, Wenhao Chai, Cheng-Yen Yang, Lei Li‡, Jenq-Neng Hwang
Winter Conference on Applications of Computer Vision (WACV) Workshop @ Computer Vision with Small Data, 2024
[Project Page]
[Paper]
[Code]
Preprint
-
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai*†, Enxin Song*, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher D. Manning
arXiv Preprint.
[Project Page]
[Paper]
[Video]
[Model]
[Benchmark]
[Leaderboard]
[Dataset]
[Code]
-
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song*, Wenhao Chai*†, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang‡
arXiv Preprint.
[Project Page]
[Paper]
[Blog]
[Video]
[Dataset]
[Leaderboard]
[Code]
-
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang‡
arXiv Preprint.
[Project Page]
[Paper]
[Video]
[Raw Result]
[Code]
-
PAD: Personalized Alignment at Decoding-Time
Ruizhe Chen*, Xiaotian Zhang*, Meng Luo*, Wenhao Chai*, Zuozhu Liu‡
arXiv Preprint.
[Paper]
-
CityCraft: A Real Crafter for 3D City Generation
Jie Deng*, Wenhao Chai*, Junsheng Huang*, Zhonghan Zhao*, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang‡
arXiv Preprint.
[Paper]
[Code]
-
VersaT2I: Improving Text-to-Image Models with Versatile Reward
Jianshu Guo*, Wenhao Chai*†, Jie Deng*, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang‡
arXiv Preprint.
[Paper]
-
Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model
Zhonghan Zhao, Ke Ma, Wenhao Chai†, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang‡
arXiv Preprint.
[Paper]
-
MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection
Hou-I Liu, Christine Wu, Jen-Hao Cheng, Wenhao Chai, Shian-Yun Wang, Gaowen Liu, Jenq-Neng Hwang, Hong-Han Shuai, Wen-Huang Cheng‡
arXiv Preprint.
[Paper]
2023
Conference
-
StableVideo: Text-driven Consistency-aware Diffusion Video Editing
Wenhao Chai, Xun Guo‡, Gaoang Wang, Yan Lu
International Conference on Computer Vision (ICCV), 2023
[Project Page]
[Paper]
[Video]
[Demo]
[Code]
-
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang‡
International Conference on Computer Vision (ICCV), 2023
[Paper]
[Code]
-
Sequential Affinity Learning for Video Restoration
Tian Ye, Sixiang Chen, Yun Liu, Wenhao Chai, Jinbin Bai, Wenbin Zou, Yunchen Zhang, Jiang Mingchao, Erkang Chen‡, Chenghao Xue
ACM International Conference on Multimedia (ACM MM), 2023
[Paper]
[Code]
-
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Enhanced 3D Human Pose Estimation
Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie‡
ACM International Conference on Multimedia (ACM MM), 2023
[Paper]
[Code]
-
Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement
Jingxia Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Jun Shi, Yun Liu, Erkang Chen‡
British Machine Vision Conference (BMVC), 2023
[Paper]
[Code]
-
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
Xuan Wang, Guanhong Wang, Wenhao Chai, Jiayu Zhou, Gaoang Wang‡
Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023
Journal
-
DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models
Shidong Cao*, Wenhao Chai*, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang‡
IEEE Transactions on Multimedia (TMM)
[Paper]
[Code]
-
Deep Learning Methods for Small Molecule Drug Discovery: A Survey
Wenhao Hu*, Yingying Liu*, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang‡
IEEE Transactions on Artificial Intelligence (TAI)
[Paper]
Workshop and Technical Report
-
Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models
Shidong Cao*, Wenhao Chai*, Shengyu Hao, Gaoang Wang‡
Computer Vision and Pattern Recognition (CVPR) Workshop @ Computer Vision for Fashion, Art, and Design, 2023
[Paper]
[Code]
-
Devil in the Number: Towards Robust Multi-modality Data Filter
Yichen Xu, Zihan Xu, Wenhao Chai†, Zhonghan Zhao, Enxin Song, Gaoang Wang‡
International Conference on Computer Vision (ICCV) Workshop @ DataComp, 2023
[Paper]
Preprint
-
CityGen: Infinite and Controllable 3D City Layout Generation
Jie Deng*, Wenhao Chai*†, Jianshu Guo*, Qixuan Huang, Wenhao Hu, Jenq-Neng Hwang, Gaoang Wang‡
arXiv Preprint.
[Project Page]
[Paper]
[Code]
-
UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning
Zhongyu Jiang, Wenhao Chai, Lei Li, Zhuoran Zhou, Cheng-Yen Yang, Jenq-Neng Hwang‡
arXiv Preprint.
[Paper]
-
A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision
Zhonghan Zhao*, Wenhao Chai*, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Gaoang Wang‡, Mingli Song, Jenq-Neng Hwang
arXiv Preprint.
[Paper]
2022
Conference
-
Automatic Spinal Ultrasound Image Segmentation and Deployment for Real-time Spine Volumetric Reconstruction
Yifan Cao*, Chenghao Tan*, Wenzhuo Qian, Wenhao Chai, Luhang Cui, Wenxuan Yang, Xinben Hu, Yongjian Zhu, Wenhui Zhou‡, Xingfa Shen
International Conference on Unmanned Systems, Best Paper Award, 2022
[Paper]
[Code]
-
Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model
Zhenting Qi*, Ruike Zhu*, Zheyu Fu*, Wenhao Chai*, Volodymyr Kindratenko‡
International Conference on Tools with Artificial Intelligence, 2022
[Paper]
[Dataset]
Journal
-
Deep Vision Multimodal Learning: Methodology, Benchmark, and Trend
Wenhao Chai, Gaoang Wang‡
Applied Sciences
[Paper]