About

Profile

Canyu Zhao

Hello, I'm a third-year PhD student at Zhejiang University, specializing in VLMs, diffusion models, and 3D display.

Research Interests

Large Vision Language Model

Advancing the reasoning capabilities of Multimodal Large Language Models.

VLM/LLM, RL Post-training, Agent, Think with Image

Diffusion Model

Developing controllable generative models for high-fidelity image and video generation and editing.

Image Generation, Video Generation, Editing, Customization

3D Display

Glasses-free 3D display technologies and real-time rendering techniques.

3D Display

Featured Publications

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Conference

Canyu Zhao, Yanlong Sun, Mingyu Liu, Huanyi Zheng, Muzhi Zhu, Zhiyue Zhao, Hao Chen, Tong He, Chunhua Shen

NeurIPS Spotlight Award

2025

Glasses-free 3D display with ultrawide viewing range using deep learning
Journal

Weijie Ma, Zhangrui Zhao, Canyu Zhao, Wanli Ouyang, Han-Sen Zhong

Nature

2025

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
Conference

Canyu Zhao*, Mingyu Liu*, Wen Wang*, Weihua Chen, Fan Wang, Hao Chen, Bo Zhang, Chunhua Shen

ICLR

2025

AutoStory: Generating Diverse Storytelling Images with Minimal Human Efforts
Journal

Wen Wang*, Canyu Zhao*, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen

International Journal of Computer Vision

2024

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Conference

Ganggui Ding*, Canyu Zhao*, Wen Wang*, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen

CVPR

2024

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Conference

Hao Zhong*, Muzhi Zhu*, Zongze Du, Zheng Huang, Canyu Zhao, Mingyu Liu, Wen Wang, Hao Chen, Chunhua Shen

NeurIPS

2025

FreerCustom: Training-Free Multi-Concept Customization for Image and Video Generation
Journal

Canyu Zhao, Ganggui Ding, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen

International Journal of Computer Vision

2025

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
arXiv

Muzhi Zhu*, Hao Zhong*, Canyu Zhao, Zongze Du, Zheng Huang, Mingyu Liu, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen

arXiv

2025

Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
arXiv

Canyu Zhao*, Xiaoman Li*, Tianjian Feng, Zhiyue Zhao, Hao Chen, Chunhua Shen

arXiv

2025

StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation
arXiv

Mingyu Liu, Jiuhe Shu, Hui Chen, Zeju Li, Canyu Zhao, Jiange Yang, Shenyuan Gao, Hao Chen, Chunhua Shen

arXiv

2025

NoTVLA: Narrowing of Dense Action Trajectories for Generalizable Robot Manipulation
arXiv

Zheng Huang, Mingyu Liu, Xiaoyi Lin, Muzhi Zhu, Canyu Zhao, Zongze Du, Xiaoman Li, Yiduo Jia, Hao Zhong, Hao Chen, Chunhua Shen

arXiv

2025

Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert
arXiv

Mingyu Liu, Zheng Huang, Xiaoyi Lin, Muzhi Zhu, Canyu Zhao, Zongze Du, Yating Wang, Haoyi Zhu, Hao Chen, Chunhua Shen

arXiv

2025

Featured Projects

DreamFace: A Commercial AI Platform for Image and Video Generation
current

2025 - Present

DreamFace: create effortless AI videos and photos. Create avatar videos, AI videos, and AI photos with a single click. I participated in the development of the video generation model.

Recent Notes