Zhuofan Zong

Cited by

	All	Since 2021
Citations	1787	1783
h-index	14	14
i10-index	15	15

Citations per year
Year	Citations
2021	13
2022	18
2023	112
2024	551
2025	1041
2026	45

Public access (based on funding mandates)

9 articles available
0 articles not available

Co-authors

Guanglu Song - vivix.ai - Verified email at vivix.ai
Hao Shao - CUHK, MMLab - Verified email at link.cuhk.edu.hk
Hongsheng Li (李鸿升) - The Chinese University of Hong Kong - Verified email at ee.cuhk.edu.hk
Dongzhi Jiang - MMLab, The Chinese University of Hong Kong - Verified email at link.cuhk.edu.hk
Zeyue Xue (薛泽岳) - The University of Hong Kong; MMLAB@HKU - Verified email at connect.hku.hk
Bingqi Ma - vivix.ai - Verified email at vivix.ai
Ping Luo (羅平) - Associate Professor, The University of Hong Kong; MMLAB@HKU - Verified email at hku.hk
Kunchang Li - ByteDance Seed - Verified email at bytedance.com
Yu Qiao - Professor of Shanghai AI Laboratory; Shenzhen Institutes of Advanced Technology, CAS - Verified email at siat.ac.cn
Dazhong Shen - Nanjing University of Aeronautics and Astronautics

Zhuofan Zong

MMLab, The Chinese University of Hong Kong

Verified email at link.cuhk.edu.hk

Large Models, Multimodal, Object Detection, 3D Object Detection


Title	Cited by	Year
DETRs with collaborative hybrid assignments training. Z Zong, G Song, Y Liu. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	722	2023
Visual CoT: Advancing multi-modal language models with a comprehensive dataset and benchmark for chain-of-thought reasoning. H Shao, S Qian, H Xiao, G Song, Z Zong, L Wang, Y Liu, H Li. Advances in Neural Information Processing Systems 37, 8612-8642, 2024	223	2024
RAPHAEL: Text-to-image generation via large mixture of diffusion paths. Z Xue, G Song, Q Guo, B Liu, Z Zong, Y Liu, P Luo. Advances in Neural Information Processing Systems 36, 41693-41706, 2023	215	2023
MoVA: Adapting mixture of vision experts to multimodal context. Z Zong, B Ma, D Shen, G Song, H Shao, D Jiang, H Li, Y Liu. Advances in Neural Information Processing Systems 37, 103305-103333, 2024	96	2024
Visual CoT: Unleashing chain-of-thought reasoning in multi-modal language models. H Shao, S Qian, H Xiao, G Song, Z Zong, L Wang, Y Liu, H Li. CoRR, 2024	86	2024
T2I-R1: Reinforcing image generation with collaborative semantic-level and token-level CoT. D Jiang, Z Guo, R Zhang, Z Zong, H Li, L Zhuo, S Yan, PA Heng, H Li. arXiv preprint arXiv:2505.00703, 2025	82	2025
Graph attention based proposal 3D ConvNets for action detection. J Li, X Liu, Z Zong, W Zhao, M Zhang, J Song. Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 4626-4633, 2020	67	2020
CoMat: Aligning text-to-image diffusion model with image-to-text concept matching. D Jiang, G Song, X Wu, R Zhang, D Shen, Z Zong, Y Liu, H Li. Advances in Neural Information Processing Systems 37, 76177-76209, 2024	55	2024
Self-slimmed vision transformer. Z Zong, K Li, G Song, Y Wang, Y Qiao, B Leng, Y Liu. European Conference on Computer Vision, 432-448, 2022	53	2022
Temporal enhanced training of multi-view 3D object detector via historical object prediction. Z Zong, D Jiang, G Song, Z Xue, J Su, H Li, Y Liu. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	49	2023
Exploring the role of large language models in prompt encoding for diffusion models. B Ma, Z Zong, G Song, H Li, Y Liu. Advances in Neural Information Processing Systems 37, 118428-118455, 2024	40	2024
RCNet: Reverse feature pyramid and cross-scale shift network for object detection. Z Zong, Q Cao, B Leng. Proceedings of the 29th ACM International Conference on Multimedia, 5637-5645, 2021	26	2021
Temporal enhanced training of multi-view 3D object detector via historical object prediction. Z Zong, D Jiang, G Song, Z Xue, J Su, H Li, Y Liu. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	25	2023
EasyRef: Omni-generalized group image reference for diffusion models via multimodal LLM. Z Zong, D Jiang, B Ma, G Song, H Shao, D Shen, Y Liu, H Li. Forty-second International Conference on Machine Learning, 2024	14	2024
DETRs with collaborative hybrid assignments training (2023). Z Zong, G Song, Y Liu. arXiv preprint arXiv:2211.12860	11	
Large-batch optimization for dense visual predictions: Training faster R-CNN in 4.2 minutes. Z Xue, J Liang, G Song, Z Zong, L Chen, Y Liu, P Luo. Advances in Neural Information Processing Systems 35, 18694-18706, 2022	7	2022
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping. H Shao, S Wang, Y Zhou, G Song, D He, S Qin, Z Zong, B Ma, Y Liu, H Li. arXiv preprint arXiv:2412.11279, 2024	5	2024
Large-batch optimization for dense visual predictions. Z Xue, J Liang, G Song, Z Zong, L Chen, Y Liu, P Luo. Advances in Neural Information Processing Systems 1, 2022	5	2022
ADT: Tuning Diffusion Models with Adversarial Supervision. D Shen, G Song, Y Zhang, B Ma, L Li, D Jiang, Z Zong, Y Liu. arXiv preprint arXiv:2504.11423, 2025	3	2025
WebGen-Agent: Enhancing interactive website generation with multi-level feedback and step-level reinforcement learning. Z Lu, H Ren, Y Yang, K Wang, Z Zong, J Pan, M Zhan, H Li. arXiv preprint arXiv:2509.22644, 2025	2	2025
