Deyao Zhu

Cited by

	All	Since 2021
Citations	6572	6568
h-index	15	15
i10-index	18	18

Citations per year: 480 (2023), 2403 (2024), 3560 (2025), 109 (2026)

Co-authors

Mohamed Elhoseiny, Ph.D. - Associate Professor, KAUST | Founder, Startup Stealth (verified email at cs.stanford.edu)
Jun Chen - Meta Research Scientist | KAUST PhD (verified email at meta.com)
Xiaoqian Shen - CS PhD @ KAUST (verified email at kaust.edu.sa)
Xiang Li - Assistant Professor, University of Bristol (Hiring PhD/Interns) (verified email at bristol.ac.uk)
Kunchang Li - ByteDance Seed (verified email at bytedance.com)
Haoqi Fan - Facebook AI Research (verified email at fb.com)
Chaorui Deng - Bytedance, chaorui.deng@bytedance.com (verified email at bytedance.com)
Li Erran Li - IEEE Fellow and ACM Fellow, AWS AI, Amazon (verified email at cs.columbia.edu)
Abduallah Mohamed - Applied Research Scientist, Meta Reality Labs (verified email at fb.com)
Juergen Schmidhuber - King Abdullah University of Science and Technology / The Swiss AI Lab, IDSIA / University of Lugano (verified email at kaust.edu.sa)
Mingchen Zhuge - KAUST AI (verified email at kaust.edu.sa)
Yuhui Wang - PostDoc, King Abdullah University of Science and Technology (verified email at kaust.edu.sa)
Mohamed Zahran - Udacity (verified email at udacity.com)

Deyao Zhu

Research Scientist, ByteDance Seed

Verified email at bytedance.com - Homepage

Reinforcement Learning, Learning from Experience


Title	Cited by	Year
MiniGPT-4: Enhancing vision-language understanding with advanced large language models. D Zhu, J Chen, X Shen, X Li, M Elhoseiny. International Conference on Learning Representations 2024, 2023	4424	2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning. J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang, R Krishnamoorthi, ... 2nd MMFM Workshop in CVPR 2024, 2023	908	2023
Emerging Properties in Unified Multimodal Pretraining. C Deng, D Zhu, K Li, C Gou, F Li, Z Wang, S Zhong, W Yu, X Nie, Z Song, ... arXiv preprint arXiv:2505.14683, 2025	259	2025
Seed1.5-VL Technical Report. D Guo, F Wu, F Zhu, F Leng, G Shi, H Chen, H Fan, J Wang, J Jiang, ... arXiv preprint arXiv:2505.07062, 2025	185*	2025
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions. D Zhu, J Chen, K Haydarov, X Shen, W Zhang, M Elhoseiny. Transactions on Machine Learning Research (TMLR), 2023	137	2023
MiniGPT4-Video: Advancing multimodal LLMs for video understanding with interleaved visual-textual tokens. K Ataallah, X Shen, E Abdelrahman, E Sleiman, D Zhu, J Ding, ... 2nd MMFM Workshop in CVPR 2024, 2024	134	2024
Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation. A Mohamed, D Zhu, W Vu, M Elhoseiny, C Claudel. European Conference on Computer Vision (ECCV) 2022, 2022	126	2022
Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only. J Chen, D Zhu, G Qian, B Ghanem, Z Yan, C Zhu, F Xiao, SC Culatana, ... Proceedings of the IEEE/CVF International Conference on Computer Vision, 699-710, 2023	72*	2023
MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis. A Alkhaldi, R Alnajim, L Alabdullatef, R Alyahya, J Chen, D Zhu, A Alsinan, ... arXiv preprint arXiv:2407.04106, 2024	58	2024
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions. J Chen, D Zhu, K Haydarov, X Li, M Elhoseiny. arXiv preprint arXiv:2304.04227, 2023	55	2023
Seaweed-7B: Cost-effective training of video generation foundation model. T Seawead, C Yang, Z Lin, Y Zhao, S Lin, Z Ma, H Guo, H Chen, L Qi, ... arXiv preprint arXiv:2504.08685, 2025	53	2025
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos. K Ataallah, X Shen, E Abdelrahman, E Sleiman, M Zhuge, J Ding, D Zhu, ... European Conference on Computer Vision (ECCV) 2024, 2024	38	2024
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition. J Chen, A Agarwal, S Abdelkarim, D Zhu, M Elhoseiny. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	30*	2022
Causal diffusion transformers for generative modeling. C Deng, D Zhu, K Li, S Guang, H Fan. arXiv preprint arXiv:2412.12095, 2024	23	2024
Motion forecasting with unlikelihood training in continuous space. D Zhu, M Zahran, LE Li, M Elhoseiny. Conference on Robot Learning, 1003-1012, 2022	17	2022
How Well Can Vision Language Models See Image Details? C Gou, A Felemban, FF Khan, D Zhu, J Cai, H Rezatofighi, M Elhoseiny. arXiv preprint arXiv:2408.03940, 2024	15	2024
Guiding Online Reinforcement Learning with Action-Free Offline Pretraining. D Zhu, Y Wang, J Schmidhuber, M Elhoseiny. arXiv preprint arXiv:2301.12876, 2023	10	2023
Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning. D Zhu, LE Li, M Elhoseiny. International Conference on Learning Representations 2023, 2022	10	2022
HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents. D Zhu, M Zahran, LE Li, M Elhoseiny. International Conference on Learning Representations 2021, 2021	9	2021
Learning to disentangle latent physical factors for video prediction. D Zhu, M Munderloh, B Rosenhahn, J Stückler. German Conference on Pattern Recognition, 595-608, 2019	5	2019
