
Deyao Zhu
Research Scientist, ByteDance Seed
Verified email at bytedance.com
Title
Cited by
Year
MiniGPT-4: Enhancing vision-language understanding with advanced large language models
D Zhu, J Chen, X Shen, X Li, M Elhoseiny
International Conference on Learning Representations 2024, 2023
Cited by: 4424, Year: 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang, R Krishnamoorthi, ...
2nd MMFM Workshop in CVPR2024, 2023
Cited by: 908, Year: 2023
Emerging Properties in Unified Multimodal Pretraining
C Deng, D Zhu, K Li, C Gou, F Li, Z Wang, S Zhong, W Yu, X Nie, Z Song, ...
arXiv preprint arXiv:2505.14683, 2025
Cited by: 259, Year: 2025
Seed1.5-VL Technical Report
D Guo, F Wu, F Zhu, F Leng, G Shi, H Chen, H Fan, J Wang, J Jiang, ...
arXiv preprint arXiv:2505.07062, 2025
Cited by: 185*, Year: 2025
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
D Zhu, J Chen, K Haydarov, X Shen, W Zhang, M Elhoseiny
Transactions on Machine Learning Research (TMLR), 2023
Cited by: 137, Year: 2023
MiniGPT4-Video: Advancing multimodal LLMs for video understanding with interleaved visual-textual tokens
K Ataallah, X Shen, E Abdelrahman, E Sleiman, D Zhu, J Ding, ...
2nd MMFM Workshop in CVPR2024, 2024
Cited by: 134, Year: 2024
Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation
A Mohamed, D Zhu, W Vu, M Elhoseiny, C Claudel
European Conference on Computer Vision (ECCV) 2022, 2022
Cited by: 126, Year: 2022
Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only
J Chen, D Zhu, G Qian, B Ghanem, Z Yan, C Zhu, F Xiao, SC Culatana, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision, 699-710, 2023
Cited by: 72*, Year: 2023
MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis
A Alkhaldi, R Alnajim, L Alabdullatef, R Alyahya, J Chen, D Zhu, A Alsinan, ...
arXiv preprint arXiv:2407.04106, 2024
Cited by: 58, Year: 2024
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
J Chen, D Zhu, K Haydarov, X Li, M Elhoseiny
arXiv preprint arXiv:2304.04227, 2023
Cited by: 55, Year: 2023
Seaweed-7B: Cost-effective training of video generation foundation model
T Seawead, C Yang, Z Lin, Y Zhao, S Lin, Z Ma, H Guo, H Chen, L Qi, ...
arXiv preprint arXiv:2504.08685, 2025
Cited by: 53, Year: 2025
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
K Ataallah, X Shen, E Abdelrahman, E Sleiman, M Zhuge, J Ding, D Zhu, ...
European Conference on Computer Vision (ECCV) 2024, 2024
Cited by: 38, Year: 2024
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
J Chen, A Agarwal, S Abdelkarim, D Zhu, M Elhoseiny
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Cited by: 30*, Year: 2022
Causal diffusion transformers for generative modeling
C Deng, D Zhu, K Li, S Guang, H Fan
arXiv preprint arXiv:2412.12095, 2024
Cited by: 23, Year: 2024
Motion forecasting with unlikelihood training in continuous space
D Zhu, M Zahran, LE Li, M Elhoseiny
Conference on Robot Learning, 1003-1012, 2022
Cited by: 17, Year: 2022
How Well Can Vision Language Models See Image Details?
C Gou, A Felemban, FF Khan, D Zhu, J Cai, H Rezatofighi, M Elhoseiny
arXiv preprint arXiv:2408.03940, 2024
Cited by: 15, Year: 2024
Guiding Online Reinforcement Learning with Action-Free Offline Pretraining
D Zhu, Y Wang, J Schmidhuber, M Elhoseiny
arXiv preprint arXiv:2301.12876, 2023
Cited by: 10, Year: 2023
Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning
D Zhu, LE Li, M Elhoseiny
International Conference on Learning Representations 2023, 2022
Cited by: 10, Year: 2022
HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents
D Zhu, M Zahran, LE Li, M Elhoseiny
International Conference on Learning Representations, 2021, 2021
Cited by: 9, Year: 2021
Learning to disentangle latent physical factors for video prediction
D Zhu, M Munderloh, B Rosenhahn, J Stückler
German Conference on Pattern Recognition, 595-608, 2019
Cited by: 5, Year: 2019