Dongxu Li

Cited by

	All	Since 2021
Citations	23874	23820
h-index	26	26
i10-index	30	30

Citations per year

Year	Citations
2021	152
2022	454
2023	2626
2024	8099
2025	12141
2026	320

Public access

14 articles available
0 articles not available

Based on funding mandates

Dongxu Li

Apple

Verified email at apple.com

Foundation Models


Title	Authors	Venue	Cited by	Year
Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models	J Li, D Li, S Savarese, S Hoi	International conference on machine learning, 19730-19742, 2023	9649	2023
Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation	J Li, D Li, C Xiong, S Hoi	International conference on machine learning, 12888-12900, 2022	7596	2022
Instructblip: Towards general-purpose vision-language models with instruction tuning	W Dai, J Li, D Li, A Tiong, J Zhao, W Wang, B Li, PN Fung, S Hoi	Advances in neural information processing systems 36, 49250-49267, 2023	2280	2023
Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison	D Li, C Rodriguez, X Yu, H Li	Proceedings of the IEEE/CVF winter conference on applications of computer …, 2020	823	2020
Blip-diffusion: Pre-trained subject representation for controllable text-to-image generation and editing	D Li, J Li, S Hoi	Advances in Neural Information Processing Systems 36, 30146-30166, 2023	473	2023
cosFormer: Rethinking Softmax in Attention	Z Qin, W Sun, H Deng, D Li, Y Wei, B Lv, J Yan, L Kong, Y Zhong	International Conference on Learning Representations, 2022	366	2022
Longvideobench: A benchmark for long-context interleaved video-language understanding	H Wu, D Li, B Chen, J Li	Advances in Neural Information Processing Systems 37, 28828-28857, 2024	339	2024
From images to textual prompts: Zero-shot visual question answering with frozen large language models	J Guo, J Li, D Li, AMH Tiong, B Li, D Tao, S Hoi	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023	313*	2023
Align and Prompt: Video-and-Language Pre-training with Entity Prompts	D Li, J Li, H Li, JC Niebles, SCH Hoi	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	272	2021
Tspnet: Hierarchical feature learning via temporal semantic pyramid for sign language translation	D Li, C Xu, X Yu, K Zhang, B Swift, H Suominen, H Li	Advances in Neural Information Processing Systems 33, 12034-12045, 2020	200	2020
Transferring cross-domain knowledge for video sign language recognition	D Li, X Yu, C Xu, L Petersson, H Li	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020	185	2020
LAVIS: A One-stop Library for Language-Vision Intelligence	D Li, J Li, H Le, G Wang, S Savarese, SCH Hoi	Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023	183*	2023
Enhanced spatio-temporal interaction learning for video deraining: A faster and better framework	K Zhang, D Li, W Luo, W Ren, W Liu	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022	144	2022
Dual attention-in-attention model for joint rain streak and raindrop removal	K Zhang, D Li, W Luo, W Ren	IEEE Transactions on Image Processing 30, 7608-7619, 2021	127	2021
The devil in linear transformer	Z Qin, X Han, W Sun, D Li, L Kong, N Barnes, Y Zhong	arXiv preprint arXiv:2210.10340, 2022	120	2022
Apple intelligence foundation language models: Tech report 2025	E Li, ABL Larsen, C Zhang, X Zhou, J Qin, DA Yap, N Raghavan, X Chang, ...	arXiv preprint arXiv:2507.13575, 2025	116*	2025
Aria: An open multimodal native mixture-of-experts model	D Li, Y Liu, H Wu, Y Wang, Z Shen, B Qu, X Niu, G Wang, B Chen, JL Aria	arXiv preprint arXiv:2410.05993 1 (3), 2024	111*	2024
Arvo: Learning all-range volumetric correspondence for video deblurring	D Li, C Xu, K Zhang, X Yu, Y Zhong, W Ren, H Suominen, H Li	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	98	2021
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning	A Panagopoulou, L Xue, N Yu, J Li, D Li, S Joty, R Xu, S Savarese, ...	European Conference on Computer Vision, 177-197, 2024	85*	2024
Aria-ui: Visual grounding for gui instructions	Y Yang, Y Wang, D Li, Z Luo, B Chen, C Huang, J Li	Findings of the Association for Computational Linguistics: ACL 2025, 22418-22433, 2025	83	2025

Articles 1–20
