
Dongxu Li
Verified email at apple.com
Title · Cited by · Year
BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models
J Li, D Li, S Savarese, S Hoi
International Conference on Machine Learning, 19730-19742, 2023
Cited by 9649 · 2023
BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation
J Li, D Li, C Xiong, S Hoi
International Conference on Machine Learning, 12888-12900, 2022
Cited by 7596 · 2022
InstructBLIP: Towards general-purpose vision-language models with instruction tuning
W Dai, J Li, D Li, A Tiong, J Zhao, W Wang, B Li, PN Fung, S Hoi
Advances in Neural Information Processing Systems 36, 49250-49267, 2023
Cited by 2280 · 2023
Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison
D Li, C Rodriguez, X Yu, H Li
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2020
Cited by 823 · 2020
BLIP-Diffusion: Pre-trained subject representation for controllable text-to-image generation and editing
D Li, J Li, S Hoi
Advances in Neural Information Processing Systems 36, 30146-30166, 2023
Cited by 473 · 2023
cosFormer: Rethinking Softmax in Attention
Z Qin, W Sun, H Deng, D Li, Y Wei, B Lv, J Yan, L Kong, Y Zhong
International Conference on Learning Representations, 2022
Cited by 366 · 2022
LongVideoBench: A benchmark for long-context interleaved video-language understanding
H Wu, D Li, B Chen, J Li
Advances in Neural Information Processing Systems 37, 28828-28857, 2024
Cited by 339 · 2024
From images to textual prompts: Zero-shot visual question answering with frozen large language models
J Guo, J Li, D Li, AMH Tiong, B Li, D Tao, S Hoi
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
Cited by 313* · 2023
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
D Li, J Li, H Li, JC Niebles, SCH Hoi
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Cited by 272 · 2021
TSPNet: Hierarchical feature learning via temporal semantic pyramid for sign language translation
D Li, C Xu, X Yu, K Zhang, B Swift, H Suominen, H Li
Advances in Neural Information Processing Systems 33, 12034-12045, 2020
Cited by 200 · 2020
Transferring cross-domain knowledge for video sign language recognition
D Li, X Yu, C Xu, L Petersson, H Li
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020
Cited by 185 · 2020
LAVIS: A One-stop Library for Language-Vision Intelligence
D Li, J Li, H Le, G Wang, S Savarese, SCH Hoi
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Cited by 183* · 2023
Enhanced spatio-temporal interaction learning for video deraining: A faster and better framework
K Zhang, D Li, W Luo, W Ren, W Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022
Cited by 144 · 2022
Dual attention-in-attention model for joint rain streak and raindrop removal
K Zhang, D Li, W Luo, W Ren
IEEE Transactions on Image Processing 30, 7608-7619, 2021
Cited by 127 · 2021
The devil in linear transformer
Z Qin, X Han, W Sun, D Li, L Kong, N Barnes, Y Zhong
arXiv preprint arXiv:2210.10340, 2022
Cited by 120 · 2022
Apple Intelligence foundation language models: Tech report 2025
E Li, ABL Larsen, C Zhang, X Zhou, J Qin, DA Yap, N Raghavan, X Chang, ...
arXiv preprint arXiv:2507.13575, 2025
Cited by 116* · 2025
Aria: An open multimodal native mixture-of-experts model
D Li, Y Liu, H Wu, Y Wang, Z Shen, B Qu, X Niu, G Wang, B Chen, JL Aria
arXiv preprint arXiv:2410.05993 1 (3), 2024
Cited by 111* · 2024
ARVo: Learning all-range volumetric correspondence for video deblurring
D Li, C Xu, K Zhang, X Yu, Y Zhong, W Ren, H Suominen, H Li
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Cited by 98 · 2021
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning
A Panagopoulou, L Xue, N Yu, J Li, D Li, S Joty, R Xu, S Savarese, ...
European Conference on Computer Vision, 177-197, 2024
Cited by 85* · 2024
Aria-UI: Visual grounding for GUI instructions
Y Yang, Y Wang, D Li, Z Luo, B Chen, C Huang, J Li
Findings of the Association for Computational Linguistics: ACL 2025, 22418-22433, 2025
Cited by 83 · 2025