Botian Shi

Cited by

	All	Since 2021
Citations	6977	6928
h-index	33	33
i10-index	60	60

Citations per year
Year	Citations
2020	42
2021	84
2022	192
2023	433
2024	1385
2025	4683
2026	145

Public access

Available	25 articles
Not available	6 articles

Based on funding mandates

Co-authors

Pinlong Cai	Shanghai Artificial Intelligence Laboratory	Verified email at pjlab.org.cn
Licheng Wen	Shanghai AI Laboratory	Verified email at pjlab.org.cn
Xuemeng Yang	Shanghai AI Laboratory	Verified email at pjlab.org.cn
Yu Qiao	Professor of Shanghai AI Laboratory; Shenzhen Institutes of Advanced Technology, CAS	Verified email at siat.ac.cn
Nan Duan	Vice President of JD.Com (now) | StepFun | Microsoft Research	Verified email at microsoft.com
Huaishao Luo	JD AI Research	Verified email at jd.com
Ming Zhou (周明)	Chief Scientist at Sinovation, ACL president (2019), VP of CCF (2020-2024)	Verified email at chuangxin.com
Pan Lu	Stanford University	Verified email at stanford.edu
Yaobo Liang	microsoft.com	Verified email at microsoft.com
Zhongyuan Wang	BAAI	Verified email at baai.ac.cn
Yujing Wang	Bytedance	Verified email at bytedance.com
Graham Neubig	Carnegie Mellon University	Verified email at cs.cmu.edu
Junyi Du	University of Southern California	Verified email at usc.edu
Fangzheng (Frank) Xu	Microsoft AI	Verified email at microsoft.com
Rong-Cheng Tu	Nanyang Technological University	Verified email at ntu.edu.sg

Botian Shi

Shanghai Artificial Intelligence Laboratory, Shanghai Innovation Institution (SII)

Verified email at pjlab.org.cn

VLMs, Autonomous Driving, Multi-agent Systems


Title	Authors	Venue	Cited by	Year
Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling	Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui, J Zhu, S Ye, H Tian, Z Liu, ...	arXiv preprint arXiv:2412.05271, 2024	1009	2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui, W Tong, K Hu, J Luo, Z Ma, ...	arXiv preprint arXiv:2404.16821, 2024	995	2024
InternVL3: Exploring advanced training and test-time recipes for open-source multimodal models	J Zhu, W Wang, Z Chen, Z Liu, S Ye, L Gu, H Tian, Y Duan, W Su, J Shao, ...	arXiv preprint arXiv:2504.10479, 2025	670	2025
UniVL: A unified video and language pre-training model for multimodal understanding and generation	H Luo, L Ji, B Shi, H Huang, N Duan, T Li, J Li, T Bharti, M Zhou	arXiv preprint arXiv:2002.06353, 2020	586	2020
Drive like a human: Rethinking autonomous driving with large language models	D Fu, X Li, L Wen, M Dou, P Cai, B Shi, Y Qiao	2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops …, 2024	300	2024
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models	L Wen, D Fu, X Li, X Cai, T Ma, P Cai, M Dou, B Shi, L He, Y Qiao	The Twelfth International Conference on Learning Representations (ICLR), 2024	295	2024
Multi-modal sensor fusion for auto driving perception: A survey	K Huang, B Shi, X Li, X Li, S Huang, Y Li	arXiv preprint arXiv:2202.02703, 2022	261	2022
InternVL3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency	W Wang, Z Gao, L Gu, H Pu, L Cui, X Wei, Z Liu, L Jing, S Ye, J Shao, ...	arXiv preprint arXiv:2508.18265, 2025	232	2025
LoGoNet: Towards accurate 3D object detection with local-to-global cross-modal fusion	X Li, T Ma, Y Hou, B Shi, Y Yang, Y Liu, X Wu, Q Chen, Y Li, Y Qiao, L He	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023	216	2023
Multi-sensor fusion and cooperative perception for autonomous driving: A review	C Xiang, C Feng, X Xie, B Shi, H Lu, Y Lv, M Yang, Z Niu	IEEE Intelligent Transportation Systems Magazine 15 (5), 36-58, 2023	175	2023
MinerU: An open-source solution for precise document content extraction	B Wang, C Xu, X Zhao, L Ouyang, F Wu, Z Zhao, R Xu, K Liu, Y Qu, ...	arXiv preprint arXiv:2409.18839, 2024	126	2024
StreetSurf: Extending multi-view implicit surface reconstruction to street views	J Guo, N Deng, X Li, Y Bai, B Shi, C Wang, C Ding, D Wang, Y Li	arXiv preprint arXiv:2306.04988, 2023	121	2023
ChartX & ChartVLM: A versatile benchmark and foundation model for complicated chart reasoning	R Xia, H Ye, X Yan, Q Liu, H Zhou, Z Chen, B Shi, J Yan, B Zhang	IEEE Transactions on Image Processing, 2025	116	2025
On the road with GPT-4V(ision): Early explorations of visual-language model on autonomous driving	L Wen, X Yang, D Fu, X Wang, P Cai, X Li, T Ma, Y Li, L Xu, D Shang, ...	arXiv preprint arXiv:2311.05332, 2023	107	2023
Is Sora a world simulator? A comprehensive survey on general world models and beyond	Z Zhu, X Wang, W Zhao, C Min, B Li, N Deng, M Dou, Y Wang, B Shi, ...	arXiv preprint arXiv:2405.03520, 2024	99	2024
Knowledge Aware Semantic Concept Expansion for Image-Text Matching	B Shi, L Ji, P Lu, Z Niu, N Duan	Proceedings of the Twenty-Eighth International Joint Conference on …, 2019	95	2019
Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection	X Li, B Shi, Y Hou, X Wu, T Ma, Y Li, L He	Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel …, 2022	91	2022
Dense procedure captioning in narrated instructional videos	B Shi, L Ji, Y Liang, N Duan, P Chen, Z Niu, M Zhou	Proceedings of the 57th annual meeting of the association for computational …, 2019	90	2019
Uni3D: A unified baseline for multi-dataset 3D object detection	B Zhang, J Yuan, B Shi, T Chen, Y Li, Y Qiao	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023	75	2023
Microsoft concept graph: Mining semantic concepts for short text understanding	L Ji, Y Wang, B Shi, D Zhang, Z Wang, J Yan	Data Intelligence 1 (3), 238-270, 2019	73	2019

Articles 1–20
